Homework

The file germean_credit_data.xlsx
(source: Kaggle.com) contains data on customer defaults. Risk = Good
means no default, and Risk = Bad means default. Use this data to answer the
following questions:

1.
What are the odds of default for this data? What does
that mean in layman’s terms? The odds for the data are 50:50 (a probability
of 1/2); odds compare the chance that an event will occur with the chance
that it will not.
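As a rough illustration of the definition, odds can be computed as the number of cases with the event divided by the number of cases without it, and converted back to a probability. The sketch below uses made-up counts, not values from the data file, and the function names are hypothetical.

```python
def odds_from_counts(n_event: int, n_non_event: int) -> float:
    """Odds of an event = (count with the event) / (count without it)."""
    if n_non_event == 0:
        raise ValueError("odds are undefined when there are no non-events")
    return n_event / n_non_event


def probability_from_odds(odds: float) -> float:
    """Convert odds back to a probability: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)


# Hypothetical counts for illustration only (not taken from the data file).
toy_odds = odds_from_counts(n_event=20, n_non_event=80)
toy_prob = probability_from_odds(toy_odds)
```

With these toy counts, the odds of the event are 1 to 4, which corresponds to a probability of 1 in 5.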

2.
What is the probability of default conditional on
being a homeowner?
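A conditional probability of this kind can be sketched as the default rate within the subgroup. The example below assumes each record has a Housing field and a Risk field; the coded values "own", "rent", "good", and "bad" are assumptions about how the file stores them, and the rows are invented.

```python
def conditional_default_rate(rows, housing_value="own"):
    """P(default | Housing == housing_value), estimated as a sample share."""
    subset = [r for r in rows if r["Housing"] == housing_value]
    if not subset:
        raise ValueError("no rows match the conditioning value")
    defaults = sum(1 for r in subset if r["Risk"] == "bad")
    return defaults / len(subset)


# Hypothetical rows for illustration only (not taken from the data file).
toy_rows = [
    {"Housing": "own", "Risk": "good"},
    {"Housing": "own", "Risk": "bad"},
    {"Housing": "rent", "Risk": "bad"},
    {"Housing": "own", "Risk": "good"},
    {"Housing": "own", "Risk": "good"},
]
toy_rate = conditional_default_rate(toy_rows)
```

Only the four "own" rows enter the denominator; the "rent" row is ignored.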

3.
Complete this sentence.

Being male changes
the odds of default by …

4.
Complete this sentence.

A 1-unit increase in duration changes the credit amount by …

5.
What is the unconditional expected credit amount for
this sample?

In the next 7 questions, build a model to predict credit amount conditional
on Duration, Checking Account, Housing, Job, and Age. Use only observations
with Age > 60. Use “Mean Absolute Error” as the optimization metric.

6.
Write down the optimization problem. Use in your answer.
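One possible way to write such a problem is sketched below. It assumes β notation for the coefficients (as in part 11), a linear model, and that each predictor enters as a single numeric column; categorical predictors such as Checking Account, Housing, and Job may instead need several dummy variables, each with its own coefficient.

```latex
\min_{\beta_0,\dots,\beta_5}\;
\frac{1}{n}\sum_{i=1}^{n}
\left|\,\text{CreditAmount}_i
 - \left(\beta_0 + \beta_1\,\text{Duration}_i + \beta_2\,\text{CheckingAccount}_i
 + \beta_3\,\text{Housing}_i + \beta_4\,\text{Job}_i + \beta_5\,\text{Age}_i\right)\right|
```

Here n is the number of observations with Age > 60.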

7.
Write down the optimization problem for the
first two observations (ID = 0 and 8); i.e., replace the Xs in the formula of
part 6 with the actual values for the first two observations.

8.
Solve the model using Excel Solver. Use 0 as the
initial value for all coefficients.
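A quick sanity check for the starting point: with every coefficient at 0, the prediction is 0 for each row, so the Mean Absolute Error equals the average absolute credit amount. The sketch below shows the objective a solver would minimize; it assumes a linear model with an intercept, and the rows are invented.

```python
def predict(beta, x):
    """Linear prediction: beta[0] is the intercept, beta[1:] pair with x."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def mean_absolute_error(beta, X, y):
    """Average of |actual - predicted| over all rows."""
    errors = [abs(yi - predict(beta, xi)) for xi, yi in zip(X, y)]
    return sum(errors) / len(errors)


# Hypothetical rows with two predictors (not taken from the data file).
toy_X = [[6.0, 1.0], [12.0, 0.0], [24.0, 1.0]]
toy_y = [1000.0, 2000.0, 3000.0]

# Objective value at the all-zero starting point.
mae_at_zero = mean_absolute_error([0.0, 0.0, 0.0], toy_X, toy_y)
```

Any solver run should end with an objective value no larger than the value at the starting point.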

9.
Solve the model using Golden Rule of Beta.
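If "Golden Rule of Beta" refers to the closed-form least-squares estimator (an assumption; the term is not defined in this sheet), it can be written as follows, where X is the matrix of predictors with a leading column of ones and y is the vector of credit amounts.

```latex
\hat{\beta} = \left(X^{\top} X\right)^{-1} X^{\top} y
```

Note that this formula minimizes squared error rather than absolute error, so its coefficients would generally differ from the Solver result in part 8.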

10.
Calculate t-statistics for coefficients.
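As a reminder of the usual definition, the t-statistic for a coefficient is its estimate divided by its standard error. The notation below is an assumption, following the β notation of part 11.

```latex
t_j = \frac{\hat{\beta}_j}{\operatorname{SE}\!\left(\hat{\beta}_j\right)}
```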

11.
Use the coefficients that are significant at the 5%
level of significance to predict credit amount for observations with Age <
21. Write down the final model, replacing the βs with your estimated
values. Calculate the Mean Absolute Percentage Error for these observations.
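Mean Absolute Percentage Error is commonly computed as the average of |actual - predicted| / |actual|, expressed as a percentage. A minimal sketch with invented values follows; the function name is hypothetical.

```python
def mean_absolute_percentage_error(actual, predicted):
    """MAPE in percent: 100 * mean(|actual - predicted| / |actual|)."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    terms = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical credit amounts and predictions (not taken from the data file).
toy_actual = [1000.0, 2000.0, 4000.0]
toy_predicted = [1100.0, 1800.0, 4000.0]
toy_mape = mean_absolute_percentage_error(toy_actual, toy_predicted)
```

The three toy rows are off by 10%, 10%, and 0%, so the result is their average.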

12.
Using the model of part 11, complete this sentence.
Compare this response with the response in part 4. Why are they different?
Which one is more reliable?

A 1-unit increase in duration changes
the credit amount by …

In the next questions, build
a model to predict the probability of default conditional on Duration,
Checking Account, Housing, Job, Age, and Credit Amount. Use cases with Age >
60 for training and cases with Age < 21 for testing. Code Good as 0 and Bad
as 1.

13.
What is the problem with defining the training and
test samples this way?

14.
Write down the log-likelihood function for the
first two observations (ID = 0 and 8).
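For reference, the general form of the log-likelihood for a logistic model with a 0/1 outcome is sketched below. It assumes a logistic link and β notation; y_i is 1 for Bad and 0 for Good, and x_i collects the predictor values of observation i with a leading 1 for the intercept.

```latex
p_i = \frac{1}{1 + e^{-x_i^{\top}\beta}}, \qquad
\ell(\beta) = \sum_{i} \Big[\, y_i \log p_i + (1 - y_i)\log\!\left(1 - p_i\right) \Big]
```

For the first two observations, the sum has exactly two terms, one per ID.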

15.
Solve the model with Solver using 0 as the initial
values.
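A useful check on the starting point: with all coefficients at 0, every predicted probability is 0.5, so the log-likelihood equals n times log(0.5). The sketch below assumes a logistic model with an intercept and uses invented rows.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(beta, X, y):
    """Sum of y*log(p) + (1-y)*log(1-p); beta[0] is the intercept."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
        p = sigmoid(z)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


# Hypothetical rows with two predictors (not taken from the data file).
toy_X = [[6.0, 1.0], [48.0, 0.0]]
toy_y = [0, 1]

# Log-likelihood at the all-zero starting point.
ll_at_zero = log_likelihood([0.0, 0.0, 0.0], toy_X, toy_y)
```

A solver maximizing this function should finish with a value at least as high as the value at the starting point.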

16.
For the model of part 15, use 0.5 as the probability
threshold. What is the false positive rate in the test sample (positive
meaning default)?
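The false positive rate is the share of actual negatives (no default) that the model flags as positive (default): FP / (FP + TN). The sketch below uses invented labels and probabilities; treating a probability exactly equal to the threshold as positive is an assumption.

```python
def false_positive_rate(actual, predicted_prob, threshold=0.5):
    """FP / (FP + TN), where 1 = default (positive) and 0 = no default."""
    predicted = [1 if p >= threshold else 0 for p in predicted_prob]
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    if fp + tn == 0:
        raise ValueError("no actual negatives in the sample")
    return fp / (fp + tn)


# Hypothetical labels and predicted probabilities (not from the data file).
toy_actual = [0, 0, 0, 1, 1]
toy_prob = [0.2, 0.7, 0.4, 0.9, 0.3]
toy_fpr = false_positive_rate(toy_actual, toy_prob)
```

Of the three actual non-defaults in the toy sample, one is predicted as a default.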

17.
Management intends to increase the bank’s market
share in the population with Age < 21. Which is more important with respect
to management’s goals, false positives or false negatives? Explain.

18.
Using the model of part 15, complete this sentence.
Compare this response with the response in part 3. Why are they different?
Which one is more reliable?

Being male changes
the odds of default by …

19.
For the training sample, calculate the Gini of the split by
“house owner” vs. “rent or free”. Do we gain information by this split?
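Assuming "GINI" here means the Gini impurity used in decision trees (an assumption; the sheet does not define it), the Gini of a split is the size-weighted average of the impurity in each branch, and the gain is the parent impurity minus that average. The labels below are invented, with 1 = Bad and 0 = Good.

```python
def gini_impurity(labels):
    """Gini impurity for 0/1 labels: 1 - p^2 - (1 - p)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_bad = sum(labels) / n
    return 1.0 - p_bad ** 2 - (1.0 - p_bad) ** 2


def gini_of_split(left, right):
    """Size-weighted average of the impurity in the two branches."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Hypothetical labels for the two branches (not taken from the data file).
toy_owner = [0, 0, 0, 1]
toy_rent_or_free = [1, 1, 0, 0]

parent_gini = gini_impurity(toy_owner + toy_rent_or_free)
split_gini = gini_of_split(toy_owner, toy_rent_or_free)
gini_gain = parent_gini - split_gini
```

A positive gain means the split leaves the branches purer on average than the unsplit sample.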

20.
Using the model of part 15, what is the effect of one
more unit of credit amount on the odds of default?
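In a logistic model, the link between a coefficient and the odds can be sketched as follows. The notation is an assumption: β_k is the coefficient on predictor x_k, and all other predictors are held fixed.

```latex
\text{odds}(x) = \frac{p(x)}{1 - p(x)} = e^{x^{\top}\beta}
\quad\Longrightarrow\quad
\frac{\text{odds}(x_k + 1)}{\text{odds}(x_k)} = e^{\beta_k}
```

So a one-unit increase in x_k multiplies the odds by e^{β_k}, a change of (e^{β_k} - 1) × 100 percent.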
