What are the odds of default for this data? What does that mean in layman’s terms?

Homework

 

The file germean_credit_data.xlsx (source: Kaggle.com) contains data on defaults for a set of customers. Risk = Good means no default, and Risk = Bad means default. Use this data to answer the following questions:

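1.       What are the odds of default for this data? What does that mean in layman’s terms? The odds for the data are 0.3. Odds compare the chance that an event occurs with the chance that it does not: odds = p / (1 − p). Odds of 0.3 therefore mean about 3 defaults for every 10 non-defaults.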

2.       What is the probability of default conditional on being a homeowner? 0.896

3.       Complete this sentence.

Being male changes the odds of default by 0.27

4.       Complete this sentence.

A 1-unit increase in duration changes credit amount by 0.6

5.       What is the unconditional expected credit amount for this sample? 3,271.26
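The link between probability, odds, and conditional probability used in questions 1 to 3 can be sketched in Python. This is only an illustration: the rows below are made-up toy values, not figures from the data file, the Housing values "own" and "rent" are assumed labels, and the function names are hypothetical.

```python
def odds_from_probability(p):
    """Convert a probability into odds: p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return p / (1 - p)


def conditional_probability(records, event, condition):
    """Estimate P(event | condition) as a share of the rows meeting the condition."""
    subset = [r for r in records if condition(r)]
    if not subset:
        raise ValueError("no rows satisfy the condition")
    return sum(1 for r in subset if event(r)) / len(subset)


# Hypothetical toy rows (NOT taken from the homework file).
toy = [
    {"Risk": "Bad", "Housing": "own"},
    {"Risk": "Good", "Housing": "own"},
    {"Risk": "Good", "Housing": "own"},
    {"Risk": "Good", "Housing": "own"},
    {"Risk": "Bad", "Housing": "rent"},
    {"Risk": "Good", "Housing": "rent"},
]


def is_default(row):
    return row["Risk"] == "Bad"


# Unconditional probability of default: condition is always true.
p_default = conditional_probability(toy, is_default, lambda r: True)
# Probability of default among owners only.
p_default_owner = conditional_probability(
    toy, is_default, lambda r: r["Housing"] == "own"
)

print(odds_from_probability(p_default))  # odds of default in the toy rows
print(p_default_owner)                   # P(default | owner) in the toy rows
```

In the toy rows, 2 of 6 customers default, so the probability is 1/3 and the odds are 1/2: one default for every two non-defaults. The two numbers answer different questions, which is why a probability should not be reported as odds.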

 

In the next 7 questions, build a model to predict credit amount conditional on Duration, Checking Account, Housing, Job, and Age. Use only observations with Age > 60. Use “Mean Absolute Error” as the optimization metric.

6.       Write down the optimization problem. Use  in your answer.

7.       Write down the optimization problem for the first two observations (ID = 0 and 8); i.e., replace the Xs in the formula of part 6 with the actual values for the first two observations.

8.       Solve the model using Excel Solver. Use 0 as the initial value for all coefficients.

9.       Solve the model using the Golden Rule of Beta.

10.   Calculate t-statistics for coefficients.

11.   Use the coefficients that are significant at the 5% level of significance to predict credit amount for observations with Age < 21. Write down the final model in the form of  replacing βs with your estimated values. Calculate the Mean Absolute Percentage Error for these observations.

12.   Using the model of part 11, complete this sentence. Compare this response with the response in part 4. Why are they different? Which one is more reliable?

A 1-unit increase in duration changes credit amount by …
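The two error metrics named in questions 6 to 11 can be sketched as plain Python functions. This is a minimal illustration under stated assumptions: the model is taken to be linear with an intercept, the data are made-up toy values with a single regressor, and the trial coefficients are arbitrary numbers rather than estimates. All function names are hypothetical.

```python
def predict(beta, x):
    """Linear prediction: beta[0] + beta[1]*x[0] + ... (beta[0] is the intercept)."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def mean_absolute_error(beta, X, y):
    """Average of |y_i - prediction_i|: the quantity a solver would minimize."""
    return sum(abs(yi - predict(beta, xi)) for xi, yi in zip(X, y)) / len(y)


def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, as a fraction (x100 for percent)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy data (NOT from the homework file): duration -> credit amount.
X = [[6], [12], [24]]
y = [1000.0, 2000.0, 4000.0]

# With every coefficient at its initial value 0, each prediction is 0,
# so the objective is simply the mean of |y|.
start = [0.0, 0.0]
mae_at_start = mean_absolute_error(start, X, y)
print(mae_at_start)

# An arbitrary trial value for the coefficients, not an estimate.
trial = [0.0, 150.0]
preds = [predict(trial, xi) for xi in X]
mape = mean_absolute_percentage_error(y, preds)
print(mape)
```

Because the absolute value has a kink at zero, the Mean Absolute Error objective is not differentiable everywhere, which is one reason a numerical solver is used here rather than a closed-form solution.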

 

In the next questions, build a model to predict the probability of default conditional on Duration, Checking Account, Housing, Job, Age, and Credit Amount. Use cases with Age > 60 for training and cases with Age < 21 for testing. Treat Good as 0 and Bad as 1.

13.   What is the problem with the above design for defining the test and train samples?

14.   Write down the log-likelihood function for the first two observations (ID = 0 and 8).

15.   Solve the model with Solver using 0 as the initial value for all coefficients.

16.   For the model of part 15, use 0.5 as the probability threshold. What is the false positive rate in the test sample (positive meaning default)?
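The log-likelihood that a solver would maximize in questions 14 and 15 can be sketched as follows. The sketch assumes a logistic (logit) model, which the questions do not name explicitly, and uses two made-up observations with a single regressor rather than the actual rows with ID = 0 and 8. The function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(beta, X, y):
    """Sum over observations of y*ln(p) + (1 - y)*ln(1 - p), with p = sigmoid(x'beta).

    beta[0] is the intercept; y is 1 for default (Bad) and 0 otherwise (Good).
    """
    total = 0.0
    for xi, yi in zip(X, y):
        z = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
        p = sigmoid(z)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total


# Hypothetical toy observations (NOT the rows with ID = 0 and 8).
X = [[6], [48]]
y = [0, 1]

# With all coefficients at 0, every predicted probability is 0.5,
# so each observation contributes ln(0.5) regardless of its label.
ll_at_zero = log_likelihood([0.0, 0.0], X, y)
print(ll_at_zero)
```

Starting from zero coefficients therefore gives a log-likelihood of n·ln(0.5) for n observations, a handy check that a spreadsheet or solver setup is wired correctly before optimizing.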

Management intends to increase the bank’s market share in the population with Age < 21. Which one is more important with respect to management’s goals: false positives or false negatives? Explain.
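The false positive and false negative rates discussed above can be computed from predicted probabilities as sketched below. The labels and probabilities are made-up toy values, not model output, and treating a probability exactly equal to the threshold as a predicted default is an assumption of this sketch. The function names are hypothetical.

```python
def classify(probabilities, threshold=0.5):
    """Label a case 1 (default) when its probability is at or above the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def error_rates(actual, predicted):
    """Return (false positive rate, false negative rate), positive meaning default.

    FPR = FP / (FP + TN): share of non-defaulters flagged as defaulters.
    FNR = FN / (FN + TP): share of defaulters predicted to be good.
    """
    pairs = list(zip(actual, predicted))
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    fnr = fn / (fn + tp) if (fn + tp) else float("nan")
    return fpr, fnr


# Hypothetical toy test sample (NOT model output): 0 = Good, 1 = Bad.
actual = [0, 0, 0, 0, 1, 1]
probs = [0.1, 0.6, 0.3, 0.4, 0.7, 0.2]

predicted = classify(probs, threshold=0.5)
fpr, fnr = error_rates(actual, predicted)
print(predicted)
print(fpr, fnr)
```

Here a false positive is a good customer flagged as a likely defaulter, and a false negative is a defaulter predicted to be good. Which error costs more depends on what the decision-maker is trying to achieve.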

