## The dataset "HW5.sav" contains a random sample of 113 hospitals (Data from Kutner, et al., 2005). Use the data on STAY, AGE, INFRISK, and XRAY to answer the following questions.

### statistics

##### Description

Exercise:  Diagnostics

Problem I: (For Chapter 4 & Chapter 10 in the Textbook)

The dataset "HW5.sav" contains a random sample of 113 hospitals (Data from Kutner, et al., 2005). Use the data on STAY, AGE, INFRISK, and XRAY to answer the following questions. Please include relevant analysis outputs from the statistical software (SPSS) you used and show your work process when applicable.

1. Fit the regression model predicting STAY from AGE, INFRISK, and XRAY. Save the following statistics: unstandardized predicted values, unstandardized residuals, studentized deleted residuals, leverage values, Mahalanobis distance, Cook's distance, standardized DfFit, and standardized DfBetas.

a) Check the normality of the unstandardized residuals using both a plot (histogram or Q-Q plot) and a formal test. Do there appear to be any outliers? If yes, write down the case number(s).

b)        Examine the studentized deleted residuals (e.g., index plot). Are there any cases that you would identify as outliers? State the criterion you use to make the decision. Write down the case number(s). Are they consistent with the results from a)?

c) Examine the centered leverage values (e.g., index plot). Are there any cases that you would identify as outliers? State the criterion you use to make the decision. Write down the case number(s). Are they consistent with the results from a) or b)? Explain.

d)        Based on standardized DfFit statistic, what case(s) will you identify as outliers? State the criterion you use to make the decision. Write down the case number and the standardized DfFit value for the outlier(s). How do you interpret a standardized DfFit value?

2. Delete the outliers you identified in 1a) and fit the regression model again (STAY = AGE + INFRISK + XRAY). Ask for the Durbin-Watson statistic and collinearity statistics, and save the following statistics: standardized predicted values and standardized residuals. Obtain the Q-Q plot of the standardized residuals, and plot the standardized residuals against the standardized predicted values. Fit a straight line and a lowess curve to the scattered points.

a) Comment on how well each of the following assumptions is satisfied, based on the information obtained. Be sure to specify the criterion you use to make each decision.

i. Normality

ii. Linearity

iii. Homoscedasticity

iv. Independence

b) Report the VIF value for each independent variable. Interpret the VIF value for XRAY.

c)     Determine whether multicollinearity is likely a problem. Consider all relevant statistics and state the criteria you use to make the decision. If it is a problem, recommend a solution.
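Several parts of Problem I ask for the criterion behind each decision. The sketch below collects some conventional rule-of-thumb cutoffs and two of the requested statistics in plain Python. It is an illustration only: the function names are hypothetical, the cutoffs are common conventions rather than the textbook's required criteria, and SPSS remains the tool the assignment expects. The centered-leverage cutoff assumes SPSS reports leverage as h minus 1/n.

```python
import math


def diagnostic_cutoffs(n, k):
    """Rule-of-thumb cutoffs for n cases and k predictors (conventions, not requirements)."""
    p = k + 1  # number of regression parameters, including the intercept
    return {
        "leverage": 2 * p / n,                   # flag h_ii above 2p/n
        "centered_leverage": 2 * p / n - 1 / n,  # same rule after subtracting 1/n
        "dffits": 2 * math.sqrt(p / n),          # flag |DFFITS| above this (large samples)
        "dfbetas": 2 / math.sqrt(n),             # flag |DFBETAS| above this (large samples)
        "cooks_d": 4 / n,                        # one common screening value
    }


def vif(r_squared_j):
    """VIF for predictor j, given R^2 from regressing it on the other predictors."""
    return 1.0 / (1.0 - r_squared_j)


def durbin_watson(residuals):
    """Durbin-Watson statistic: squared successive differences over squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Full sample from the problem statement: 113 hospitals, 3 predictors.
cutoffs = diagnostic_cutoffs(n=113, k=3)
for name, value in cutoffs.items():
    print(f"{name}: {value:.4f}")
```

If cases are deleted in question 2, `n` changes and the cutoffs should be recomputed for the reduced sample. A VIF of 2 from `vif(0.5)`, for example, means the sampling variance of that coefficient is twice what it would be if the predictor were uncorrelated with the others.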

Problem II: (For Chapter 11 in the Textbook)

The dataset "HW5MI.sav" contains item responses for 5 Likert-type items from a random sample of 125 participants (missing data code = 999). Screen the data for missingness and answer the following questions. Please include relevant analysis outputs from the statistical software you used.

1. Report the total percentage of data missing. What would the sample size be if listwise deletion were used to handle missing data?

2. Briefly describe the patterns of missing data (e.g., the number of cases missing 0, 1, 2, ..., or all items).

3. Conduct Little's MCAR test. Are the data missing completely at random? If not, what observed variable(s) are related to missingness?
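The first two questions of Problem II reduce to simple counting once the missing-data code is recognized. The sketch below shows that counting in plain Python on a small made-up dataset; the function name `summarize_missing` and the toy rows are hypothetical and are not taken from "HW5MI.sav". Little's MCAR test is not implemented here and would still come from the statistical software.

```python
from collections import Counter

MISSING = 999  # missing-data code stated in the problem


def summarize_missing(rows, missing_code=MISSING):
    """Summarize missingness for a list of cases, each a list of item responses."""
    n_cases = len(rows)
    n_items = len(rows[0])
    # Number of missing items for each case.
    per_case = [sum(1 for v in row if v == missing_code) for row in rows]
    total_missing = sum(per_case)
    return {
        # Share of all case-by-item cells that are missing.
        "percent_missing": 100.0 * total_missing / (n_cases * n_items),
        # Cases with no missing items survive listwise deletion.
        "listwise_n": sum(1 for m in per_case if m == 0),
        # How many cases are missing 0, 1, 2, ... items.
        "pattern_counts": dict(sorted(Counter(per_case).items())),
    }


# Toy data (4 cases, 5 items), invented for illustration only.
toy = [
    [1, 2, 3, 4, 5],
    [2, 999, 3, 4, 1],
    [999, 999, 2, 3, 4],
    [5, 4, 3, 2, 1],
]
summary = summarize_missing(toy)
print(summary)
```

Counting the number of missing items per case is a coarse description of the patterns. A fuller answer would also note which specific items are missing together.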