Exercise: Diagnostics
Problem I: (For Chapter 4 & Chapter 10 in the Textbook)
The dataset "HW5.sav" contains a random sample of 113 hospitals (data from Kutner et al., 2005). Use the data on STAY, AGE, INFRISK, and XRAY to answer the following questions. Please include relevant analysis outputs from the statistical software (SPSS) you used and show your work process when applicable.
1. Fit the regression model predicting STAY from AGE, INFRISK, and XRAY. Save the following statistics: unstandardized predicted values, unstandardized residuals, studentized deleted residuals, leverage values, Mahalanobis distance, Cook's distance, standardized DfFit, and standardized DfBetas.
a) Check the normality of the unstandardized residuals using both a plot (histogram or Q-Q plot) and a test. Do there appear to be any outliers? If yes, write down the case number(s).
b) Examine the studentized deleted residuals (e.g., index plot). Are there any cases that you would identify as outliers? State the criterion you use to make the decision. Write down the case number(s). Are they consistent with the results from a)?
c) Examine the centered leverage values (e.g., index plot). Are there any cases that you would identify as outliers? State the criterion you use to make the decision. Write down the case number(s). Are they consistent with the results from a) or b)? Explain.
d) Based on the standardized DfFit statistic, which case(s) would you identify as outliers? State the criterion you use to make the decision. Write down the case number and the standardized DfFit value for each outlier. How do you interpret a standardized DfFit value?
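Parts b) through d) each ask for a stated criterion. As a minimal sketch, the snippet below computes several widely cited rule-of-thumb cutoffs for n = 113 cases and k = 3 predictors. The function name and dictionary keys are hypothetical, the residual cutoff of 3 is only one common choice, and the centered-leverage line assumes centered leverage equals the ordinary leverage minus 1/n. The textbook may prescribe different cutoffs, so treat these values as a starting point rather than the required answer.

```python
import math


def rule_of_thumb_cutoffs(n, k):
    """Common screening cutoffs for n cases and k predictors (rules of thumb only)."""
    p = k + 1  # number of regression parameters, including the intercept
    hat = 2 * p / n  # cutoff for ordinary (uncentered) leverage h_ii
    return {
        # |t_i| > 3 is one convention; some texts use 2 or a Bonferroni-adjusted t value
        "studentized_deleted_residual": 3.0,
        "leverage": hat,
        # assumption: centered leverage = h_ii - 1/n
        "centered_leverage": hat - 1 / n,
        # large-sample rule; |DFFITS| > 1 is often used for small samples
        "dffits": 2 * math.sqrt(p / n),
        "dfbetas": 2 / math.sqrt(n),
        # some texts instead compare Cook's D with 1 or an F percentile
        "cooks_distance": 4 / n,
    }


cutoffs = rule_of_thumb_cutoffs(n=113, k=3)
for name, value in cutoffs.items():
    print(f"{name}: {value:.4f}")
```

Whichever cutoffs are adopted, the same values should be applied consistently when comparing the cases flagged in a) through d).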
2. Delete the outliers you identified in 1a) and fit the regression model again (STAY = AGE + INFRISK + XRAY). Request the Durbin-Watson statistic and collinearity statistics, and save the following statistics: standardized predicted values and standardized residuals. Obtain the Q-Q plot of the standardized residuals, and plot the standardized residuals against the standardized predicted values. Fit a straight line and a lowess curve to the scattered points.
a) Comment on how well each of the following assumptions is satisfied, based on the information obtained. Be sure to specify the criterion you use to make the decision.
i. Normality
ii. Linearity
iii. Homoscedasticity
iv. Independence
b) Report the VIF value for each independent variable. Interpret the VIF value for XRAY.
c) Determine whether multicollinearity is likely to be a problem. Consider all relevant statistics and state the criteria you use to make the decision. If it is a problem, recommend a solution.
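For parts b) and c), it can help to recall that the VIF for predictor j equals 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors, and that these values are the diagonal entries of the inverse of the predictors' correlation matrix. The sketch below illustrates that identity in plain Python. The function names are hypothetical, and the correlation matrix is made up for illustration; it is not taken from HW5.sav.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_from_correlations(corr):
    """Return one VIF per predictor: the diagonal of the inverse correlation matrix."""
    inverse = invert(corr)
    return [inverse[i][i] for i in range(len(corr))]


# Made-up correlations among three predictors (illustration only, not HW5.sav).
example_corr = [
    [1.0, 0.1, 0.2],
    [0.1, 1.0, 0.5],
    [0.2, 0.5, 1.0],
]
for label, vif in zip(["X1", "X2", "X3"], vif_from_correlations(example_corr)):
    print(f"{label}: VIF = {vif:.3f}, tolerance = {1 / vif:.3f}")
```

The square root of a VIF gives the factor by which that coefficient's standard error is inflated relative to the uncorrelated case. Commonly quoted warning levels, such as a VIF above 10 or a tolerance below 0.10, are conventions rather than strict tests.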
Problem II: (For Chapter 11 in the Textbook)
The dataset "HW5MI.sav" contains item responses for 5 Likert-type items from a random sample of 125 participants (missing data code = 999). Screen the data for missingness and answer the following questions. Please include relevant analysis outputs from the statistical software you used.
1. Report the total percentage of data missing. What would the sample size be if listwise deletion were used to handle missing data?
2. Briefly describe the patterns of missing data (e.g., number missing 0, 1, 2, ... , all items).
3. Conduct Little's MCAR test. Are the data missing completely at random? If not, which observed variable(s) are related to missingness?
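The descriptive quantities in questions 1 and 2 can be cross-checked outside SPSS with a few lines of code. The sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, the four toy cases are made up (they are not from HW5MI.sav), and 999 is treated as the missing-data code as described above. It does not implement Little's MCAR test, which still needs to be run in the statistical software.

```python
from collections import Counter

MISSING_CODE = 999  # missing-data code stated in the problem


def summarize_missing(rows, missing_code=MISSING_CODE):
    """Summarize missingness for a list of cases, each a list of item responses."""
    n_cells = sum(len(row) for row in rows)
    # Number of missing items for each case.
    per_case = [sum(1 for value in row if value == missing_code) for row in rows]
    return {
        "pct_missing": 100.0 * sum(per_case) / n_cells,
        # Cases that would remain under listwise deletion (no missing items).
        "listwise_n": sum(1 for m in per_case if m == 0),
        # How many cases are missing 0, 1, 2, ... items.
        "cases_by_number_missing": dict(sorted(Counter(per_case).items())),
    }


# Made-up responses for four cases on five items (illustration only).
toy_rows = [
    [1, 2, 3, 4, 5],
    [999, 2, 3, 4, 5],
    [1, 999, 999, 4, 5],
    [5, 4, 3, 2, 1],
]
summary = summarize_missing(toy_rows)
print(summary)
```

For the toy data, 3 of the 20 cells are coded 999, so the summary reports 15% missing and 2 complete cases.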