In this assignment, I hope that you will show that you can:
1. compute correlations and produce scatterplots and regression tables using R
2. reach well-reasoned conclusions arising from correlations and scatterplots
3. demonstrate an understanding of the linear regression equation and its components
4. interpret correctly the standard error of prediction or estimate
Professor Schmedlap is pilot testing three new and improved versions of his already popular and lucrative Schmedlap Continuous Achievement Measure (SCAM). He has administered the old SCAM and the three new SCAM prototypes to a sample of children. A partial listing of the three data sets is as follows:
   SET 1          SET 2          SET 3
    X      Y       X      Y       X      Y
 10.0   8.04    10.0   9.14    10.0   7.46
  8.0   6.95     8.0   8.14     8.0   6.77
 13.0   7.58    13.0   8.74    13.0  12.74
  9.0   8.81     9.0   8.77     9.0   7.11
 11.0   8.33    11.0   9.26    11.0   7.81
 14.0   9.96    14.0   8.10    14.0   8.84
  6.0   7.24     6.0   6.13     6.0   6.08
  4.0   4.26     4.0   3.10     4.0   5.39
 12.0  10.80    12.0   9.13    12.0   8.15
  7.0   4.80     7.0   7.26     7.0   6.42
  5.0   5.60     5.0   4.74     5.0   5.73
The X value in each data set is the score students received on the old SCAM. The Y value is the score on the new version of the SCAM examined in that data set.
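As a starting point, the listing above could be entered into R along the following lines. This is only a sketch: the object name `scam` and the column names `x`, `y1`, `y2`, and `y3` are illustrative assumptions, not names the assignment requires, and a single `x` column is used because the X values are identical across the three sets.

```r
# Sketch only: `scam` and its column names are illustrative assumptions.
# x  = old SCAM score (the same values appear in all three data sets)
# y1, y2, y3 = new SCAM score in data sets 1, 2, and 3
scam <- data.frame(
  x  = c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5),
  y1 = c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.80, 4.80, 5.60),
  y2 = c(9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74),
  y3 = c(7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73)
)

# Check that all 11 rows and 4 columns were entered
str(scam)
```

With the data in this form, base R functions such as `cor()`, `lm()`, `summary()`, and `plot()` are the usual tools for the questions that follow.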
1. Professor Schmedlap begins his data analysis by calculating the correlation between the old version and each new version of the SCAM. He reasons that a fairly strong relationship should be present if both versions are measuring the same underlying construct.
a) Calculate the correlation between the new and old versions of the SCAM in each of the data sets. (10 marks)
b) How much variance is shared between the new and old SCAM in each data set? (5 marks)
c) Based solely on the correlational evidence, which new version would you say is best? Why? (10 marks)
2. Professor Schmedlap knows that the new versions of his test will yield slightly different scores because the scale has changed. He has a novel idea. He proceeds to calculate the intercept and regression coefficient for each data set. He reasons that the resultant linear regression equations can be used by practitioners to convert old SCAM scores to new SCAM scores. In this way, users of the new SCAM will be able to make comparisons for students who have scores on the old SCAM. He will, of course, sell these equations as an option to test users.
a) What is the linear equation for each data set? (10 marks)
b) What is the standard error of estimate for each equation? (10 marks)
c) Generally speaking, what factors affect the size of the standard error of estimate? (10 marks)
d) Using the linear equation generated for the first data set, what new SCAM scores would be predicted from old SCAM scores of 6, 7, and 10? (10 marks)
e) What is the predicted range for each of these scores at the 68% confidence level? (10 marks)
3. For each data set, construct a scatterplot. What do the plots tell you about the relationship between the new and old SCAM in each data set? Now, which version of the new SCAM would you say is best, and why? (25 marks)
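For reference, one common notation for the quantities in Question 2 is sketched below. The symbols a, b, and s_{Y.X} are assumptions chosen for illustration, and your text may define the standard error of estimate with a different denominator. The n - 2 form shown here corresponds to the "Residual standard error" that R reports for a simple regression.

```latex
% Linear regression equation: predicted new SCAM score from old SCAM score X
\hat{Y} = a + bX

% Standard error of estimate (n - 2 degrees of freedom)
s_{Y \cdot X} = \sqrt{\frac{\sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2}{n - 2}}

% Approximate 68% range around a predicted score
\hat{Y} \pm 1 \cdot s_{Y \cdot X}
```

The 68% range treats the errors of prediction as roughly normally distributed around the regression line, so about two thirds of actual scores are expected to fall within one standard error of the predicted score.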