your solutions in PDF format to the
dropbox on the Canvas page by 5:00 PM,
Wednesday, January 22. You may use any program to generate your PDF file.
(RStudio is recommended but not required.)
Rubric: For each question you will be given 1 point for complete credit, ½
point for partial credit, and 0 points for no credit. Assignment of credit will
be based on the correctness of your answers as well as your reasoning (when requested
as part of the question). R code and/or computer output including tables and
graphs are not required and will be evaluated only when requested as part of a
question. You may work together to help each other solve problems, but you should create your
own solutions and hand in your own work without copying others’ work.
Questions 1 – 7 are based on the ‘iq.csv’ data set (see Exercise 1).
1. Test the null hypothesis
that the mean IQ score in the community is equal to 100 using the 2-sided
1-sample t-test with a significance level of 0.05. State the value of the test
statistic and whether or not you reject the null hypothesis at significance
level 0.05.
2. Give the p-value for the
test in Q1. State the interpretation of the p-value.
3. Compute a 95% confidence
interval for the mean IQ. Do the
confidence interval and hypothesis test give results that agree or conflict
with each other? Explain.
4. Repeat the hypothesis test using a significance level of 0.01, and
compute a 99% confidence interval for the mean IQ.
5. Perform a simulation study
to assess the type I error probability of the test conducted in Q1. For the simulation,
generate samples of IQ scores with sample size 124 from the normal distribution
with mean 100 and SD 15. Report the observed type I error rate from your
simulation and comment on how well it agrees with theory.
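One possible way to organize the Q5 simulation is sketched below in Python, using only the standard library (any program is permitted, and the same steps carry over to R). The function names, the seed, the number of replications, and the hard-coded critical value 1.9794 (two-sided 5% point of the t distribution with 123 df, taken from tables) are assumptions made for illustration and are not part of the assignment.

```python
import math
import random

# Assumption: two-sided 5% critical value of Student's t with 123 df,
# taken from t tables (about 1.9794); it is not stated in the assignment.
T_CRIT_DF123 = 1.9794


def one_sample_t(sample, mu0):
    """Return the one-sample t statistic for H0: population mean == mu0."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
    return (mean - mu0) / math.sqrt(var / n)


def estimate_type1_error(n_sims=5000, n=124, mu0=100.0, sd=15.0, seed=2024):
    """Simulate under H0 and return the fraction of samples that reject H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Draw one sample from the null distribution: normal(mu0, sd).
        sample = [rng.gauss(mu0, sd) for _ in range(n)]
        if abs(one_sample_t(sample, mu0)) > T_CRIT_DF123:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print(f"Estimated type I error probability: {estimate_type1_error():.4f}")
```

The estimate varies from seed to seed; increasing `n_sims` narrows that simulation error, which matters when comparing the result with the nominal level of 0.05.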
6. Perform a simulation study
to estimate the power of the test to detect an alternative value of 95 for the
mean IQ. Generate samples of size 124 from a normal distribution
with SD equal to 15.
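The power simulation in Q6 differs from a type I error simulation only in where the data come from: samples are drawn with the alternative mean, while the test still targets the null value of 100. A minimal Python sketch follows; the function names, seed, replication count, and the tabulated critical value 1.9794 for 123 df are illustrative assumptions, not values supplied by the assignment.

```python
import math
import random

# Assumption: two-sided 5% critical value of Student's t with 123 df (t tables).
T_CRIT_DF123 = 1.9794


def rejects_null(sample, mu0, t_crit):
    """Return True if the two-sided one-sample t-test rejects H0: mean == mu0."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    t_stat = (mean - mu0) / math.sqrt(var / n)
    return abs(t_stat) > t_crit


def estimate_power(true_mean=95.0, n_sims=5000, n=124, mu0=100.0, sd=15.0,
                   seed=2024):
    """Fraction of samples drawn with mean true_mean that reject H0: mean == mu0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Data are generated under the alternative, not under H0.
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        if rejects_null(sample, mu0, T_CRIT_DF123):
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    print(f"Estimated power at mean 95: {estimate_power():.4f}")
```

Setting `true_mean=100.0` turns the same function into a type I error estimate, which is a quick consistency check on the code.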
7. Perform a simulation study
to estimate the coverage probability of the 95% confidence interval for mean IQ
based on a sample size of 124. For the simulation, generate samples of IQ
scores with sample size 124 using the normal distribution with mean 100 and SD
15.
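For Q7, coverage can be estimated by building the interval for each simulated sample and counting how often it contains the true mean. The sketch below is one way to do this in Python; the function names, seed, replication count, and the tabulated critical value 1.9794 for 123 df are assumptions for illustration. The SD is left as a required argument, to be set to the value specified in the question.

```python
import math
import random

# Assumption: two-sided 5% critical value of Student's t with 123 df (t tables).
T_CRIT_DF123 = 1.9794


def confidence_interval(sample, t_crit):
    """Return (lower, upper) limits of the t-based interval for the mean."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    half_width = t_crit * math.sqrt(var / n)
    return mean - half_width, mean + half_width


def estimate_coverage(sd, n_sims=5000, n=124, true_mean=100.0, seed=2024):
    """Fraction of simulated 95% intervals that contain the true mean."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        lower, upper = confidence_interval(sample, T_CRIT_DF123)
        if lower <= true_mean <= upper:
            covered += 1
    return covered / n_sims
```

A call such as `estimate_coverage(sd=...)` returns a proportion that can be compared with the nominal 95% level.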
Description for Q 8 – 17:
A researcher is interested in
measurements of a pollutant in water samples. In particular, there is a
question about whether the value changes if the sample is tested when it is
older compared with being tested right after it is collected. The researcher
does not know whether aging could increase or decrease the pollutant
concentration. A set of 15 samples of
water was taken from a lake. Each sample was divided into 2 aliquots, one to
be analysed right away and the other to be analysed 1 month later. The
difference between pollutant concentrations was recorded for each of the
samples. The values obtained for the differences (fresh sample - aged sample),
arranged from smallest to largest, were as follows: -5, -2, -1, -1, 0, 0, 2, 3,
4, 4, 5, 5, 6, 6, 11.
8. State the null hypothesis
and alternative hypothesis in words.
9. Perform a test of the null
hypothesis with type I error probability 0.05. State whether or not you would
reject the null hypothesis and provide the p-value for the test.
10. Calculate a 95%
confidence interval for the mean difference in concentration between fresh and
aged samples. Compare with the results of the hypothesis test. Do the
confidence interval and hypothesis test give the same conclusions?
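Because the data are paired differences, the test in Q9 and the interval in Q10 can both be computed from the mean and standard error of the 15 differences. The following Python sketch shows one way to do so with the standard library only. It assumes a two-sided one-sample t-test on the differences is the intended procedure; the function names and the tabulated critical value 2.1448 (14 df) are illustrative assumptions, and the p-value is obtained by numerically integrating the t density rather than by a library call.

```python
import math

# Differences (fresh sample - aged sample) as listed in the assignment.
DIFFS = [-5, -2, -1, -1, 0, 0, 2, 3, 4, 4, 5, 5, 6, 6, 11]

# Assumption: two-sided 5% critical value of Student's t with 14 df (t tables).
T_CRIT_DF14 = 2.1448


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the t density over [0, |t|]."""
    b = abs(t_stat)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # P(-|t| < T < |t|) by symmetry
    return 1 - central_area


def paired_t_summary(diffs, t_crit):
    """Mean difference, standard error, t statistic, p-value, and interval.

    t_crit must match the confidence level and df = len(diffs) - 1.
    """
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    se = math.sqrt(var / n)
    t_stat = mean / se  # H0: mean difference == 0
    return {
        "mean": mean,
        "se": se,
        "t": t_stat,
        "p": two_sided_p_value(t_stat, n - 1),
        "ci": (mean - t_crit * se, mean + t_crit * se),
    }


if __name__ == "__main__":
    print(paired_t_summary(DIFFS, T_CRIT_DF14))
```

The same function can be reused on a subset of the data, provided the critical value is replaced by the one for the reduced degrees of freedom.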
11. Suppose that it was
determined that the last data value (11) was an error due to failure of the
measuring equipment. Re-run the test and confidence interval with this value
excluded. How did the results change?
For Q12 and Q13, conduct a
simulation study to assess the performance of the hypothesis testing procedure
from Q9 and confidence interval from Q10. Assume that the distribution of the
difference in pollutant measurements is normal with mean 0 and SD 4.
12. Estimate the type I error
probability of the test.
13. Estimate the coverage
probability of the confidence interval.
For Q14 and Q15, you will
assess the performance of the hypothesis testing procedure from Q9 and
confidence interval from Q10 under a different assumption about the
distribution for the difference in pollutant measurements. In this case, assume
the distribution is a t-distribution with 3 degrees of freedom. Note: the
t-distribution is useful for modeling distributions that may have heavier tails
than the normal.
14. Estimate the type I error
probability of the test under sampling from the t distribution with 3 df.
15. Estimate the coverage
probability of the confidence interval under sampling from the t distribution
with 3 df.
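For Q14 and Q15, the only change from the normal-based simulation is the sampling distribution. One way to draw from a t distribution without a statistics library is to divide a standard normal by the square root of an independent chi-square variate over its degrees of freedom. The Python sketch below uses that construction; the function names, seed, replication count, and the tabulated critical value 2.1448 (14 df, for samples of size 15) are assumptions for illustration.

```python
import math
import random

# Assumption: two-sided 5% critical value of Student's t with 14 df (t tables).
# It matches n = 15 only; a different sample size needs a different value.
T_CRIT_DF14 = 2.1448


def draw_t(rng, df):
    """Draw one Student's t variate as Z / sqrt(chi-square_df / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return z / math.sqrt(chi2 / df)


def simulate_heavy_tails(n_sims=5000, n=15, df=3, t_crit=T_CRIT_DF14, seed=2024):
    """Return (type I error estimate, coverage estimate) under t_df sampling.

    The true mean of the t distribution is 0, so H0: mean == 0 holds.
    """
    rng = random.Random(seed)
    rejections = 0
    covered = 0
    for _ in range(n_sims):
        sample = [draw_t(rng, df) for _ in range(n)]
        mean = sum(sample) / n
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        se = math.sqrt(var / n)
        if abs(mean / se) > t_crit:
            rejections += 1
        if mean - t_crit * se <= 0.0 <= mean + t_crit * se:
            covered += 1
    return rejections / n_sims, covered / n_sims


if __name__ == "__main__":
    error_rate, coverage = simulate_heavy_tails()
    print(f"type I error: {error_rate:.4f}, coverage: {coverage:.4f}")
```

Because the interval and the test use the same critical value, the two estimates sum to 1 up to floating-point ties, so either one determines the other within a single run.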
16. Compare results from the
two simulation studies. Explain how they differ.
17. Repeat the simulation
study for Q14 and Q15 with a sample size of 200. Explain how the results