
### statistics

##### Description

Instructions

Submit your solutions in PDF format to the dropbox on the Canvas page by 5:00 PM, Wednesday, January 22. You may use any program to generate your PDF file. (RStudio is recommended but not required.)

Grading rubric: For each question you will be given 1 point for complete credit, ½ point for partial credit, and 0 points for no credit. Assignment of credit will be based on the correctness of your answers as well as your reasoning (when requested as part of the question). R code and/or computer output including tables and graphs are not required and will be evaluated only when requested as part of a question.

You may work together to help each other solve problems, but you should create your own solutions and hand in your own work without copying others’ work.

Questions 1 – 7 are based on the ‘iq.csv’ data set (see Exercise 1).

1. Test the null hypothesis that the mean IQ score in the community is equal to 100 using the 2-sided 1-sample t-test with a significance level of 0.05. State the value of the test statistic and whether or not you reject the null hypothesis.

2. Give the p-value for the test in Q1. State the interpretation of the p-value.

3. Compute a 95% confidence interval for the mean IQ. Do the confidence interval and hypothesis test give results that agree or conflict with each other? Explain.

4. Repeat the hypothesis test from Q1 using a significance level of 0.01, and repeat the confidence interval from Q3 using a 99% confidence level.

5. Perform a simulation study to assess the type I error probability of the test conducted in Q1. For the simulation, generate samples of IQ scores with sample size 124 from the normal distribution with mean 100 and SD 15. Report the observed type I error rate from your simulation and comment on how well it agrees with the nominal level of 0.05.

6. Perform a simulation study to estimate the power of the test from Q1 when the true mean IQ is 95. Generate samples of size 124 from a normal distribution with mean 95 and SD 15.

7. Perform a simulation study to estimate the coverage probability of the 95% confidence interval for mean IQ based on a sample size of 124. For the simulation, generate samples of IQ scores with sample size 124 using the normal distribution with mean 100 and SD 15.
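The simulation studies in Q5 – Q7 share one loop: draw a sample, compute the t statistic and the interval, and tally the outcome. The following is a minimal sketch in Python using only the standard library, not a required approach. The helper names `t_cdf`, `t_quantile`, and `simulate`, the replication count, and the seed are illustrative assumptions, and the t quantile is found numerically here rather than taken from a statistics package.

```python
import math
import random


def t_cdf(x, df, steps=2000):
    """Student-t CDF via Simpson's rule on the density (illustrative helper)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # integral of the density from 0 to |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Invert t_cdf by bisection; intended for 0.5 < p < 1."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def simulate(true_mean, null_mean=100.0, sd=15.0, n=124,
             reps=2000, alpha=0.05, seed=1):
    """Return (rejection rate of H0: mean = null_mean, CI coverage of true_mean)."""
    rng = random.Random(seed)
    crit = t_quantile(1 - alpha / 2, n - 1)
    rejections = 0
    covered = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = sum(sample) / n
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        se = math.sqrt(var / n)
        # Two-sided one-sample t-test against the null mean
        if abs(mean - null_mean) / se > crit:
            rejections += 1
        # Does the (1 - alpha) interval contain the mean used to generate data?
        if abs(mean - true_mean) <= crit * se:
            covered += 1
    return rejections / reps, covered / reps


if __name__ == "__main__":
    type1, coverage = simulate(true_mean=100.0)  # data generated under H0
    power, _ = simulate(true_mean=95.0)          # data generated under the alternative
    print(f"type I error ~ {type1:.3f}, coverage ~ {coverage:.3f}, power ~ {power:.3f}")
```

When the data are generated under the null, the rejection rate estimates the type I error probability; when they are generated with mean 95, the same rate estimates power. Each estimate carries Monte Carlo error of roughly sqrt(p(1 - p) / reps), so increasing `reps` tightens the comparison with theory.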

Description for Questions 8 – 17:

A researcher is interested in measurements of a pollutant in water samples. In particular, there is a question about whether the measured concentration changes if the sample is tested when it is older compared with being tested right after it is collected. The researcher does not know whether aging could increase or decrease the pollutant concentration. Fifteen water samples were taken from a lake. Each sample was divided into 2 aliquots, one to be analysed right away and the other to be analysed 1 month later. The difference between pollutant concentrations was recorded for each of the samples. The values obtained for the differences (fresh sample - aged sample), arranged from smallest to largest, were as follows: -5, -2, -1, -1, 0, 0, 2, 3, 4, 4, 5, 5, 6, 6, 11.

8. State the null hypothesis and alternative hypothesis in words.

9. Perform a test of the null hypothesis with type I error probability 0.05. State whether or not you would reject the null hypothesis and provide the p-value for the test.

10. Calculate a 95% confidence interval for the mean difference in concentration between fresh and aged samples. Compare with the results of the hypothesis test. Do the confidence interval and hypothesis test give the same conclusions?

11. Suppose that it was determined that the last data value (11) was an error due to failure of the measuring equipment. Re-run the test and confidence interval with this value excluded. How did the results change?
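Because each water sample contributes one difference, the paired design reduces to a one-sample problem on the differences. As a reference sketch, assuming the test chosen for Q9 is the two-sided one-sample t-test on the differences (the notation $d_i$, $\bar{d}$, $s_d$ is introduced here for illustration), the test statistic and the $100(1-\alpha)\%$ confidence interval are:

$$
t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}, \qquad
\bar{d} \pm t_{1-\alpha/2,\; n-1} \, \frac{s_d}{\sqrt{n}}
$$

Here $\bar{d}$ and $s_d$ are the sample mean and standard deviation of the $n$ differences, and $t_{1-\alpha/2,\; n-1}$ is the corresponding quantile of the t-distribution with $n-1$ degrees of freedom. With all 15 differences the reference distribution has 14 degrees of freedom; with one value excluded it has 13.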

For Q12 and Q13, conduct a simulation study to assess the performance of the hypothesis testing procedure from Q9 and confidence interval from Q10. Assume that the distribution of the difference in pollutant measurements is normal with mean 0 and SD 4.

12. Estimate the type I error probability of the test.

13. Estimate the coverage probability of the confidence interval.

For Q14 and Q15, you will assess the performance of the hypothesis testing procedure from Q9 and confidence interval from Q10 under a different assumption about the distribution for the difference in pollutant measurements. In this case, assume the distribution is a t-distribution with 3 degrees of freedom. Note: the t-distribution is useful for modeling distributions that may have heavier tails than the normal.
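To illustrate what "heavier tails" means here, the sketch below draws from a t-distribution with 3 degrees of freedom using only Python's standard library and compares how often draws fall more than 3 standard deviations from zero with the same frequency for the standard normal. The function names `draw_t` and `tail_fraction`, the seed, and the number of draws are illustrative assumptions. In R, `rt(n, df = 3)` generates such samples directly.

```python
import math
import random


def draw_t(rng, df):
    """Draw one Student-t variate as Z / sqrt(V / df), with V ~ chi-square(df)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(df / 2, 2.0)  # chi-square(df) is Gamma(shape=df/2, scale=2)
    return z / math.sqrt(v / df)


def tail_fraction(draws, sd, k=3.0):
    """Fraction of draws lying more than k standard deviations from zero."""
    return sum(abs(x) > k * sd for x in draws) / len(draws)


rng = random.Random(2024)
reps = 50_000
df = 3

t_draws = [draw_t(rng, df) for _ in range(reps)]
z_draws = [rng.gauss(0.0, 1.0) for _ in range(reps)]

t_sd = math.sqrt(df / (df - 2))  # theoretical SD of the t-distribution (df > 2)
frac_t = tail_fraction(t_draws, t_sd)
frac_z = tail_fraction(z_draws, 1.0)

print(f"beyond 3 SD: t(3) {frac_t:.4f} vs normal {frac_z:.4f}")
```

The same `draw_t` function can replace the normal generator in a simulation loop when assessing a test or interval under this distribution.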

14. Estimate the type I error probability of the test under sampling from the t distribution with 3 df.

15. Estimate the coverage probability of the confidence interval under sampling from the t distribution with 3 df.

16. Compare results from the two simulation studies. Explain how they differ.

17. Repeat the simulation study for Q14 and Q15 with a sample size of 200. Explain how the results differ.