# Lab 6 – Sampling: Decision-Making Thresholds

A key focus of Chapter 10 is how to make inferences about populations based on samples. The essential logic lies in comparing a single instance of a statistic, such as a sample mean, to a distribution of such values. The comparison can lead to one of two conclusions – the sample statistic is either extreme or not extreme. But what are the thresholds for making this kind of judgment call (i.e., whether a value is extreme or not)? This activity explores that question.

The problem is this: You receive a sample containing the ages of 30 students. You are wondering whether this sample is a group of undergraduates (mean age = 20 years) or graduates (mean age = 25 years). To answer this question, you must compare the mean of the sample you receive to a distribution of means from the population. The following fragment of R code begins the solution:

set.seed(2) #set the random seed so that the random draws below come out the same on every run

sampleSize <- 30

#create a student population: 20000 values drawn from a normal distribution with mean 20 and standard deviation 3

studentPop <- rnorm(20000,mean=20,sd=3)

#investigate studentPop now. How many values does it contain? What do the values look like? Are they close to the mean value of 20?

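#create a sample of undergraduates by drawing 30 values from studentPop (assumed here to be sampling with replacement)

undergrads <- sample(studentPop,size=sampleSize,replace=TRUE)

#create a sample of graduate students. Sample size is 30, mean is 25, standard deviation is 3. Note that this mean is 5 years higher than the mean of the undergraduate population.

grads <- rnorm(sampleSize,mean=25,sd=3)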

if (runif(1)>0.5) { testSample <- grads } else { testSample <- undergrads }

mean(testSample)

After you run this code, the variable “testSample” will contain either a sample of undergrads or a sample of grads. The line before last “flips a coin” by generating one value from a uniform distribution (by default the distribution covers 0 to 1) and comparing it to 0.5. The question you must answer with additional code is: Which is it, grad or undergrad?
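As a rough illustration of the coin flip described above, the sketch below draws many values from the uniform distribution and checks how often they exceed 0.5. The seed value and the names flips and headsShare are made up for this illustration and are not part of the assignment code.

```r
set.seed(1)                       # hypothetical seed, only so the sketch is repeatable
flips <- runif(10000)             # 10,000 draws from the uniform distribution on 0 to 1
headsShare <- mean(flips > 0.5)   # proportion of draws that land above 0.5
headsShare                        # should be close to 0.5, like a fair coin
```

Because about half of the draws land above 0.5, a single call to runif(1) > 0.5 behaves like one flip of a fair coin.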

1. Annotate the code above with line-by-line commentary. To get full credit on this assignment, you must demonstrate a clear understanding of what each line of code actually does! You will have to look up the meaning of some commands.

2. The next line of code should generate a list of sample means from the population called “studentPop.” Very similar code to accomplish this appears in Chapter 7. How many sample means should you generate? You can create any number that you want (hundreds, thousands, whatever), but I suggest that you generate just 100 means for ease of inspection. That is a pretty small number, but it makes it easy to think about percentiles and ranks.

3. Once you have generated your list of sample means from studentPop, compare mean(testSample) to that list of sample means and see where it falls.

4. Now use an if/else statement to determine whether mean(testSample) is less than the 2.5% quantile or greater than the 97.5% quantile of your list of sample means. If mean(testSample) falls in either of those tails, it can be defined as extreme. Otherwise it is not extreme. Your code should end with a print() statement that says either “Sample mean is extreme” or “Sample mean is not extreme.”

   - Hint: your code may look like the line below. Figure out what should replace XXX, XXXX, and XXXXX. Note that the same placeholder may stand for different code in different positions.

   - if (mean(XXX) < quantile(XXX, probs=0.025) | mean(XXX) > quantile(XXX, probs=XXXX)) {XXXXX} else {XXXXX}

   - Is the sample mean in the middle of the pack? Far out toward one end? Here is one hint that will help you: In Chapter 7, the quantile() command is used to generate percentiles based on thresholds of 2.5% and 97.5%. Those are the thresholds we want, and the quantile() command will help you create them.

5. Please submit both the output of your runs and the R code.
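
The general pattern behind steps 2 through 4 can be sketched on a separate, made-up example. Everything in the block below is an assumption chosen for illustration: the heights data, the seed, and the names demoPop, demoSample, demoMeans, lower, upper, and verdict. It is not the solution for studentPop, only one way such a comparison might be laid out.

```r
set.seed(42)                                   # hypothetical seed for repeatability
demoPop <- rnorm(20000, mean = 170, sd = 10)   # made-up population of heights in cm
demoSample <- rnorm(30, mean = 180, sd = 10)   # made-up sample from a different group

# Build a sampling distribution: 100 means of samples of size 30 drawn from demoPop
demoMeans <- replicate(100, mean(sample(demoPop, size = 30, replace = TRUE)))

# Thresholds that cut off the lowest 2.5% and the highest 2.5% of the sample means
lower <- quantile(demoMeans, probs = 0.025)
upper <- quantile(demoMeans, probs = 0.975)

# A sample mean outside the two thresholds counts as extreme
if (mean(demoSample) < lower || mean(demoSample) > upper) {
  verdict <- "Sample mean is extreme"
} else {
  verdict <- "Sample mean is not extreme"
}
print(verdict)
```

With only 100 sample means, the thresholds move noticeably from run to run if the seed changes. Generating more means would make them more stable, at the cost of a list that is harder to inspect by eye.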