## Type your Name and STAT 250 with your correct section number (e.g. STAT 250-xxx) right justified and then Data Analysis Assignment #1 centered on the top of page 1 below your name, then begin your document.

### statistics

##### Description

1. Type your Name and STAT 250 with your correct section number (e.g. STAT 250-xxx) right justified and then Data Analysis Assignment #1 centered on the top of page 1 below your name, then begin your document.
3. Your document should include the ANSWERS ONLY with each answer labeled by its corresponding number and subpart. Keep the answers in order. Do not include the questions in your submitted document.
4. Generate all requested graphs and tables using StatCrunch.
6. You may not work with other individuals on this assignment.  It is an honor code violation if you do.

Elements of good technical writing:

Use complete and coherent sentences to answer the questions.

Graphs must be appropriately titled and should refer to the context of the question.

Graphical displays must include labels with units if appropriate for each axis.

Units should always be included when referring to numerical values.

When making a comparison you must use comparative language, such as “greater than”, “less than”, or “about the same as.”

Ensure that all graphs and tables appear on one page and are not split across two pages.

Type all mathematical calculations when directed to compute an answer ‘by-hand.’

Pictures of actual handwritten work are not accepted on this assignment.

When writing mathematical expressions into your document you may use either an equation editor or common shortcuts such as: $\sqrt{x}$ can be written as sqrt(x), $\hat{p}$ can be written as p-hat, $\bar{x}$ can be written as x-bar.

Problem 1: Ramen Noodle Ratings

a)      Use StatCrunch to create a one-way table for the variable “Country/Region” using both counts and percentages.  Select Stat → Tables → Frequency.  Select “Country/Region” in the Select Column(s) box, and select both “Frequency” and “Percent of total” in the Statistic(s) box by holding down the Ctrl Key (Command Key on Macs) when making these selections.  Copy your table into your document and then manually round the values in the “Percent of total” column to two decimal places in the copied table.

b)      Interpret your findings from the table in 1(a) by identifying the country/region with the largest and smallest percentage.  Use complete sentences with context and include the country/region and percentage in the sentences.

c)      Use StatCrunch to generate a two-way table for the variables “Country/Region” and “Style”.  Go to Stat → Tables → Contingency → With Data (since you have the raw data in StatCrunch).  Select “Country/Region” as your row variable and “Style” as your column variable.  Lastly, since “Chi-Square test for independence” is highlighted by default, deselect it by holding the Ctrl key and clicking on it.  Copy your table into your document.

d)      Find the combinations of “Country/Region” and “Style” with the largest and smallest percent of the total (2095), calculating each percentage.  Provide both the label and the percentage and answer this question in two complete sentences.

e)      What values are the same when looking at both your one-way table and your two-way table?  Be specific if referencing rows or columns.

f)       Now, create two more two-way tables keeping “Country/Region” as your row variable and “Style” as your column variable.  One table needs to include row percentages and the other needs to include column percentages.  To do this, select row percent in the display box for the first table and column percent for the second table.  Include both tables in your document.

g)      Specifically interpret the meaning of the row percentage found in the “Japan” and “Pack” cell.  Note that there are 225 observations in that cell.

h)      Now, specifically interpret the meaning of the column percentage found in the “Japan” and “Pack” cell.  Note that there are 225 observations in that cell.
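The three kinds of percentages used in Problem 1 differ only in the denominator. The sketch below illustrates this with a small made-up table; the region names and counts are hypothetical and are not taken from the ramen data set, and StatCrunch remains the required tool for the tables you submit.

```python
# Hypothetical counts, NOT the ramen data: rows are regions, columns are styles.
counts = {
    "Region A": {"Cup": 10, "Pack": 30},
    "Region B": {"Cup": 20, "Pack": 40},
}


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimal places."""
    return round(100 * part / whole, 2)


grand_total = sum(sum(row.values()) for row in counts.values())  # all observations
row_total = sum(counts["Region A"].values())                     # everything in the row
col_total = sum(row["Pack"] for row in counts.values())          # everything in the column
cell = counts["Region A"]["Pack"]

pct_of_total = percent(cell, grand_total)  # denominator: whole table
row_pct = percent(cell, row_total)         # denominator: the cell's row
col_pct = percent(cell, col_total)         # denominator: the cell's column

print(pct_of_total, row_pct, col_pct)
```

The same cell count produces three different percentages, which is why an interpretation should always state which group the percentage is taken out of.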

Problem 2: Five Star Ramen Ratings

Let us analyze the distribution of countries of origin of the ramen noodles that received five star ratings.  Use the “Five Star Ramen Ratings” data set posted in our StatCrunch group to answer the following questions.

a)      Using the variable named “Country/Region”, produce a relative frequency bar chart using Graph → Bar Plot → With Data.  Please properly label axes and provide a meaningful title and copy it into your document.

b)      Using the variable “Country/Region”, produce a relative frequency Pareto chart.  Begin with your bar chart, and edit it by changing “Order by” to Count Descending.  Properly title and label your graph and copy it into your document.

c)      Using the variable “Country/Region”, produce a Pie Chart using Graph → Pie Chart → With Data.  Add an appropriate title and copy this entire graph including the legend into your document.

d)      Use the three graphs to answer the question:  Which country/region of origin earned the most five star ratings?  Present both the count and the proportion and write your answer in one sentence.

e)      Now produce a grouped relative frequency bar chart (to copy to your document) by following the directions below.  Continue to use the Five Star Ramen Ratings data set.

Go to Graph → Bar Plot → With Data.

For this grouped bar chart, graph the variable “Style” and group by “Country/Region.”  To “group by” click the arrow next to Group by box (the third box down) and select the variable you are asked to group by.  In the Type box (5th box down from the top) choose relative frequency within category.  Title this graph clearly.  You may keep the default labels for the x and y-axis.

f)       For this next grouped bar chart, graph the variable “Country/Region” and group by “Style.”  In the Type box (5th box down from the top) choose relative frequency within category.  Title this graph clearly.  You may keep the default labels for the x and y-axis.

g)      The two graphs you made in 2(e) and 2(f) are another representation of row and column percentage two-way tables.  If we consider the row variable to be “Country/Region” and the column variable to be “Style,” which graph would correspond to a row percentage two-way table and which graph would correspond to the column percentage two-way table?  Answer in two complete sentences. You may create these two-way tables for this data set to help you answer the questions, but the tables do not need to be copied into your solutions document.
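For reference, the quantities behind the relative frequency bar chart in 2(a) and the Pareto ordering in 2(b) can be sketched as follows. The category labels are hypothetical placeholders rather than values from the Five Star Ramen Ratings data set, and the graphs themselves must still be produced in StatCrunch.

```python
from collections import Counter

# Hypothetical category labels, NOT the Five Star Ramen Ratings data.
labels = ["A", "B", "A", "C", "A", "B"]

counts = Counter(labels)
n = len(labels)

# Relative frequency = category count / total number of observations.
rel_freq = {label: count / n for label, count in counts.items()}

# Pareto order: categories sorted from the largest count to the smallest.
pareto_order = sorted(counts, key=counts.get, reverse=True)

print(rel_freq)
print(pareto_order)
```

A Pareto chart is the same bar chart with its bars arranged in this descending order, so the tallest bar always appears first.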


Problem 3: State Population Information

A number of variables concerning the population of the states of the United States are presented in the “State Population Information” data set posted in StatCrunch.  The “Density” variable is measured as people per square mile.

a)      Create a relative frequency histogram for the variable “Density” by using Graph → Histogram.  Properly title and label your graph and copy it into your document.

b)      Interpret the shape of this distribution in one complete sentence.

c)      Use StatCrunch to obtain the sample size (n), mean, and standard deviation for the “Density” variable by using Stat → Summary Stats → Columns.  Note: in the Statistics box, select the summary statistics listed above in the exact order given.  Copy the entire table into your document and manually round each value to two decimal places when necessary.

d)      Use StatCrunch to obtain the five number summary and the IQR for the “Density” variable (the five number summary includes Min, Q1, Median, Q3, Max).  Go to Stat → Summary Stats → Columns to obtain these values.  Note: in the Statistics box, select the summary statistics listed above in the exact order given.  Copy the entire table into your document.

e)      Choose the appropriate summary statistics for center and spread (presented in either 3(c) or 3(d)) based on your stated shape of the distribution in 3(b).

f)       Use your summary statistics from part 3(d) and determine the fences used to mathematically identify outliers for the “Density” variable.  To do this, type all steps in your calculations manually, including how you obtained the upper and lower fences.

g)      Construct a horizontally oriented boxplot of the “Density” variable by using Graph → Boxplot.  To do this, click the “Draw boxes horizontally” box.  Properly title and label and copy this graph into your document.

h)      How many outliers do you identify (please use both the boxplot and your results from 3(f) to answer this question)?  Hint: you can sort your data by “Density” (using Data → Sort in StatCrunch) to get the correct number or hover your cursor over the boxplot.  Write your response in a complete sentence.

i)       List the three largest outliers in one complete sentence.  Include the state and the actual density measurement in your answer.
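The fence calculation in 3(f) and the outlier count in 3(h) can be checked with a short sketch like the one below. It assumes the usual 1.5 × IQR rule for fences; the quartiles and density values are hypothetical and are not taken from the State Population Information data set. Your typed work in 3(f) must still show each step using your own StatCrunch output.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical quartiles (people per square mile), NOT the real summary statistics.
lower, upper = outlier_fences(q1=40.0, q3=200.0)

# Hypothetical observations; an outlier lies strictly beyond a fence.
densities = [10.0, 55.0, 120.0, 300.0, 450.0, 1200.0]
outliers = [d for d in densities if d < lower or d > upper]

print(lower, upper)
print(outliers)
```

A negative lower fence, as in this example, simply means that no observation of a nonnegative variable can be a low outlier.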


Problem 4: Student-Teacher Ratio among Private and Public Universities

Data were collected from 197 Mid-Atlantic Universities.  The data set “Mid-Atlantic University Data” presents many variables describing these universities.

a)      Construct a frequency histogram on the “Student-Teacher Ratio” variable.  Properly title and label your graph and keep the default binning.  Copy and paste this graph into your document.

b)      Describe the shape of the distribution of the “Student-Teacher Ratio” variable in context in one sentence.

c)      In StatCrunch, starting with the histogram you constructed in part (a), click on Options in the top left corner and select Edit.  From there, under bins, enter 2 for the width.  Copy and paste this graph into your document.

d)      Does this image change your description of the shape of the distribution?  Write another one sentence description of the shape of this overall distribution.  (Note: do not change your initial comments in part (b) after seeing the image in part (c).)

e)      Now construct two separate frequency histograms for the “Student-Teacher Ratio” variable – one for each type of university (Public and Private).  Use the “Group by:” option when building the histogram and select “Public/Private”.  Properly title and label your graphs.  Below the titling area, under “For multiple graphs” change Columns per page from 1 to 2 and select the “Use same X-axis” and “Use same Y-axis” options.  Finally, click Compute!  Copy and paste your graphs into your document.

f)       Write a sentence describing the shape of the distribution of Student-Teacher Ratio for each type of university.

g)      Based on your histograms in part (e), generally compare the centers and spreads of each distribution (use comparative language).  Answer in two sentences.

h)      Use StatCrunch to obtain sample size (n), the mean, and standard deviation of the “Student-Teacher Ratio” variable by “Public/Private” (using “Group by:”).  Copy and paste the table into your document.  Round your answers to whole numbers in your document.

For parts 4(i)-4(k), determine how well the Empirical Rule does in predicting the percentage of observations within some number of standard deviations of the mean.

i)       Use your rounded summary statistics for Private from part 4(h) to calculate the interval corresponding to one, two, and three standard deviations about the mean Student-Teacher ratio.  Type your work showing how you obtained these intervals.  Clearly label and list these three intervals in your document as shown below:

68% interval    (lower value, upper value)

95% interval    (lower value, upper value)

99.7% interval (lower value, upper value)

j)       Use StatCrunch to determine the count and percentage of observations falling in each of these intervals by following the instructions listed below or using another appropriate counting method.  Properly label and list these counts and percentages in your document.

Go to Data → Row Selection → Interactive Tools.  In the Slider selectors box, click the variable “Student-Teacher Ratio” into the variable box.  In the Category selectors box, click the variable “Public/Private” into the variable box.  Click Compute! to open the tool.

The box that appears has a slider under the words Student-Teacher Ratio that allows you to create ranges of scores that you determined in 4(i).  Under Public/Private, you can select whether you want to use both university types (here select only Private).  Use the slider to obtain the count for each interval from 4(i) by looking at the “# rows selected” presented in the first line of the box.  Calculate the percentages from the counts you obtained for each interval and include them in your document.

k)      Do each of the three percentages found in part 4(j) match what the Empirical Rule predicts?  Compare your results in 4(j) with the expected percentage stated in the Empirical Rule.  State your answer in three sentences (one sentence for each comparison).

l)       Suppose a new private student-teacher ratio value of 20 was examined.  Using your rounded summary statistics from 4(h), calculate the z-score of this value and explain in a complete sentence what this z-score indicates.

m)   Suppose a public student-teacher ratio value of 20 was examined.  Using your rounded summary statistics from 4(h), calculate the z-score of this value and explain in a complete sentence what this z-score indicates.

n)      Is the private value from 4(l) or the public value from 4(m) a larger value in general?  How can you tell?  Compare the z-scores to answer this question in a sentence.
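The calculations in parts 4(i) through 4(m) follow three formulas: the interval mean ± k × SD, the percentage of observations inside an interval, and the z-score (value − mean) / SD. The sketch below shows one possible way to check them. The mean, standard deviation, and ratio values are hypothetical rather than taken from the Mid-Atlantic University Data, and counting the interval endpoints as inside the interval is an assumption.

```python
def interval(mean, sd, k):
    """Return the interval k standard deviations about the mean."""
    return mean - k * sd, mean + k * sd


def percent_within(values, lower, upper):
    """Percentage of values in [lower, upper]; endpoints count as inside (assumption)."""
    inside = sum(lower <= v <= upper for v in values)
    return 100 * inside / len(values)


def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


# Hypothetical rounded summary statistics and observations, NOT the real data.
mean, sd = 12, 4
ratios = [5, 9, 10, 12, 13, 14, 15, 16, 19, 25]

# Compare each observed percentage with the Empirical Rule's expected percentage.
for k, expected in [(1, 68), (2, 95), (3, 99.7)]:
    lo, hi = interval(mean, sd, k)
    observed = percent_within(ratios, lo, hi)
    print(f"{expected}% interval ({lo}, {hi}): observed {observed}%")

print(z_score(20, mean, sd))
```

Because a z-score is measured in standard deviations, the same raw value of 20 can give different z-scores for two groups with different means and standard deviations.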