Manipulating Spreadsheets and Two-factor ANOVAs

In class, we learned that the two-factor ANOVA is a powerful test that lets us evaluate the influence of two separate factors on a variable of interest. We can assess the additive influence of each factor separately (the main effects), and we can also assess multiplicative influences (the interaction) by testing whether the two factors are interdependent, that is, whether the effect of one factor depends on the level of the other.
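To make the additive and interaction terms concrete, here is a minimal sketch of how a balanced two-factor ANOVA (the same number of replicates in every cell) partitions the total variation. It is written in Python with only the standard library and uses the blood calcium values from the table in this assignment. The dictionary layout, the level labels `none`/`hormone` and `female`/`male`, and the function name `two_factor_anova` are illustrative assumptions; in practice R computes this for you.

```python
# Balanced two-factor ANOVA (fixed effects) computed by hand.
# Assumes every (treatment, sex) cell has the same number of replicates.
from statistics import mean

# Observations keyed by (treatment, sex); labels are hypothetical.
cells = {
    ("none", "female"): [16.3, 20.4, 12.4, 15.8, 9.5],
    ("none", "male"): [15.3, 17.4, 10.9, 10.3, 6.7],
    ("hormone", "female"): [38.1, 26.2, 32.3, 35.8, 30.2],
    ("hormone", "male"): [34.0, 22.8, 27.8, 25.0, 29.3],
}


def two_factor_anova(cells):
    """Return {source: (SS, df, F)} for treatment, sex, interaction, error, total."""
    a_levels = sorted({key[0] for key in cells})
    b_levels = sorted({key[1] for key in cells})
    a, b = len(a_levels), len(b_levels)
    n = len(next(iter(cells.values())))  # replicates per cell

    values = [x for obs in cells.values() for x in obs]
    grand = mean(values)
    cell_mean = {key: mean(obs) for key, obs in cells.items()}

    def marginal_mean(position, level):
        # Average of every observation at this level, pooled over the other factor.
        return mean([x for key, obs in cells.items() if key[position] == level for x in obs])

    a_mean = {i: marginal_mean(0, i) for i in a_levels}
    b_mean = {j: marginal_mean(1, j) for j in b_levels}

    ss_total = sum((x - grand) ** 2 for x in values)
    # Main (additive) effects: spread of each factor's marginal means.
    ss_a = b * n * sum((a_mean[i] - grand) ** 2 for i in a_levels)
    ss_b = a * n * sum((b_mean[j] - grand) ** 2 for j in b_levels)
    # Interaction: how far each cell mean departs from the purely additive
    # prediction a_mean + b_mean - grand.
    ss_ab = n * sum((cell_mean[(i, j)] - a_mean[i] - b_mean[j] + grand) ** 2
                    for i in a_levels for j in b_levels)
    # Error: variation of observations around their own cell mean.
    ss_error = sum((x - cell_mean[key]) ** 2 for key, obs in cells.items() for x in obs)

    df_a, df_b = a - 1, b - 1
    df_ab = df_a * df_b
    df_error = a * b * (n - 1)
    ms_error = ss_error / df_error
    return {
        "treatment": (ss_a, df_a, (ss_a / df_a) / ms_error),
        "sex": (ss_b, df_b, (ss_b / df_b) / ms_error),
        "treatment:sex": (ss_ab, df_ab, (ss_ab / df_ab) / ms_error),
        "error": (ss_error, df_error, None),
        "total": (ss_total, a * b * n - 1, None),
    }


for source, (ss, df, f) in two_factor_anova(cells).items():
    f_text = "" if f is None else f"F = {f:.2f}"
    print(f"{source:<14} SS = {ss:8.2f}  df = {df:2d}  {f_text}")
```

For a balanced design the four pieces add up exactly to the total sum of squares. In R, this is the partition that `aov(calcium ~ treatment * sex, data = bloodcalcium)` reports once the long-format file has been loaded with `read.csv()` (here `bloodcalcium` is a hypothetical data frame name); the `*` in the formula expands to both main effects plus their interaction.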

In class, we have been working with a data set assessing the influence of hormone treatment and sex on the blood calcium levels of birds. When data are collected, spreadsheets are often filled out in a layout that is not intuitive to a statistical program like R. It is therefore important to understand how a data file should be formatted so that R can correctly use the data set during the analysis. Data sets are often constructed in a cell (or “wide”) format, where all of the observations within a treatment group are lumped together in one column. Something like this (this table is provided in the homework folder for this week and is labeled “bloodcalcium.xlsx”):

No Hormone Treatment        Hormone Treatment
Female        Male          Female        Male
16.3          15.3          38.1          34
20.4          17.4          26.2          22.8
12.4          10.9          32.3          27.8
15.8          10.3          35.8          25
9.5           6.7           30.2          29.3

This format allows the data collector to reduce the amount of information that must be repeated for each observation. While this is convenient at first, it is important to understand that R, like other statistical software, requires this redundancy for a correct analysis: every observation needs its own row that states which groups it belongs to. That means we need to restructure this data set so that R can interpret it. Essentially, we need to reduce the number of columns as much as possible, which increases the number of rows as a side effect.

How can we simplify the information in this table? R requires that the dependent variable (the values within the table) all be in a single column, so each of the 20 measurements ends up on its own row. Your first objective is to convert “bloodcalcium.xlsx” into a format that is usable by R. I want you to merge this information (within Excel) into 3 columns labeled “calcium”, “treatment”, and “sex”, and save the correctly formatted file as “bloodcalcium.csv”. Remember, the file is currently in .xlsx format, and the new file must be saved as a .csv for R to read it correctly.
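The assignment asks you to do this reshaping in Excel. As an optional cross-check, the same wide-to-long conversion can be sketched in Python with only the standard library `csv` module. This is a minimal sketch under stated assumptions: the column order follows the table above (no hormone/female, no hormone/male, hormone/female, hormone/male), and the level labels `none`, `hormone`, `female`, and `male` as well as the helper names `to_long` and `write_long_csv` are hypothetical; use whatever labels the example file uses.

```python
# Wide-to-long reshape of the blood calcium table using only the stdlib.
# Column order and level labels are assumptions, not part of the assignment.
import csv

# Each wide column is one (treatment, sex) cell; the values listed under it
# are the observations in that cell.
wide_columns = {
    ("none", "female"): [16.3, 20.4, 12.4, 15.8, 9.5],
    ("none", "male"): [15.3, 17.4, 10.9, 10.3, 6.7],
    ("hormone", "female"): [38.1, 26.2, 32.3, 35.8, 30.2],
    ("hormone", "male"): [34.0, 22.8, 27.8, 25.0, 29.3],
}


def to_long(wide):
    """Return one (calcium, treatment, sex) tuple per observation."""
    rows = []
    for (treatment, sex), values in wide.items():
        for calcium in values:
            # The group labels are repeated on every row: this is the
            # redundancy R needs to know which cell each value belongs to.
            rows.append((calcium, treatment, sex))
    return rows


def write_long_csv(rows, path):
    """Write a header row followed by one line per observation."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["calcium", "treatment", "sex"])
        writer.writerows(rows)


long_rows = to_long(wide_columns)
write_long_csv(long_rows, "bloodcalcium.csv")
```

Four columns of five values become 20 rows of three columns. Each row carries its own treatment and sex labels, which is exactly the information the wide layout left implicit in the column headers.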

On the next page you will find an example of what the .csv file should look like.
