Assignment Requirements

You will be looking at 99 student responses from two different variations of an exam (Forms A and B). Each question is associated with a domain or topic, which helps students know how they performed in each of the five domain categories and where they should focus their studying to improve their overall score.


Here are the four raw data files you will need to download and import into SAS:

FormA.csv: The Form A exam responses for 50 students. There are 150 questions on the exam (Q1-Q150). The first row in your SAS table (beginning with AAAAKEY) is the answer key. 

FormB.csv: Same as above except for Form B of the exam. There are 49 students who took Form B.

Domains FormA.csv: Each question is associated with a category or domain. This helps students know how they performed in each of the five areas of expertise and where they should focus their study to improve their overall score. For the assignment, you can reference the domains by their number instead of domain name.

Domains FormB.csv: Same as above except for Form B.

Click on the file name and select “Download” at the top-right corner of the screen. Do not alter the original files to make the project easier to code! Upload the files into SAS and create four SAS tables, one per file.
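As a rough sketch of the import, PROC IMPORT can read one of the CSV files as shown below. The folder path and the output table name are hypothetical placeholders, and the sketch assumes the first line of each CSV holds the column names.

```sas
/* Hypothetical path: replace with the folder where you uploaded the files. */
proc import datafile="/home/your_user_id/FormA.csv"
    out=work.forma_raw        /* hypothetical table name */
    dbms=csv
    replace;                  /* overwrite the table if it already exists */
    getnames=yes;             /* assumption: first line of the CSV holds column names */
    guessingrows=max;         /* scan every row before choosing types and lengths */
run;
```

The same step, with a different file name and output table, would cover the other three files.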


Step 1: Create two tables (one for each Form) that display whether students got each question correct (using 0's and 1's). The example is similar enough to help you understand what you need to do, but it will not be the exact same code used in the project. For example, you should use PROC IMPORT to import the data instead of the DATA step.


Here is an example of what the tables will look like. Notice these tables are in “long” format. On all example tables, your variable names and column order may be different.


Additional Hints: 

The table created for Form A will have 7,500 rows/observations (50 students x 150 questions). 

Form B will have 7,350 rows (49 students x 150 questions).


Calculating scores: To calculate the scores for each student, you will need to use arrays. Start by creating two arrays (think about which should be numeric or character!):

-One for the answer key

-One for the student responses


You’ll populate these arrays with the exam data you imported. Then compute the scores using DO loops and IF/THEN statements. Remember to OUTPUT the scores in the DO loop!
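One possible shape for this scoring step is sketched below. It is not the required solution. It assumes the imported table is named work.forma_raw, that the ID column is called Student, and that Q1-Q150 were imported as single-character columns; all of these names are hypothetical.

```sas
data work.scores_a;                      /* hypothetical output table */
    set work.forma_raw;                  /* hypothetical imported table */
    array resp{150} $ Q1-Q150;           /* student responses (assumed character) */
    array key{150} $ 1 _temporary_;      /* answer key; temporary arrays keep their values across rows */
    if _n_ = 1 then do;                  /* the first row is the answer key */
        do i = 1 to 150;
            key{i} = resp{i};
        end;
    end;
    else do;
        do Question = 1 to 150;
            Score = (resp{Question} = key{Question});   /* 1 if correct, 0 otherwise */
            output;                      /* one row per student per question */
        end;
    end;
    keep Student Question Score;
run;
```

Because OUTPUT sits inside the DO loop, each student row expands into 150 rows, which gives the long format described above.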


While it is not required, you might consider creating permanent SAS tables if you aren’t coding your entire project in one SAS session. Then you won’t need to recreate the tables every time you start a new SAS session.


Step 2: Merge the Form A scores table you created in Step 1 with the domain number (or area of expertise) associated with each question contained in Domains FormA.csv. Merge them by question number since it’s the only variable the two tables have in common. Repeat for Form B scores and Domains FormB.csv. Look back at Lesson 7 to refresh your memory on merging tables.
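A match-merge along these lines is one way to attach the domain number. The table names work.scores_a and work.domains_a and the variable names Question and Domain are assumptions, and Question must have the same type in both tables.

```sas
/* Both tables must be sorted by the BY variable before a match-merge. */
proc sort data=work.scores_a;  by Question; run;
proc sort data=work.domains_a; by Question; run;

data work.scores_domains_a;              /* hypothetical output table */
    merge work.scores_a (in=in_scores)   /* many rows per question */
          work.domains_a;                /* assumed: one row per question */
    by Question;
    if in_scores;                        /* keep only rows that have a score */
run;
```

The row count should be unchanged by the merge (7,500 for Form A), which is a quick way to check the result.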


An example of the merged tables can be found here.


Now use a macro variable to cut all of your code so far in half (i.e., importing the raw files and Steps 1 and 2). You only need one macro variable in the whole project to do this! The macro variable will be assigned the values A and B, which requires you to run your code twice. The first run imports the files and creates all the tables for Form A in Steps 1 and 2. Then reassign the macro variable to B and run the same code to create the tables for Form B. Refer to Homework 5 for guidance on using macro variables to create multiple tables with the same code.


You might want to use “Ctrl + F” to look for every instance of “A” so you don’t miss any!! Once you create your SAS tables for Form A, verify your tables are still correct. Change the macro assignment and run it again to create the tables for Form B. Verify again that you calculated the scores correctly for Form B.


Remember that macro variables aren’t about improving computing speed but maintainability, re-usability, and readability. Using a macro variable should cut all your code in half (up through Step 2). If you use a macro variable but your code repeats for Form A and B, there is no point in having a macro variable at all. Imagine if you were given four versions of the exam. Would you want your code to double again? What if there were ten versions of the exam? A macro variable allows you to expand your code to n versions of the exam.
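To illustrate the idea, the sketch below uses a single macro variable, here called form (a hypothetical name), inside a file name and a table name. The path and table names are placeholders as well.

```sas
%let form = A;    /* change to B and rerun the same code */

/* Macro variables resolve only inside double quotes.                */
/* The first period after &form ends the macro variable name, so two */
/* periods are needed to keep the one in ".csv".                     */
proc import datafile="/home/your_user_id/Form&form..csv"
    out=work.form&form._raw      /* resolves to work.formA_raw */
    dbms=csv
    replace;
    getnames=yes;                /* assumption: first line holds column names */
run;
```

Every later reference to a Form A table would use &form in the same way, so that switching the %LET value is the only change needed between runs.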


Step 3: Combine the two tables for Forms A and B created in Step 2 into one table (NOT merging, just concatenating the two tables together into one table with 7,500 + 7,350 = 14,850 rows). See Homework 4 for guidance on combining tables.
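Concatenation can be sketched with a single SET statement that lists both tables. The table names are hypothetical, and the Form variable is created here only on the assumption that it does not already exist in your tables.

```sas
data work.all_scores;                        /* hypothetical combined table */
    set work.scores_domains_a (in=from_a)    /* 7,500 rows */
        work.scores_domains_b;               /* 7,350 rows */
    length Form $ 1;
    if from_a then Form = "A";               /* record which table each row came from */
    else Form = "B";
run;
```

The log should report 14,850 observations in the combined table.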


Step 4: Calculate the total score and percent for each student. Also calculate the student's score and percentage for each of the five domain categories.


Hints:

-PROC MEANS can output both of these (by student and by domain number for each student) in a single table.

-Since these are all 1's and 0's, the mean is the proportion correct, which displays as the percentage once a percent format is applied.

-Use the ID statement in PROC MEANS in order to retain which exam form the student used. ID variables are stored in the output table. If you only look at the results tab from Proc Means, the ID variables will not be displayed.
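The hints above could translate into something like the following sketch. The names all_scores, Student, Domain, Form, and Score are assumptions carried over for illustration.

```sas
proc means data=work.all_scores noprint;
    class Student Domain;          /* no NWAY: every combination of class levels is written */
    id Form;                       /* carried into the output table, not the results tab */
    var Score;
    output out=work.student_summary    /* hypothetical output table */
        sum=Score                      /* number of questions correct */
        mean=Percent;                  /* proportion correct */
run;
```

Without the NWAY option, the output table holds overall rows (Student missing), per-student rows (Domain missing), and per-student, per-domain rows together, with _TYPE_ identifying each kind.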


Step 5: Sort the table created in Step 4 by Student. Make sure Student is numeric in order to sort correctly (also note you need a length of at least 3 since there are 100 students). Do not include rows where Student is missing because those are overall percentages not needed in the reports. You should have 594 rows in this table. While column order doesn’t matter, the row order needs to be identical to the example to correctly answer the quiz questions. This will also make coding Step 6 much easier.
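A minimal sketch of the sort, assuming Student is already numeric and that the summary table and variable names match the hypothetical ones used here:

```sas
proc sort data=work.student_summary
          out=work.student_sorted;     /* hypothetical output table */
    where not missing(Student);        /* drop the overall rows */
    by Student Domain;                 /* missing Domain (the overall row) sorts first within a student */
run;
```

With 99 students and six rows each (one overall plus five domains), the result has 99 x 6 = 594 rows. Whether the secondary sort by Domain reproduces the example's row order is an assumption to check against the example table.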


Click here to see an example table. 


Step 6: Using the table from Step 5, create another table where each row contains all of the student scores/percentages (converting the table to “wide” format).


Click here to see an example table.


Please watch this example video before asking for help!! The code is similar to what you will do except the variable names and array will be different. If you are getting an error message saying your array is out of range, you need to check the length of your array. Hint: It should be a length of 12.
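For orientation only, a long-to-wide conversion with an array of length 12 might look like the sketch below. It assumes each student has exactly six rows, ordered overall first and then domains 1 to 5, and that the input columns are named Score and Percent. All table and variable names are hypothetical.

```sas
data work.student_wide;                /* hypothetical output table */
    set work.student_sorted;
    by Student;
    array vals{12} OverallScore OverallPct
                   D1Score D1Pct D2Score D2Pct D3Score D3Pct
                   D4Score D4Pct D5Score D5Pct;
    retain vals;                       /* keep the array values across rows */
    if first.Student then do;
        call missing(of vals{*});      /* reset for each new student */
        slot = 0;
    end;
    slot + 1;                          /* sum statement: slot is retained automatically */
    vals{2*slot - 1} = Score;          /* odd positions hold scores */
    vals{2*slot}     = Percent;        /* even positions hold percentages */
    if last.Student then output;       /* one row per student */
    keep Student Form OverallScore--D5Pct;
run;
```

If a student had more than six input rows, the index 2*slot would exceed 12, producing the out-of-range error mentioned above.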


Step 7: Create side-by-side boxplots of the five domains using student percentages as the response. Click here for an example plot (yours may look slightly different). Revisit the lesson on Arrays to see example code producing a side-by-side boxplot.
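One way to draw side-by-side boxplots is PROC SGPLOT with a VBOX statement on the long-format table, as sketched here. The table and variable names are assumptions, and the course lesson may use a different approach.

```sas
proc sgplot data=work.student_sorted;      /* hypothetical long-format table from Step 5 */
    where not missing(Domain);             /* exclude the overall rows */
    vbox Percent / category=Domain;        /* one box per domain */
    xaxis label="Domain";
    yaxis label="Percent correct";
run;
```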


Step 8: Using the table created in Step 3, calculate the percentage correct for each question. Keep this separated by Form. For example, Question 1 on Form A is different from Question 1 on Form B, so the two should have different percentages.
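A per-question summary could be sketched with PROC MEANS as follows, again using hypothetical table and variable names:

```sas
proc means data=work.all_scores noprint nway;   /* NWAY: keep only Form x Question rows */
    class Form Question;
    var Score;
    output out=work.question_pct                /* hypothetical output table */
        mean=Percent;                           /* proportion of students answering correctly */
run;
```

With 150 questions on each of two forms, the output should contain 300 rows.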


Click here to see an example.


Step 9: Create a final report with two sections as described below. Use ODS to output your results as a PDF. Be sure a single line from a table doesn’t wrap onto another line or second page. Remember you can change the page orientation if necessary.


Section A: Student Scores includes all the students together and has the following subsections:

A table sorted by student ID that includes the variables student ID, exam form taken, overall score, overall percentage, and domain scores and percentages (notice the order of the variables). You should have 14 columns in this table.

A table sorted by overall percentage (highest to lowest) that includes the variables student ID, exam form taken, overall percentage, overall score, and domain percentages and scores (notice the order of the variables).

The side-by-side boxplot created in Step 7.


Section B: Question Analysis includes information about the exam questions and has the following subsections:

A table sorted by exam form and then by question number that includes the variables exam form, question number, and question percentage. (Notice the order of the variables.)

A table sorted by question percentage (easiest to hardest) that includes the variables question percentage, exam form, and question number. (Notice the order of the variables.)

An example of the final report is found here. The tables in the example are a small subset of what your actual tables will look like. You must include the tables in their entirety to receive full credit.
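The overall ODS wrapper for the report might be structured like this sketch. The output path, the title text, and the table and variable names are placeholders, and only one of the required tables is shown.

```sas
options orientation=landscape;                /* set before opening the PDF destination */
ods pdf file="/home/your_user_id/final_report.pdf";   /* hypothetical path */

title "Student Scores Sorted by Student ID";
proc print data=work.student_wide noobs label;        /* hypothetical wide table from Step 6 */
    format OverallPct percent8.1;             /* displays 0.496 as 49.6% */
run;

title;                                        /* clear the title */
ods pdf close;                                /* the PDF is written when the destination closes */
```

The PERCENT format reserves room for the percent sign and for parentheses around negative values, so a width of 8 leaves space for 100.0%.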

You need to submit TWO documents on Canvas:

The final report from Step 9 with Sections A and B as a PDF. Do not include extra tables or output that is not required.

The program summary. Double check to make sure it includes all your code. Sometimes SAS Studio will cut off the last few lines of code. If this happens, add a few blank lines at the end. If a big portion of your code is missing, use a different Internet browser to run SAS.


Before submitting your program summary, comment out all unnecessary PROC PRINT/CONTENTS steps. Those are for debugging purposes only and don’t need to be submitted. Otherwise your program summary might be 250+ pages long. The TAs will appreciate not having to flip through a document of that size. The only case where you may want to keep everything is if you didn’t finish the project (or have syntax errors) and expect to receive partial credit.


In order to receive full credit, double check the following:

A single line from a table should not wrap onto another line or a second page. For instance, in the first table of the student section, make sure all 14 variables for a given student fit on a single page.

All percentages should be formatted as nnn.n% (e.g., 49.6% or 100.0%).

A macro variable was used to import the raw data files and also for Steps 1 and 2 in order to reuse as much code as possible. If the macro variable did not shorten your code, it was not used correctly.

Add appropriate titles and labels to your tables, graphs, and sections. Titles should be meaningful instead of “Section A: Report 1”. 

Tables should be neat and orderly. The column headings (variable names) should be easily identifiable and presentation worthy.

Use ODS to output the results.

Your code should be properly formatted and easy to read/follow.

Include comments explaining what your code is doing.
