Analytical Methods I (ANLY
Project: Tell your data story!
We are surrounded by data. As future data analysts, you will be called upon to tell a story, answer questions, and make predictions from a data set. This is what you will do with this project.
Data set: Find a data set with
● 1 response variable
● At least 10 explanatory variables
● At least
EDA: Do some exploratory data analysis to tell an “interesting” story about your data. Instead of limiting yourself to relationships between just two variables, broaden the scope of your analysis and employ creative approaches that evaluate relationships between variables while controlling for another one.
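As one way to look at a relationship while controlling for another variable, the sketch below compares an overall correlation with the same correlation computed inside each level of a third variable. It uses R's built-in `mtcars` data purely as a stand-in for your own data set; the names `cars`, `cyl_f`, `overall_cor`, and `by_cyl` are illustrative assumptions.

```r
# Stand-in data: the built-in mtcars data set (an assumption, not the project data).
cars <- mtcars
cars$cyl_f <- factor(cars$cyl)  # number of cylinders, treated as a grouping variable

# Overall (bivariate) correlation between weight and fuel economy.
overall_cor <- cor(cars$wt, cars$mpg)

# The same correlation computed separately within each cylinder group,
# i.e. the wt-mpg relationship with cyl held fixed.
by_cyl <- sapply(split(cars, cars$cyl_f), function(d) cor(d$wt, d$mpg))

print(overall_cor)
print(by_cyl)

# Conditioning plot: one wt-vs-mpg panel per cylinder group.
coplot(mpg ~ wt | cyl_f, data = cars, rows = 1)
```

If the within-group correlations differ noticeably from the overall one, the third variable is likely part of the story and may be worth including in your model.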
Inference: Come up with a research question that can be answered with a hypothesis test or a confidence interval. Your question could be used to shed some light on your choice of the “best” linear model. Carry out the appropriate inference task to answer your question.
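A minimal sketch of two inference tasks of this kind, again using the built-in `mtcars` data as a stand-in: a two-sample t-test with its confidence interval, and an F-test between nested linear models that can inform the choice of model. The research questions and the names `am_f`, `tt`, `small`, `large`, and `cmp` are assumptions made for illustration.

```r
# Stand-in data: the built-in mtcars data set (an assumption, not the project data).
cars <- mtcars
cars$am_f <- factor(cars$am, labels = c("automatic", "manual"))  # 0 = automatic, 1 = manual

# Question 1: does mean fuel economy differ by transmission type?
# Welch two-sample t-test; the result also carries a 95% confidence
# interval for the difference in group means.
tt <- t.test(mpg ~ am_f, data = cars)
print(tt$p.value)
print(tt$conf.int)

# Question 2: does adding horsepower improve a model that already has weight?
# F-test comparing two nested linear models.
small <- lm(mpg ~ wt, data = cars)
large <- lm(mpg ~ wt + hp, data = cars)
cmp <- anova(small, large)
print(cmp)
```

The second test ties the inference task to model selection: a small p-value in the `anova` table suggests that the extra term explains variation the smaller model misses.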
Modeling: Develop a multiple linear regression model to explain your response variable.
Prediction: Based on your model, make a prediction using the predict function in R. Also quantify the uncertainty around this prediction.
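A minimal sketch of this step, assuming a model fitted to the built-in `mtcars` data as a stand-in for your own. The predictors, the hypothetical new observation `new_car`, and the 95% level are illustrative choices, not requirements of the project.

```r
# Stand-in model: fuel economy explained by weight and horsepower
# (built-in mtcars data; an assumption, not the project data).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# A hypothetical new observation to predict for.
new_car <- data.frame(wt = 3.0, hp = 120)

# Point prediction with a 95% prediction interval
# (uncertainty about a single future observation).
pred <- predict(fit, newdata = new_car, interval = "prediction", level = 0.95)

# For comparison, a 95% confidence interval for the mean response
# at the same predictor values.
conf <- predict(fit, newdata = new_car, interval = "confidence", level = 0.95)

print(pred)
print(conf)
```

The prediction interval is the wider of the two because it includes the variability of an individual observation in addition to the uncertainty in the estimated mean.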
A. Data set description
○ General description of data
○ Link to data set
○ List of explanatory variables
○ Size of data set
○ 4-6 pages at 12 pt font size, Arial or Times New Roman font
○ Your report should be organized with the following parts included and clearly labeled:
1. Introduction: a summary of the data set and your goal.
2. EDA: any univariate or bivariate
3. Inference: Answer the research question you have posed using a hypothesis test or a confidence interval.
4. The “Best” Model: What is the “best” linear model for predicting the response variable? You do not need to explain every step you took to arrive at this model, but give some indication of why you chose the model you did. If you tried a few different models, how did you settle on one?
● How well does your model do? What is the percent variation explained?
● What does your model tell you about the relationship between your explanatory variables and your response variable?
● What conditions do you need for your analysis to hold? What are the implications if some of those conditions are not met?
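The questions above can be explored with a few standard R calls. The sketch below compares two candidate models and draws basic residual diagnostics, using the built-in `mtcars` data as a stand-in; the candidate models and the names `fit_a`, `fit_b`, `r2`, `adj_r2`, and `aic` are assumptions for illustration, and other selection criteria are equally reasonable.

```r
# Stand-in data: the built-in mtcars data set (an assumption, not the project data).
cars <- mtcars

# Two candidate models for the same response.
fit_a <- lm(mpg ~ wt, data = cars)
fit_b <- lm(mpg ~ wt + hp, data = cars)

# Proportion of variation explained (R-squared) and adjusted R-squared,
# which penalizes additional predictors.
r2 <- c(a = summary(fit_a)$r.squared, b = summary(fit_b)$r.squared)
adj_r2 <- c(a = summary(fit_a)$adj.r.squared, b = summary(fit_b)$adj.r.squared)

# AIC as another comparison criterion (lower is better).
aic <- AIC(fit_a, fit_b)

print(round(100 * r2, 1))  # percent variation explained
print(adj_r2)
print(aic)

# Diagnostic plots for the chosen model:
# residuals vs fitted values (linearity, constant variance)
# and a normal Q-Q plot (normality of residuals).
par(mfrow = c(1, 2))
plot(fit_b, which = 1:2)
```

A curved pattern or a funnel shape in the residual plot, or strong departures from the line in the Q-Q plot, would suggest that intervals and p-values from the model should be interpreted with caution.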
5. Prediction: Using your best model, make a prediction about a future event from your response variable. Include a description of the uncertainty of your prediction.
● What is the bottom line from your analysis?
● How well can you predict your response variable?
● What are the caveats to your analysis?
● Does this data set lack information that you would have liked to use?
Presentation: details will be provided later
○ 15 minutes max
○ Live synchronous delivery on Adobe Connect with ALL team members present
○ Scheduling instructions will be provided later
Tips for your report and presentation
This project is an opportunity to apply what you have learned about descriptive statistics, graphical methods, correlation and regression, and hypothesis testing and confidence intervals. The goal is not to do an exhaustive data analysis (i.e., do not calculate every statistic and procedure you have learned for every variable), but rather to show that you are proficient at using R at a basic level and that you are proficient at interpreting and presenting the results.
You might consider critiquing your own method, such as issues pertaining to the reliability of the data and the appropriateness of the statistical analysis used within the context of this specific data set.
Grading of the project will take into account the following:
● Are the procedures and explanations correct?
● Presentation: What was the quality of the presentation and poster?
● Content/Critical thought: Did you think carefully about the problem?
Your grade will be roughly based on the following components:
● 30% presentation
● 25% code
● 10% team peer evaluations
● 5% data description submitted on Moodle
Peer feedback: You will be asked to fill out a questionnaire during the last executive session.