Topic:
The Centers for Disease Control and Prevention (CDC) uses the social vulnerability
index (SVI) to evaluate the impact of disasters on communities, weighting the
damage with social factors. This analysis covers the states of Washington and Idaho.
Problem:
The data consolidated by the CDC is used to determine the most vulnerable areas
should a disaster occur. In a perfect world, the indicators of vulnerability
would represent the people correctly. Currently, this far-from-perfect method
is the best that has been developed. There may be indicators that are not
adequately predictive of social vulnerability.
Question 1: What relationships exist in the states of Washington and Idaho
between the socioeconomic indicators, household composition indicators,
disability indicators, and social vulnerability when using the data
consolidated by the CDC (2018a)?
Question 2: Which indicators in the states of Washington and Idaho, among the
socioeconomic indicators, household composition indicators, and disability
indicators, have the most influence in predicting social vulnerability when
using the data consolidated by the CDC (2018a)?
Data:
• The data and data dictionaries are online.
o Centers for Disease Control and Prevention. (2018a). Social vulnerability index [data set]. https://svi.cdc.gov/Documents/Data/2018_SVI_Data/CSV/SVI2018_US.csv
o Centers for Disease Control and Prevention. (2018b). Social vulnerability index [code book]. https://svi.cdc.gov/Documents/Data/2018_SVI_Data/SVI2018Documentation.pdf
o Note: The raw data must be in its original form when it enters the R script file. Use the data dictionary
to understand the data.
• Create a subset of the data to represent the sample of secondary data in this analysis.
o The SVI index’s variable name is RPL_THEMES, in column 99
o Socioeconomic
▪ Persons below the poverty estimate
▪ Civilian unemployed estimate
▪ Per capita income estimate
▪ Persons with no high school diploma
o Household and composition disability features
▪ Ages 65 and older
▪ Ages 17 and under
▪ Persons with a disability, over the age of 5
▪ Single-parent households
o The state field
o Note: Do not use more than one indicator for each measure defined in this section.
o Variable names preceded with “E_” are actual measures, while “M_” represents the margin of error estimates.
o Other prefixes are follow-on calculations or qualitative information. Do not include
variables that are not identified in the research questions, as listed in the
data section.
o Do not include the margin of error estimates at this time.
o Considering the research questions, after subsetting, there will be 10 variables used in
this analysis.
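The subsetting step above can be sketched as follows. This is a minimal illustration, not a required solution: the column names (ST_ABBR, E_POV, E_UNEMP, E_PCI, E_NOHSDP, E_AGE65, E_AGE17, E_DISABL, E_SNGPNT) are assumptions about how the code book labels each measure and must be confirmed there, and the small data frame is a made-up stand-in for the raw file so that the example runs on its own.

```r
# In the real script the raw file would be read unchanged, for example:
#   svi_raw <- read.csv("SVI2018_US.csv", stringsAsFactors = FALSE)
# The toy data below is a hypothetical stand-in with invented values.
svi_raw <- data.frame(
  ST_ABBR    = c("WA", "ID", "OR", "WA"),
  RPL_THEMES = c(0.42, 0.77, 0.31, 0.58),
  E_POV      = c(350, 510, 120, 275),
  M_POV      = c(90, 110, 60, 85),      # margin of error: must be excluded
  E_UNEMP    = c(80, 95, 40, 66),
  E_PCI      = c(31000, 24000, 36000, 29500),
  E_NOHSDP   = c(140, 260, 75, 180),
  E_AGE65    = c(400, 380, 290, 350),
  E_AGE17    = c(610, 720, 330, 540),
  E_DISABL   = c(300, 410, 150, 260),
  E_SNGPNT   = c(55, 90, 20, 48),
  stringsAsFactors = FALSE
)

# Assumed variable names: state field, the SVI, and one "E_" estimate
# per measure listed in the data section (10 variables in total).
keep_cols <- c("ST_ABBR", "RPL_THEMES",
               "E_POV", "E_UNEMP", "E_PCI", "E_NOHSDP",
               "E_AGE65", "E_AGE17", "E_DISABL", "E_SNGPNT")

# Keep only Washington and Idaho rows and the selected columns.
svi_sub <- svi_raw[svi_raw$ST_ABBR %in% c("WA", "ID"), keep_cols]

# Validate that the change occurred as expected.
stopifnot(ncol(svi_sub) == 10)
stopifnot(all(svi_sub$ST_ABBR %in% c("WA", "ID")))
stopifnot(!any(grepl("^M_", names(svi_sub))))
str(svi_sub)
```

Selecting columns by name rather than by position (such as column 99) keeps the script readable and guards against a shifted column order, although the position can still be checked with which(names(svi_raw) == "RPL_THEMES").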
Data Cleaning:
• Do not remove missing values
during cleaning. If missing values need to be removed for an analysis method, do
it during the preparation for analysis. A code represents missing values. Use
the data dictionary to understand the data sample and how missing values are
represented.
• When changing an object or part
of an object, validate that the change occurred as expected.
• The steps that are taken in cleaning
are not discussed in the research paper.
• There is a code that represents
missing values; ensure this is found in the data dictionary! These values will
have to be recoded as NA.
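One way to recode the missing-value code and validate the change is sketched below. The value -999 is an assumption used for illustration and must be confirmed in the data dictionary; the small data frame is a hypothetical stand-in for the subset, and the column names are assumed as well.

```r
# Assumption: -999 is the missing-value code. Confirm in the code book.
missing_code <- -999

# Hypothetical stand-in for the Washington/Idaho subset.
svi_sub <- data.frame(
  ST_ABBR    = c("WA", "ID", "WA"),
  RPL_THEMES = c(0.42, -999, 0.65),
  E_PCI      = c(31000, 24000, -999),
  stringsAsFactors = FALSE
)

# Profile: count how often the code appears in each column before the change.
n_coded_before <- sapply(svi_sub,
                         function(x) sum(x == missing_code, na.rm = TRUE))

# Recode the missing-value code to NA in numeric columns only.
svi_clean <- svi_sub
num_cols  <- sapply(svi_clean, is.numeric)
svi_clean[num_cols] <- lapply(svi_clean[num_cols], function(x) {
  x[which(x == missing_code)] <- NA  # which() ignores any existing NA values
  x
})

# Validate: every coded value became NA, none remain, and no rows were dropped.
n_na_after <- sapply(svi_clean, function(x) sum(is.na(x)))
stopifnot(all(n_na_after == n_coded_before))
stopifnot(!any(svi_clean[num_cols] == missing_code, na.rm = TRUE))
stopifnot(nrow(svi_clean) == nrow(svi_sub))
print(n_na_after)
```

The rows with NA are kept at this stage; they would only be dropped later, during preparation, if the analysis method cannot handle them.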
Analyze:
• Conduct two types of analysis:
visual analysis to identify relationships and a random forest model to identify
influential indicators in predicting social vulnerability.
• The sub-stages of Analyze (profile, prepare, and apply) are necessary at least two
times. This method is for programming, not documenting research.
• During the visual analysis, only
present meaningful visuals to understand what relationships exist between
the indicators for the social vulnerability index.
• Ensure you establish that the
model is valid and reliable before discussing the influential indicators.
• Also, create a random forest
model for each state that is assigned. Ensure that this analysis is within the
scope of the research.
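The profile, prepare, and apply stages for the random forest could look something like the sketch below. It is only an illustration under stated assumptions: it relies on the randomForest package, the data are simulated (not CDC values), only four of the indicators are shown, the column names are assumed, and the helper fit_state() along with the 70/30 split and ntree = 500 are hypothetical choices rather than requirements.

```r
# Simulated stand-in for the cleaned subset. All values are invented.
set.seed(42)
n <- 300
svi <- data.frame(
  ST_ABBR  = sample(c("WA", "ID"), n, replace = TRUE),
  E_POV    = rpois(n, 400),
  E_UNEMP  = rpois(n, 90),
  E_PCI    = round(rnorm(n, 30000, 6000)),
  E_NOHSDP = rpois(n, 200),
  stringsAsFactors = FALSE
)
z <- 0.004 * svi$E_POV - 0.0001 * svi$E_PCI + rnorm(n, sd = 0.3)
svi$RPL_THEMES <- rank(z) / n          # percentile-style outcome in (0, 1]

predictors <- c("E_POV", "E_UNEMP", "E_PCI", "E_NOHSDP")

# Profile: correlations between the indicators and the SVI.
print(round(cor(svi[, c("RPL_THEMES", predictors)]), 2))

# Prepare: remove rows with missing values only now, not during cleaning.
svi <- na.omit(svi)

# Apply: fit on a training split, then check performance on held-out rows
# before interpreting variable importance.
fit_state <- function(dat, predictors, train_frac = 0.7) {
  idx   <- sample(nrow(dat), floor(train_frac * nrow(dat)))
  train <- dat[idx, ]
  test  <- dat[-idx, ]
  rf <- randomForest::randomForest(x = train[, predictors],
                                   y = train$RPL_THEMES,
                                   ntree = 500, importance = TRUE)
  pred <- predict(rf, newdata = test[, predictors])
  list(oob_rsq    = tail(rf$rsq, 1),   # out-of-bag pseudo R-squared
       test_rmse  = sqrt(mean((test$RPL_THEMES - pred)^2)),
       importance = randomForest::importance(rf))
}

if (requireNamespace("randomForest", quietly = TRUE)) {
  # One model for both states combined, then one model per state.
  groups  <- c(list(Both = svi), split(svi, svi$ST_ABBR))
  results <- lapply(groups, fit_state, predictors = predictors)
  for (g in names(results)) {
    cat("\n", g, ": OOB R-squared =", round(results[[g]]$oob_rsq, 3),
        " test RMSE =", round(results[[g]]$test_rmse, 3), "\n")
    print(round(results[[g]]$importance, 3))
  }
} else {
  message("Package 'randomForest' is not installed; model step skipped.")
}
```

For the visual analysis, a scatterplot matrix such as pairs(svi[, c("RPL_THEMES", predictors)]) is one possible counterpart to the correlation table. Importance scores are worth discussing only after the out-of-bag and held-out errors suggest the model is valid and reliable.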
Documenting research:
Results, Impact of the Results:
• Ensure that assertions and
assessments in the results and discussion sections are derived from the
analysis in R.
• Do not speculate. Use evidence.
When documenting the results, consider the generalizability.
Future Recommendations:
• Include recommendations for
future analysis, based on the research in R.
• An example might look something
like this:
o An opportunity for further
research, based on gaps found in the random forest modeling, is to look at the
ability to tune the parameters further, to improve the performance in
predicting the SVI.
o Additionally, an opportunity for
future research is exploratory modeling to determine what other variables, when
eliminated, have little or no impact on the ability to predict the SVI based on
the supporting characteristics in the data.
Please provide code with comments.