My dependent variable for the final project will be suicides per 100k persons.

Introduction: Research Idea

My dependent variable for the final project will be suicides per 100k persons. I will use suicides per 100k persons rather than total suicides because this helps control for the varying populations of different countries. I plan to investigate several independent variables that may influence the suicide rate: age, gender, Human Development Index (HDI), and GDP per capita. Additionally, I would like to see how the impact of these variables varies by country of residence and how it has changed from 1985 to 2016.
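The normalisation described above is simple arithmetic. A minimal sketch follows, using made-up numbers and hypothetical names (`toy`, `rate_per_100k`) rather than values from the data set:

```r
# Hypothetical toy data: two countries with very different populations.
# These numbers are invented for illustration only.
toy <- data.frame(
  country     = c("A", "B"),
  suicides_no = c(500, 50),
  population  = c(10000000, 400000)
)

# Suicides per 100k persons = suicides / population * 100,000
toy$rate_per_100k <- toy$suicides_no / toy$population * 100000

print(toy)
```

In this toy case country A records ten times as many suicides as country B, yet its rate per 100k persons is lower, which is why the rate is the more comparable measure across countries.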

The data set

The data set contains time series data about suicides from 1985 to 2016, with information about country, GDP per capita, Human Development Index (HDI), age, gender, etc.

Description

I would like to use this data set to understand how certain factors influence suicide rates across countries and over time. I plan to do this by fitting multiple regression models on different combinations of the independent variables in the data set, to identify risk factors that contribute to the suicide rate. The data set contains age groupings, generational data, the raw number of suicides and suicides per 100k persons, population, HDI, and gdp_per_capita values for over 100 countries from 1985 to 2016.
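As a rough sketch of what one such regression might look like, the example below fits a model with `lm()` on simulated data. The column names mirror those in the data set, but the values, the coefficients used to simulate them, and the object names `toy` and `fit` are assumptions made purely for illustration:

```r
# Synthetic illustration only: these values are simulated,
# not taken from the suicide data set.
set.seed(1)
n <- 200
toy <- data.frame(
  sex            = factor(sample(c("female", "male"), n, replace = TRUE)),
  gdp_per_capita = runif(n, 1000, 60000)
)

# Simulate a rate that is higher for males and falls slightly
# as GDP per capita rises (assumed relationship, for the demo).
toy$suicides.100k.pop <- 5 + 10 * (toy$sex == "male") -
  0.00005 * toy$gdp_per_capita + rnorm(n, sd = 2)

# Multiple regression of the rate on sex and GDP per capita.
fit <- lm(suicides.100k.pop ~ sex + gdp_per_capita, data = toy)
print(coef(fit))
```

Other combinations of predictors (for example adding `age` or `HDI.for.year`) can be tried by changing the right-hand side of the formula.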

Source

The data set was downloaded from Kaggle.com, which aggregated a variety of sources to produce this data set. These references are:

  1. United Nations Development Program. (2018). Human development index (HDI). Retrieved from http://hdr.undp.org/en/indicators/137506

  2. World Bank. (2018). World development indicators: GDP (current US$) by country: 1985 to 2016. Retrieved from http://databank.worldbank.org/data/source/world-development-indicators#

  3. [Szamil]. (2017). Suicide in the Twenty-First Century [dataset]. Retrieved from https://www.kaggle.com/szamil/suicide-in-the-twenty-first-century/notebook

  4. World Health Organization. (2018). Suicide prevention. Retrieved from http://www.who.int/mental_health/suicide-prevention/en/

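library(dplyr)      # group_by(), summarise(), %>%
library(stargazer)  # summary statistics table
# load the dataset; stringsAsFactors = TRUE keeps age as a factor (needed in R >= 4.0)
suicidedata <- read.csv("~/Desktop/Statistics Lab/suicidedata.csv", stringsAsFactors = TRUE)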

Motivation: Graphs and other descriptive statistics

Data Manipulation

# rename columns 10 and 11 with variable names that
# are easy to work with
names(suicidedata)[10] <- "gdp_for_year"
names(suicidedata)[11] <- "gdp_per_capita"

# remove observations from the year 2016 because
# the dataset does not cover that full year.
suicidedata <- suicidedata[suicidedata$year != 2016, ]

# raise the threshold for scientific notation so that
# axis labels print in fixed notation.
options(scipen = 3)

# group observations by year
by_year <- group_by(suicidedata, year)

# Reorder the age levels: the fourth level in alphabetical
# order moves to the front. Indexing into the sorted labels
# (rather than the current levels) makes this safe to rerun.
age_levels <- sort(unique(as.character(suicidedata$age)))
suicidedata$age <- factor(suicidedata$age, levels = age_levels[c(4, 1, 2, 3, 5, 6)])

# group total suicides by gender and compute each sex's
# share of all suicides (a proportion between 0 and 1;
# multiply by 100 for a percentage)
by_sex <- suicidedata %>%
    group_by(sex) %>%
    summarise(suicides_no = sum(suicides_no) / sum(suicidedata$suicides_no))

Summary Statistics

# stargazer table to show summary statistics of
# dataset
stargazer(suicidedata, type = "html", digits = 1)
Statistic          N       Mean         St. Dev.     Min    Pctl(25)  Pctl(75)     Max
year               27,660  2,001.2      8.4          1,985  1,994     2,008        2,015
suicides_no        27,660  243.4        904.5        0      3         132          22,338
population         27,660  1,850,689.0  3,920,658.0  278    97,535.2  1,491,041.0  43,805,214
suicides.100k.pop  27,660  12.8         19.0         0.0    0.9       16.6         225.0
HDI.for.year       8,364   0.8          0.1          0.5    0.7       0.9          0.9
gdp_per_capita     27,660  16,815.6     18,861.6     251    3,436     24,796       126,352

From this summary table we can see that the average number of suicides per 100k persons is 12.8 with a standard deviation of 19.0. It is not surprising that the standard deviation is greater than the mean here: suicides per 100k persons can never fall below 0 (the minimum column shows 0.0), so the distribution is bounded on the left and skewed to the right. The 75th percentile is 16.6 suicides per 100k persons, but the maximum observed in the dataset is 225.0, which shows that some outliers have rates very far above the mean.
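To illustrate how a non-negative, right-skewed variable can have a standard deviation larger than its mean, here is a small sketch on an invented vector `x`. The values are hypothetical, and the 1.5 x IQR cutoff used to flag outliers is one common convention, not a method taken from the analysis above:

```r
# Hypothetical non-negative, right-skewed values (not from the data set).
x <- c(0, 0, 0.5, 1, 2, 3, 5, 8, 15, 120)

# Mean and standard deviation: a single large value inflates both,
# but the standard deviation more strongly.
x_mean <- mean(x)
x_sd   <- sd(x)

# Flag high outliers with the 1.5 x IQR rule (an assumed convention).
q           <- quantile(x, c(0.25, 0.75))
upper_fence <- q[[2]] + 1.5 * (q[[2]] - q[[1]])
outliers    <- x[x > upper_fence]

print(c(mean = x_mean, sd = x_sd))
print(outliers)
```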

