


Advanced Business Data Analysis

Replacement for Exam (50%)

 

Deadline for Submission – 9th April, 2020 (12:55 hours)

In this assignment you will use statistical tests for non-normal data. You may choose your own methods (non-parametric statistical tests) and tools (R, Excel, or SPSS), but please do not rely on a single tool or method; variety is expected. It is not necessary to replicate any test you carry out, i.e. if you perform a test in R you do not need to repeat it in SPSS and/or Excel. A data file (from the 2016 Census of Ireland) is suggested, though students may choose a different file if they wish (subject to approval by Dr O'Loughlin). Your task is to prepare a statistical report based on the data in the file.

LINK:

The Central Statistics Office provides data on "Small Area Population Statistics" from the 2016 census of Ireland – see:
 
https://www.cso.ie/en/census/census2016reports/census2016smallareapopulationstatistics/


For this assignment you will need two CSV files:

1.     Small Areas (18,641)
https://www.cso.ie/en/media/csoie/census/census2016/census2016boundaryfiles/SAPS2016_SA2017.csv

2.     Small areas OSI Boundaries
https://data.gov.ie/dataset/small-areas-generalised-100m-osi-national-statistical-boundaries-2015

 

The first file contains raw data based on the 2016 Census of Ireland. The second file contains information such as location names and IDs. You should be able to combine both data sets into one using the GUID field. The Glossary file at the above site will also be useful:

 (https://www.cso.ie/en/media/csoie/census/census2016/census2016boundaryfiles/SAPS_2016_Glossary.xlsx)
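As a rough illustration of combining the two data sets on the GUID field, the Python sketch below joins two tiny in-memory CSV tables. Only the GUID column name comes from this brief; the other column names (COMMUTERS_CAR, COMMUTERS_BUS, COUNTYNAME) and all values are made up for illustration, so check the real headers against the Glossary file before adapting it.

```python
import csv
import io

# Tiny in-memory stand-ins for the two downloads. Every column name except
# GUID is an illustrative assumption, not a real header from the CSO files.
saps_csv = io.StringIO(
    "GUID,COMMUTERS_CAR,COMMUTERS_BUS\n"
    "a1,120,15\n"
    "b2,80,40\n"
    "c3,60,5\n"
)
boundaries_csv = io.StringIO(
    "GUID,COUNTYNAME\n"
    "a1,Kerry\n"
    "b2,Cork\n"
)

def merge_on_guid(data_file, boundary_file):
    """Inner-join two CSV streams on their shared GUID column."""
    lookup = {row["GUID"]: row for row in csv.DictReader(boundary_file)}
    merged, unmatched = [], []
    for row in csv.DictReader(data_file):
        match = lookup.get(row["GUID"])
        if match is None:
            # Keep track of census rows with no boundary record.
            unmatched.append(row["GUID"])
        else:
            merged.append({**match, **row})
    return merged, unmatched

merged, unmatched = merge_on_guid(saps_csv, boundaries_csv)
print(len(merged), "rows merged;", len(unmatched), "without a boundary record")
```

Reporting how many rows failed to match is worth doing in the report itself, since silently dropped small areas would change every later comparison.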

 

The Small Areas CSV file has 18,641 records and 68 columns of data. You are not expected to use all the data in the file, and you may reduce the file to eliminate unused data if you wish. As the file contains a lot of data, please be careful about what you decide to report on; the choice is yours.

 

Some suggested reports:

·         a comparison of methods of transport to work by County/Planning Region

·         difference between different methods of transport in urban vs rural areas

·         a comparison of journey times to work by County/Planning Region

·         a comparison of time leaving home to travel to work by County/Planning Region

·         Correlations may also be tested
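Because census figures are counts per small area, a comparison by county usually starts by turning counts into shares and grouping them. The Python sketch below shows one possible way to do that, assuming each row already carries a county name; the column names (COUNTYNAME, CAR, TOTAL) and the numbers are hypothetical.

```python
from collections import defaultdict
from statistics import median, quantiles

# Hypothetical rows, one per small area. CAR is the number of commuters
# travelling by car and TOTAL is all commuters; both names are assumptions.
rows = [
    {"COUNTYNAME": "Kerry", "CAR": 120, "TOTAL": 150},
    {"COUNTYNAME": "Kerry", "CAR": 90, "TOTAL": 100},
    {"COUNTYNAME": "Kerry", "CAR": 60, "TOTAL": 80},
    {"COUNTYNAME": "Kerry", "CAR": 0, "TOTAL": 0},
    {"COUNTYNAME": "Cork", "CAR": 70, "TOTAL": 140},
    {"COUNTYNAME": "Cork", "CAR": 55, "TOTAL": 100},
    {"COUNTYNAME": "Cork", "CAR": 45, "TOTAL": 75},
]

def share_by_group(rows, group_key, count_key, total_key):
    """Return {group: [share per small area]}, skipping areas with no commuters."""
    groups = defaultdict(list)
    for row in rows:
        if row[total_key] > 0:
            groups[row[group_key]].append(row[count_key] / row[total_key])
    return dict(groups)

samples = share_by_group(rows, "COUNTYNAME", "CAR", "TOTAL")
for county, shares in sorted(samples.items()):
    q1, _, q3 = quantiles(shares, n=4)
    print(f"{county}: n={len(shares)}, median={median(shares):.2f}, IQR={q3 - q1:.2f}")
```

The median and interquartile range are printed rather than the mean and standard deviation because they remain meaningful when the data are not normally distributed.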

Suggested statistical tests:

·         Descriptive statistics for all data used

·         Tests for normality such as Q-Q plots and the Kolmogorov-Smirnov test (please note: in tools such as R, the Shapiro-Wilk test does not work for sample sizes over 5,000)

·         Mann-Whitney U Test/Wilcoxon Rank-Sum Test to compare two samples (e.g. travel times for Kerry vs Cork)

·         Kruskal-Wallis H Test to compare three or more samples

·         Post-hoc tests where appropriate
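To show what the rank-based tests in this list actually compute, here is a minimal Python sketch of the Mann-Whitney U and Kruskal-Wallis H statistics. It is illustrative only: the journey times are invented, no tie or continuity corrections are applied, and the p-value for U uses the large-sample normal approximation. For the report itself a library routine (for example wilcox.test and kruskal.test in R, or scipy.stats.mannwhitneyu and scipy.stats.kruskal in Python) would normally be used.

```python
from statistics import NormalDist

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Return (U for sample a, approximate two-sided p-value)."""
    ranks = average_ranks(list(a) + list(b))
    n1, n2 = len(a), len(b)
    u_a = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5  # no tie correction
    z = (u_a - mean_u) / sd_u
    return u_a, 2 * (1 - NormalDist().cdf(abs(z)))

def kruskal_wallis_h(*groups):
    """Return the H statistic (no tie correction) for the given groups."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Invented journey times in minutes, purely for demonstration.
kerry = [12, 15, 20, 22, 30]
cork = [18, 25, 28, 35, 40]
dublin = [33, 38, 45, 50, 55]

u, p = mann_whitney_u(kerry, cork)
print(f"Mann-Whitney U = {u}, approximate p = {p:.3f}")
print(f"Kruskal-Wallis H = {kruskal_wallis_h(kerry, cork, dublin):.2f}")
```

H is compared against a chi-squared distribution with (number of groups - 1) degrees of freedom; that step is left to a statistics package here because the Python standard library has no chi-squared function.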

Suggested visual representations of data:

·         Q-Q/P-P plots

·         Residuals

·         Box plots

·         Frequency Distributions/Histograms

·         Scatter plots
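A Q-Q plot pairs each sorted observation with the value a normal distribution would be expected to produce at the same position. The Python sketch below computes those pairs so they can be drawn in any charting tool. The data are invented, the normal distribution is fitted using the sample mean and standard deviation, and the plotting positions (i - 0.5)/n are one common convention among several.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Pair each sorted observation with its expected normal quantile."""
    data = sorted(sample)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    # Plotting positions (i - 0.5) / n keep probabilities strictly inside (0, 1).
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, data))

# Invented, right-skewed journey times in minutes.
journey_minutes = [5, 8, 10, 12, 15, 20, 25, 40, 60, 90]

for expected, observed in qq_points(journey_minutes):
    print(f"expected {expected:7.2f}   observed {observed:3d}")
```

If the points fall close to the line observed = expected, the sample is consistent with normality; a systematic curve away from that line, as skewed data tend to produce, supports the use of non-parametric tests.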

Be aware that this is a statistical report: null and alternative hypotheses, justification of significance levels, correct reporting of results, and explanations of results are all expected (see the 8 Simple Rules document in Moodle). Please also explain and justify any statistical test used, and state clearly any assumptions made.
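One situation where the level of significance needs explicit justification is pairwise post-hoc testing, because each extra comparison raises the chance of a false positive. The Python sketch below shows the arithmetic of a Bonferroni adjustment, which is only one of several possible corrections. The county names, the family-wise level of 0.05, and the p-value are illustrative assumptions, not requirements of this brief.

```python
from itertools import combinations

def bonferroni_plan(group_names, family_alpha=0.05):
    """List every pairwise comparison and the per-test significance level."""
    pairs = list(combinations(group_names, 2))
    return pairs, family_alpha / len(pairs)

def decision(p_value, alpha):
    """Word the outcome in terms of the null hypothesis."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# Hypothetical groups for a post-hoc analysis after a Kruskal-Wallis test.
pairs, alpha_each = bonferroni_plan(["Kerry", "Cork", "Dublin", "Galway"])
print(f"{len(pairs)} comparisons, each tested at alpha = {alpha_each:.4f}")

# An invented p-value for one comparison, to show the reporting wording.
print("Kerry vs Cork:", decision(0.012, alpha_each))
```

Note that a p-value of 0.012 would count as significant at 0.05 on its own but not at the adjusted level, which is why the chosen level and any correction should be stated before the results are reported.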



