Description

The use of numpy and pandas in this assignment is prohibited. You will receive zero marks for any problem you solve using these packages.


1. Write code to read the data file TrainingData.csv. The first row is the header (variable names); the data are stored in the subsequent rows.
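
Because numpy and pandas are off-limits, the standard `csv` module is a natural fit. The sketch below assumes the file is comma-separated text. `read_data` is a hypothetical helper name. It accepts any iterable of lines, so it works on an open file or on a list of strings.

```python
import csv


def read_data(lines):
    """Split CSV lines into (header, rows) using only the standard library."""
    reader = csv.reader(lines)
    header = next(reader)                  # first row: variable names
    rows = [row for row in reader if row]  # remaining rows: records (blank lines skipped)
    return header, rows


# Usage with the assignment's file (newline="" is what the csv docs recommend):
# with open("TrainingData.csv", newline="") as f:
#     header, rows = read_data(f)
```

Every field comes back as a string. Converting to numbers or dates is left to the later steps, which need to handle missing values anyway.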


2. Determine the number of variables and the number of records in this dataset. 
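
Assuming `header` is the list of variable names and `rows` is a list of string lists (as `csv.reader` produces), the two counts are plain lengths. This sketch also flags records whose field count differs from the header, since those would distort later per-variable steps. `dataset_shape` is a hypothetical name.

```python
def dataset_shape(header, rows):
    """Return (number of variables, number of records, indices of ragged records)."""
    n_vars = len(header)    # one variable per header column
    n_records = len(rows)   # one record per data row; the header is not counted
    # Sanity check: every record should have exactly one field per variable.
    ragged = [i for i, row in enumerate(rows) if len(row) != n_vars]
    return n_vars, n_records, ragged
```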


3. Store the variable names in a list. 
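
`csv.reader` already returns the header row as a Python list, so storing the names is mostly a matter of keeping that list. The sketch below also tidies the names. It assumes the header might carry stray spaces, or a UTF-8 byte-order mark that some spreadsheet exports prepend to the first column. Opening the file with `encoding="utf-8-sig"` avoids the latter. `variable_names` is a hypothetical helper.

```python
def variable_names(header):
    """Return the header as a clean list of variable names."""
    # "\ufeff" is not treated as whitespace by str.strip(), so remove it explicitly.
    return [name.lstrip("\ufeff").strip() for name in header]
```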


4. Determine whether there are any missing values in the data set. If so, report the total number of missing values.
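
The assignment does not say how missing values are encoded in TrainingData.csv. This sketch ASSUMES that an empty field, or one of the tokens "NA", "NaN", "null", or "None" (case-insensitive), means missing. Adjust `MISSING_TOKENS` after inspecting the file. A record shorter than the header is counted as missing its absent fields.

```python
MISSING_TOKENS = {"", "na", "nan", "null", "none"}  # ASSUMED markers for a missing value


def is_missing(value):
    """True if a raw CSV field counts as missing under the assumed markers."""
    return value.strip().lower() in MISSING_TOKENS


def count_missing(header, rows):
    """Total number of missing fields across all records."""
    total = 0
    for row in rows:
        total += sum(1 for value in row if is_missing(value))
        total += max(0, len(header) - len(row))  # fields absent from short records
    return total
```

The data set has missing values exactly when `count_missing(header, rows) > 0`.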


5. Find the number of distinct LCID values in the data set.
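
A set removes duplicates, so the count of distinct LCID values is the size of a set built from that column. This sketch strips surrounding spaces so that "123" and " 123" are not counted twice. It also skips empty LCIDs, which is a judgment call you may want to revisit. `distinct_count` is a hypothetical name.

```python
def distinct_count(header, rows, column="LCID"):
    """Number of distinct non-empty values in one column."""
    idx = header.index(column)  # raises ValueError if the column is absent
    return len({row[idx].strip() for row in rows if row[idx].strip()})
```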


6. Find the variable with the most missing values. 
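
This extends the missing-value count to one counter per variable and then takes the maximum. It uses the same ASSUMED missing-value markers as above. On a tie, `max` returns the variable that appears first in the header, because dicts keep insertion order. `most_missing_variable` is a hypothetical name.

```python
MISSING_TOKENS = {"", "na", "nan", "null", "none"}  # ASSUMED markers for a missing value


def most_missing_variable(header, rows):
    """Return (variable name, missing count) for the variable with the most missing values."""
    counts = {name: 0 for name in header}
    for row in rows:
        for idx, name in enumerate(header):
            value = row[idx] if idx < len(row) else ""  # short record: field is absent
            if value.strip().lower() in MISSING_TOKENS:
                counts[name] += 1
    best = max(counts, key=counts.get)
    return best, counts[best]
```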


7. Convert the variable hour_id to datetime format. 
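
`datetime.strptime` parses a string against an explicit format. The layout of hour_id is not given in the assignment, so `HOUR_ID_FORMAT` below is an ASSUMPTION ("2020-01-02 03:00:00" style) that must be checked against the file. The sketch returns copies of the records with hour_id replaced by `datetime` objects, and uses `None` for empty timestamps.

```python
from datetime import datetime

HOUR_ID_FORMAT = "%Y-%m-%d %H:%M:%S"  # ASSUMED layout of hour_id; adjust to the real file


def convert_hour_id(header, rows, fmt=HOUR_ID_FORMAT):
    """Return new records whose hour_id field is a datetime (or None if empty)."""
    idx = header.index("hour_id")
    converted = []
    for row in rows:
        new_row = list(row)  # copy so the original string records stay untouched
        text = new_row[idx].strip()
        new_row[idx] = datetime.strptime(text, fmt) if text else None
        converted.append(new_row)
    return converted
```

`strptime` raises `ValueError` on any value that does not match the format. This is useful here, because it exposes a wrong format guess immediately instead of silently producing bad dates.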


8. What time span does the entire data set cover?
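
Once hour_id holds `datetime` objects, the span is the latest timestamp minus the earliest, which Python returns as a `timedelta`. The sample timestamps below are made up for illustration. If each hour_id marks the start of an hour, you could argue the covered period extends one hour past the last timestamp. The sketch does not add that hour.

```python
from datetime import datetime


def time_span(timestamps):
    """Return (earliest, latest, latest - earliest) over non-missing timestamps."""
    valid = [t for t in timestamps if t is not None]
    start, end = min(valid), max(valid)
    return start, end, end - start


sample = [datetime(2020, 1, 1, 5), None, datetime(2020, 1, 3, 5)]
print(time_span(sample)[2])  # 2 days, 0:00:00
```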


9. Determine the number of records per day. 
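
Grouping by calendar day means reducing each timestamp to its `.date()` and counting occurrences, which `collections.Counter` does directly. This sketch assumes hour_id has already been converted to `datetime` objects, with `None` for missing values.

```python
from collections import Counter
from datetime import datetime


def records_per_day(timestamps):
    """Map each calendar date to its number of records, in date order."""
    counts = Counter(t.date() for t in timestamps if t is not None)
    return dict(sorted(counts.items()))


print(records_per_day([datetime(2020, 1, 1, 1), datetime(2020, 1, 1, 2), None]))
# {datetime.date(2020, 1, 1): 2}
```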


10. Using the median function from the statistics package (from statistics import median) or otherwise, do the following: (a) Divide the entire data set by distinct LCID value. (b) For each distinct LCID value, determine the median of each variable in the corresponding subset. (c) Store the results of (b) in a dictionary.
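
The three steps map onto a group-by and a nested dictionary: `{LCID: {variable: median}}`. This sketch ASSUMES a median only makes sense for fields that parse as numbers. Non-numeric columns (such as a string hour_id) and missing values are skipped. An integer-style hour_id would parse as a number, though, and would then be included. The missing-value markers are the same assumption as earlier. `to_float` and `median_by_lcid` are hypothetical names.

```python
from statistics import median

MISSING_TOKENS = {"", "na", "nan", "null", "none"}  # ASSUMED missing-value markers


def to_float(text):
    """Return text as a float, or None if it is missing or non-numeric."""
    if text.strip().lower() in MISSING_TOKENS:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def median_by_lcid(header, rows, key="LCID"):
    key_idx = header.index(key)
    # (a) Divide the records by distinct LCID value.
    groups = {}
    for row in rows:
        groups.setdefault(row[key_idx].strip(), []).append(row)
    # (b) + (c) Median of every other numeric variable, stored per LCID in a dict.
    result = {}
    for lcid, group in groups.items():
        medians = {}
        for idx, name in enumerate(header):
            if idx == key_idx:
                continue
            values = [v for v in (to_float(r[idx]) for r in group) if v is not None]
            if values:  # skip variables with no numeric data in this group
                medians[name] = median(values)
        result[lcid] = medians
    return result
```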


11. Determine the number of Complaint cases and Non-complaint cases in the entire data set. 
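
The assignment does not name the column that marks complaints. This sketch uses a HYPOTHETICAL column called "Complaint". Rather than guessing the label encoding, it returns a `Counter` of every distinct label value. If the encoding turns out to be "1" for complaint and "0" for non-complaint (again an assumption), the answer is `counts["1"]` and `counts["0"]`. Any extra keys reveal missing or unexpected labels.

```python
from collections import Counter

LABEL_COLUMN = "Complaint"  # HYPOTHETICAL name of the complaint/non-complaint column


def complaint_counts(header, rows, label=LABEL_COLUMN):
    """Count how many records carry each distinct label value."""
    idx = header.index(label)
    return Counter(row[idx].strip() for row in rows)
```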


12. Determine the top 10 LCIDs with the most complaint cases. 
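
Counting only the complaint records per LCID and calling `Counter.most_common(10)` gives the top 10. Ties are listed in the order first encountered. The column name "Complaint" and the positive label "1" are HYPOTHETICAL, the same assumptions as in the previous item.

```python
from collections import Counter


def top_complaint_lcids(header, rows, n=10, label="Complaint", positive="1"):
    """Return [(LCID, complaint count), ...] for the n LCIDs with the most complaints."""
    lcid_idx = header.index("LCID")
    label_idx = header.index(label)  # "Complaint" is a HYPOTHETICAL column name
    counts = Counter(
        row[lcid_idx].strip() for row in rows if row[label_idx].strip() == positive
    )
    return counts.most_common(n)
```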


13. Calculate the median value per day for each variable in the entire data set.


14. Use the first 5 digits of the LCID values to define a new variable, Region.
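
Items 13 and 14 can be sketched with the same tools as before. For the daily medians, records are grouped by the date part of hour_id, and `statistics.median` is taken over each numeric variable. LCID, hour_id, and Region are excluded because their medians carry no meaning. For Region, the first five characters of each LCID are appended as a new column. The hour_id format and the missing-value markers are ASSUMPTIONS. Region also assumes LCIDs are stored as digit strings that have not lost any leading zeros.

```python
from datetime import datetime
from statistics import median

MISSING_TOKENS = {"", "na", "nan", "null", "none"}  # ASSUMED missing-value markers
HOUR_ID_FORMAT = "%Y-%m-%d %H:%M:%S"                # ASSUMED layout of hour_id


def to_float(text):
    """Return text as a float, or None if it is missing or non-numeric."""
    if text.strip().lower() in MISSING_TOKENS:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def median_per_day(header, rows, fmt=HOUR_ID_FORMAT, exclude=("LCID", "hour_id", "Region")):
    """Item 13: {date: {variable: median}} over all records of each day."""
    t_idx = header.index("hour_id")
    skip = {header.index(name) for name in exclude if name in header}
    by_day = {}
    for row in rows:
        text = row[t_idx].strip()
        if text:  # records without a timestamp cannot be assigned to a day
            by_day.setdefault(datetime.strptime(text, fmt).date(), []).append(row)
    result = {}
    for day in sorted(by_day):
        medians = {}
        for idx, name in enumerate(header):
            if idx in skip:
                continue
            values = [v for v in (to_float(r[idx]) for r in by_day[day]) if v is not None]
            if values:
                medians[name] = median(values)
        result[day] = medians
    return result


def add_region(header, rows, width=5):
    """Item 14: append Region = first `width` characters of LCID to every record."""
    idx = header.index("LCID")
    new_rows = [row + [row[idx].strip()[:width]] for row in rows]
    return header + ["Region"], new_rows
```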


15. Determine the region with the most complaint cases found in the data set.
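
This combines the Region definition with the complaint filter: derive each complaint record's region from the first five characters of its LCID, count per region, and take the most common. "Complaint" and the positive label "1" are HYPOTHETICAL, as in items 11 and 12. A tie returns the region encountered first.

```python
from collections import Counter


def top_complaint_region(header, rows, label="Complaint", positive="1", width=5):
    """Return (region, complaint count) for the region with most complaints, or None."""
    lcid_idx = header.index("LCID")
    label_idx = header.index(label)  # "Complaint" is a HYPOTHETICAL column name
    counts = Counter(
        row[lcid_idx].strip()[:width]  # Region = first `width` characters of LCID
        for row in rows
        if row[label_idx].strip() == positive
    )
    return counts.most_common(1)[0] if counts else None
```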

