In this assignment, you will be running exploratory analysis on a dataset to better understand it and its features.

Description

Assignment Overview: In this assignment, you will run exploratory analysis on a dataset to better understand it and its features. You will process and prepare the data so that you can apply the machine learning knowledge you have obtained through the lectures. This includes creating, analysing and generating predictions with regression models. The assignment closely follows the examples in Chapters 2, 3 and 4 of the course book, applied to a different data set, so reviewing the book and the corresponding code will greatly help you. You are expected to primarily use Scikit-Learn in the assignment.

Data Set: The data set provided for this assignment contains information on many different car types and their prices. This assignment challenges you to predict the sale price of each car. The data is in CSV format and has already been split into training and test sets for your convenience:

train.csv: the training set
test.csv: the test set

DATA PREPARATION (25 points)

In the first part of the assignment, you will analyze the dataset and preprocess it to prepare it for machine learning algorithms. In this data set, the target variable is “price” and all other columns are features.

(a) (5 points) Split your data into X and y: As mentioned, the “price” column is the target of our dataset. Create two pandas data frames from train.csv, one containing all the input features and the other containing only the target label. Name these data frames train_x_a and train_y respectively.

(b) (5 points) Handling missing values: Find all features (columns) that contain missing (NaN) values. Store these column names in a list called nan_columns. Fill the missing values in each of these columns with the median value of that feature. Save the resulting data frame as train_x_b. (Note that if there are any missing values in the target, i.e. the “price” column, drop the corresponding row completely from train_x and train_y.)
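Parts (a) and (b) could be approached along the following lines. This is a minimal sketch and not a reference solution. The small inline CSV is a hypothetical stand-in for train.csv, and the feature names horsepower and mileage are assumptions made for illustration; only “price” comes from the assignment text. The prepare helper is likewise a hypothetical name. The sketch also assumes that the columns with missing values are numeric, since a median is not defined for text columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for train.csv. Only the "price" column name comes from
# the assignment; "horsepower" and "mileage" are made-up feature names.
SAMPLE_CSV = """horsepower,mileage,price
100,30000,9000
,45000,7000
150,,12000
120,20000,
"""


def prepare(train: pd.DataFrame):
    """Split features from the target and median-fill missing feature values."""
    # Drop rows with a missing target first, so that the feature frame and
    # the target frame stay aligned row for row.
    train = train.dropna(subset=["price"])

    # (a) Features are every column except "price"; the target is kept as a
    # one-column DataFrame rather than a Series.
    train_x_a = train.drop(columns=["price"])
    train_y = train[["price"]]

    # (b) Names of the feature columns that contain at least one NaN.
    nan_columns = train_x_a.columns[train_x_a.isna().any()].tolist()

    # Fill each NaN with the median of its own column. numeric_only=True
    # skips non-numeric columns, which would therefore stay unfilled.
    medians = train_x_a.median(numeric_only=True)
    train_x_b = train_x_a.fillna(medians)

    return train_x_a, train_y, nan_columns, train_x_b


# With the real data this would be: train = pd.read_csv("train.csv")
train = pd.read_csv(io.StringIO(SAMPLE_CSV))
train_x_a, train_y, nan_columns, train_x_b = prepare(train)

print(nan_columns)
print(train_x_b)
```

Dropping the rows with a missing target before the split is one possible ordering; dropping the same index labels from both frames after the split should give the same result. Scikit-Learn's SimpleImputer with strategy="median" is an alternative to fillna that makes it easier to reuse the medians learned from the training set on test.csv.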

