ASSIGNMENT I

 

Bike sharing systems using apps have become popular in many cities. They provide economical transportation in an environmentally friendly manner. A number of companies are now into the bike-sharing business.

Assume you are working for Lime, a bike-sharing company, and plan to enter a new market. You have collected two years of hourly bike-sharing rental data from a major city on the U.S. East Coast (the data is actual rental data from one city). The two years of data are split into two data sets, training.csv and test.csv. The fields are as follows:

- ID: record ID

- season: season (1: spring, 2: summer, 3: fall, 4: winter)

- mnth: month (1 to 12)

- day: day of the month (1 to 28, 29, 30, or 31, depending on the month)

- hr: hour (0 to 23)

- holiday: whether the day is a holiday or not

- weekday: day of the week

- workingday: 1 if the day is neither a weekend nor a holiday, otherwise 0

- weathersit:

    - 1: Clear, Few clouds, Partly cloudy

    - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist

    - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds

    - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

- temp: normalized temperature in Celsius; the values are divided by 41 (max)

- atemp: normalized feeling temperature in Celsius; the values are divided by 50 (max)

- hum: normalized humidity; the values are divided by 100 (max)

- windspeed: normalized wind speed; the values are divided by 67 (max)

- casual: count of casual users

- registered: count of registered users

- cnt: count of total rental bikes, including both casual and registered
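
Because temp, atemp, hum, and windspeed are stored as fractions of their maximum, it can help to convert them back to the original scale when you inspect the data. The following is a minimal Python sketch of that conversion, using the divisors from the field list above. The helper name denormalize and the dictionary-per-row layout are assumptions made for illustration, not part of the assignment.

```python
# Divisors taken from the field list (value stored = original value / max).
SCALE = {"temp": 41.0, "atemp": 50.0, "hum": 100.0, "windspeed": 67.0}


def denormalize(row):
    """Return a copy of `row` with the four weather fields back on their original scale.

    `row` is assumed to be a dict keyed by field name (hypothetical layout).
    All other fields are copied unchanged.
    """
    out = dict(row)
    for field, max_value in SCALE.items():
        out[field] = float(row[field]) * max_value
    return out
```

The regression itself can be fitted on either scale; rescaling only changes the size of the coefficients, not the fit.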

 

 

 

Instead of looking at the data on an hourly basis, we will look at it in six-hour blocks. For each day, you have to bin the hourly data (put it into buckets) into the following four groups:

EM (early morning) = Hr 0, 1, 2, 3, 4, 5

MN (morning to noon) = Hr 6, 7, 8, 9, 10, 11

AN (afternoon) = Hr 12, 13, 14, 15, 16, 17

EN (evening/night) = Hr 18, 19, 20, 21, 22, 23
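
Since each bucket covers six consecutive hours starting at hour 0, the bucket can be found by integer division of hr by 6. A minimal Python sketch of that mapping follows; the function name hour_to_bucket is an assumption chosen for illustration.

```python
BUCKETS = ("EM", "MN", "AN", "EN")  # labels in the order defined above


def hour_to_bucket(hr: int) -> str:
    """Map an hour (0 to 23) to its six-hour bucket label."""
    if not 0 <= hr <= 23:
        raise ValueError(f"hr must be between 0 and 23, got {hr}")
    # 0-5 -> index 0 (EM), 6-11 -> 1 (MN), 12-17 -> 2 (AN), 18-23 -> 3 (EN)
    return BUCKETS[hr // 6]
```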

 

For each bucket, sum the casual, registered, and cnt values over the six hours, and take the average of temp, atemp, hum, and windspeed over the same six hours. For weathersit, take the average and use the Round() function to round it to the nearest whole number.
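
The aggregation rule above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the required solution: rows are assumed to be dicts keyed by field name, the function name aggregate_buckets is hypothetical, and (mnth, day) is assumed to identify a single day within the rows passed in. The field list shows no year column, so rows from two different years would need an extra key to keep them apart.

```python
from collections import defaultdict
from statistics import mean

BUCKETS = ("EM", "MN", "AN", "EN")
SUM_FIELDS = ("casual", "registered", "cnt")          # summed per bucket
MEAN_FIELDS = ("temp", "atemp", "hum", "windspeed")   # averaged per bucket


def aggregate_buckets(rows):
    """Collapse hourly rows into one row per (mnth, day, bucket).

    Assumption: (mnth, day) is unique per day in `rows`.
    """
    groups = defaultdict(list)
    for row in rows:
        key = (row["mnth"], row["day"], BUCKETS[row["hr"] // 6])
        groups[key].append(row)

    result = []
    for (mnth, day, bucket), members in groups.items():
        out = {"mnth": mnth, "day": day, "bucket": bucket}
        for field in SUM_FIELDS:
            out[field] = sum(r[field] for r in members)
        for field in MEAN_FIELDS:
            out[field] = mean(r[field] for r in members)
        # Python's round() sends exact halves to the nearest even integer
        # (2.5 -> 2); other tools may round halves differently.
        out["weathersit"] = round(mean(r["weathersit"] for r in members))
        result.append(out)
    return result
```

If an hour is missing from a bucket, this sketch sums and averages over the hours that are present. Also check how your own Round() function treats exact halves such as 2.5, since that decides the weathersit code for those buckets.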

 

MLR Model

Develop a multiple linear regression (MLR) model for predicting total rental bikes (cnt) as a function of the independent variables, using the training data set. Use the regsubsets() function with Mallows's Cp as the selection criterion to select the best model.
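
For reference, Mallows's Cp for a candidate model with p coefficients (counting the intercept) is commonly written as Cp = SSE_p / MSE_full - n + 2p, where SSE_p is the candidate's residual sum of squares, MSE_full is the mean squared error of the model containing all predictors, and n is the number of observations. The Python sketch below only illustrates that arithmetic and a pick-the-smallest rule; the function names and the candidates layout are assumptions, and it does not replace the subset search that regsubsets() performs.

```python
def mallows_cp(sse_subset: float, mse_full: float, n_obs: int, n_params: int) -> float:
    """Mallows's Cp = SSE_p / MSE_full - n + 2p, with p counting the intercept."""
    return sse_subset / mse_full - n_obs + 2 * n_params


def pick_lowest_cp(candidates, mse_full, n_obs):
    """Return (best_name, scores) for the candidate with the smallest Cp.

    `candidates` maps a model label to (sse, n_params); this layout is
    hypothetical and chosen only for the example.
    """
    scores = {
        name: mallows_cp(sse, mse_full, n_obs, n_params)
        for name, (sse, n_params) in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores
```

For the full model, Cp equals its own number of coefficients by construction, so the comparison is informative only among the smaller subsets.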

1. Copy/paste the MLR model here

