[Solved] For this problem you will experiment with various classif...

For this problem you will experiment with various classifiers provided as part of the scikit-learn (sklearn) machine learning module, as well as with some of its preprocessing and model evaluation capabilities.

data mining

Description

Please number the questions in the Python notebook.

[Dataset: magic04.csv]

https://archive.ics.uci.edu/ml/datasets/MAGIC+Gamma+Telescope

For this problem you will experiment with various classifiers provided as part of the scikit-learn (sklearn) machine learning module, as well as with some of its preprocessing and model evaluation capabilities. The data is provided in a CSV-formatted file whose first row contains the attribute names. To download the dataset, click “Data Folder”, right-click the magic04.data link, and select “save link as”. The fields in the data are described at http://archive.ics.uci.edu/ml/machine-learning-databases/magic/magic04.names . Please read that document to understand the case and the dataset.
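As a rough illustration of reading a CSV file whose first row holds the attribute names, the sketch below uses pandas on a tiny in-memory sample. The three rows are made-up placeholder values, and the column names (`fLength`, `fWidth`, `fSize`) are assumed from the field description rather than taken from this page; only the `class` column with its `g` and `h` categories is stated in the assignment.

```python
import io

import pandas as pd

# Hypothetical excerpt standing in for magic04.csv; the values are invented
# for illustration and the feature column names are assumptions.
sample_csv = io.StringIO(
    "fLength,fWidth,fSize,class\n"
    "28.8,16.0,2.64,g\n"
    "31.6,11.7,2.52,g\n"
    "162.1,136.0,4.06,h\n"
)

# read_csv treats the first row as the header by default.
df = pd.read_csv(sample_csv)

# Count of each category of the dependent variable.
print(df["class"].value_counts())

# Basic statistics of the independent variables.
print(df.drop(columns="class").describe())
```

With the real file, the `io.StringIO` object would be replaced by the path to the downloaded CSV. If a downloaded copy turns out to have no header row, passing `header=None` together with a `names=` list is the usual pandas option.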

In this assignment, you need to use scikit-learn, the main machine learning package in Python, to develop an IPython notebook. Please take a look at the scikit-learn home page (http://scikit-learn.org/stable/index.html) to get an overview of the package.

Make sure the scikit-learn version you are using is 0.20 or later. If you installed Anaconda recently, you should have version 0.23.2, which is fine even though the latest version of sklearn is 0.24.1.
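One way to check the installed version from a notebook cell is sketched below. It assumes the minimum mentioned above refers to release 0.20; the helper names `version_tuple` and `is_supported` are hypothetical and exist only for this example.

```python
import re


def version_tuple(version):
    """Return the first two numeric components of a version string as ints."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:2])


def is_supported(version, minimum=(0, 20)):
    """True if the version is at least the assumed minimum release (0.20)."""
    return version_tuple(version) >= minimum


try:
    import sklearn

    print(sklearn.__version__, "supported:", is_supported(sklearn.__version__))
except ImportError:
    print("scikit-learn is not installed")
```

The check compares only the major and minor components, which is enough to tell 0.19.x apart from 0.20 and later.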

Please develop an IPython notebook titled 770_21_a1_yourlastname to complete the following tasks. You will probably want to start from the German credit notebook I used in the week 3 lecture and modify it.

You are required to create an ipython notebook cell for each of the following tasks, where (C) indicates that you need to write code for the task, (O) indicates that you need to show output, and (A) that you need to type your answers using Markdown text.

At the beginning of each cell, you need to indicate which task the cell is about. For example, in the cell related to task 1, you should first type “# Task 1: Import data”. If you do not clearly label the cells, you will lose 1-2 points (out of 18 points).

1. You need to import data. (C) - completed

2. In this dataset, the dependent variable is class. It includes two categories: g and h. g represents gamma (signal), and h hadron (background). Please insert a cell and print the value count of each category. (C)(O) - completed

3. All the other variables are independent variables. Please insert a cell and print the histograms of the independent variables (C)(O). - completed

4. Insert a cell and print the basic stats of each independent variable using the describe() method (C)(O). - completed

5. Insert a cell and write code to split the dataset into training and validation sets (please use a 60%-40% split) (C).

6. Insert a cell and describe the uses of validation (at least 3 uses). (A). I will complete this portion.

7. Insert a cell. In this cell, use scikit-learn’s logistic regression classifier (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression) and fit a model using the training dataset (C). Then run the classifier on the validation set (C). Print the classification report and the Area Under the Receiver Operating Characteristic Curve (ROC AUC) for the validation set (please google to find out how to get AUC using scikit-learn) (C)(O).

8. Insert a cell and use your own language to describe the SVM algorithm (with at most 8 sentences) (A). I will complete this portion.

9. Insert a new cell. In this cell, use the same training and validation datasets you obtained in task 5 to fit SVM classifiers (please use the SVC class in scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html). You need to tune the SVM hyperparameter C (default = 1.0), the regularization parameter. Try each C in the list [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]; you must use a FOR loop. In each iteration, please print the validation set classification report and AUC. (C)(O).

10. Insert a new cell. In this cell, please first tell me which C in the list [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] gives you the optimal SVM classifier with respect to AUC (A). Then, please use your own language (with at most 4 sentences) to discuss what this hyperparameter C means (A).

11. Insert a cell and write code to fit a random forest classifier (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) using the same training and validation datasets obtained in task 5, and print the classification report and AUC. When you fit the random forest model, you can just use the default hyperparameters (C)(O).

12. Insert a new cell and use your own language (with at most 8 sentences) to describe the random forest algorithm (A).
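The coding tasks in the list above (5, 7, 9, and 11) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the expected solution: it runs on synthetic data from `make_classification` instead of magic04.csv, treats the target as 0/1 integers rather than g/h, fixes `random_state=0` and `max_iter=1000` for repeatability, and uses a hypothetical helper named `report`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the real dataset (assumption): 10 numeric features
# and a binary target encoded as 0/1.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

# Task 5: 60%-40% split into training and validation sets.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.4, random_state=0, stratify=y
)


def report(name, model, scores):
    """Print the validation classification report and ROC AUC; return the AUC."""
    print(name)
    print(classification_report(y_valid, model.predict(X_valid)))
    auc = roc_auc_score(y_valid, scores)
    print(f"ROC AUC: {auc:.3f}")
    return auc


# Task 7: logistic regression, scored with the positive-class probability.
log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
lr_auc = report(
    "Logistic regression", log_reg, log_reg.predict_proba(X_valid)[:, 1]
)

# Task 9: SVC for each C in a for loop; decision_function supplies the
# continuous scores that roc_auc_score needs.
c_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
svm_auc = {}
for c in c_values:
    svm = SVC(C=c).fit(X_train, y_train)
    svm_auc[c] = report(f"SVC, C={c}", svm, svm.decision_function(X_valid))

# Task 10 input: the C with the highest validation AUC.
best_c = max(svm_auc, key=svm_auc.get)
print("Best C by AUC:", best_c)

# Task 11: random forest with default hyperparameters (random_state is set
# only to make the run repeatable).
forest = RandomForestClassifier(random_state=0).fit(X_train, y_train)
rf_auc = report("Random forest", forest, forest.predict_proba(X_valid)[:, 1])
```

ROC AUC is computed from continuous scores rather than hard predictions, which is why the sketch passes `predict_proba` or `decision_function` output to `roc_auc_score`. On the real dataset, SVC may benefit from feature scaling and can take noticeably longer to fit than on this small synthetic sample.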

Instruction Files

myworkbook1.ipynb

52.3 KB

myworkbook.docx

16.9 KB



