Implement the L2-regularized linear regression algorithm with λ ranging from 0 to 150 (integers only). For each of the 6 datasets, plot both the training set MSE and the test set MSE as a function of λ (x-axis) in one graph.


Description

Start the experiment by creating three additional training files from train-1000-100.csv, taking the first 50, 100, and 150 instances respectively. Name them train-50(1000)-100.csv, train-100(1000)-100.csv, and train-150(1000)-100.csv. The corresponding test file for all three is test-1000-100.csv, which needs no modification.
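The file-splitting step can be sketched with the standard csv module. The function name write_first_instances is hypothetical, and has_header=True is an assumption about the data files (that the first row holds column names); set it to False if the files start directly with data.

```python
import csv


def write_first_instances(src_path, dst_path, n, has_header=True):
    """Copy the first n data rows of src_path into dst_path.

    has_header=True is an assumption: the first row is treated as
    column names and is copied in addition to the n data rows.
    Returns the number of data rows actually written.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        if has_header:
            header = next(reader, None)
            if header is not None:
                writer.writerow(header)
        written = 0
        for row in reader:
            if written >= n:
                break
            writer.writerow(row)
            written += 1
    return written
```

For example, write_first_instances("train-1000-100.csv", "train-50(1000)-100.csv", 50) would produce the first of the three files; repeating the call with 100 and 150 gives the other two.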


1. Implement the L2-regularized linear regression algorithm with λ ranging from 0 to 150 (integers only). For each of the 6 datasets, plot both the training set MSE and the test set MSE as a function of λ (x-axis) in one graph.


(a) For each dataset, which λ value gives the least test set MSE? 


(b) For each of datasets 100-100, 50(1000)-100, 100(1000)-100, provide an additional graph with λ ranging from 1 to 150. 


(c) Explain why λ = 0 (i.e., no regularization) gives abnormally large MSEs for those three datasets in (b).
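For question 1, a minimal sketch of the λ sweep using the closed-form solution w = (XᵀX + λI)⁻¹Xᵀy with NumPy. The function names are hypothetical. It assumes X is already a numeric matrix that includes any bias column you want, and that every weight is penalized; the course may define the regularizer differently.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form L2-regularized least squares: solve (X^T X + lam*I) w = X^T y."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    # With lam = 0 and fewer rows than columns, A is singular or nearly so;
    # solve() may then raise LinAlgError or return very large weights.
    return np.linalg.solve(A, X.T @ y)


def mse(X, y, w):
    """Mean squared error of the linear predictions X @ w against y."""
    residual = X @ w - y
    return float(np.mean(residual ** 2))


def sweep_lambda(X_train, y_train, X_test, y_test, lambdas=range(0, 151)):
    """Return arrays of training and test MSE, one entry per value in lambdas."""
    train_mse, test_mse = [], []
    for lam in lambdas:
        w = ridge_fit(X_train, y_train, lam)
        train_mse.append(mse(X_train, y_train, w))
        test_mse.append(mse(X_test, y_test, w))
    return np.array(train_mse), np.array(test_mse)
```

The two returned arrays can be plotted against λ on one set of axes with any plotting library, and the index of the smallest test MSE (for example via np.argmin) gives the λ asked for in part (a). Passing lambdas=range(1, 151) gives the restricted range for part (b).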


2. From the plots in question 1, we can tell which value of λ is best for each dataset, but only because we know the test data and its labels. This is not realistic in real-world applications. In this part, we use cross validation (CV) to set the value of λ. Implement the 10-fold CV technique discussed in class (pseudocode given in Appendix A) to select the best λ value from the training set.


(a) Using the CV technique, what is the best choice of λ and the corresponding test set MSE for each of the six datasets?


(b) How do the values for λ and MSE obtained from CV compare to the choice of λ and MSE in question 1(a)? 


(c) What are the drawbacks of CV? 


(d) What are the factors affecting the performance of CV?
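For question 2, a generic 10-fold CV loop might look like the sketch below. It is not the Appendix A pseudocode, which is not reproduced here; the function names are hypothetical, the folds are contiguous and unshuffled (an assumption), and ties are broken in favor of the smallest λ.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form L2-regularized least squares: solve (X^T X + lam*I) w = X^T y."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


def cv_select_lambda(X, y, lambdas=range(0, 151), k=10):
    """Pick the lambda with the lowest average held-out MSE over k folds.

    Returns (best_lambda, avg_mse), where avg_mse[i] is the mean
    validation MSE for the i-th candidate in lambdas.
    """
    lambdas = list(lambdas)
    n = X.shape[0]
    all_idx = np.arange(n)
    # Contiguous, unshuffled folds (assumption); sizes differ by at most one.
    folds = np.array_split(all_idx, k)
    avg_mse = []
    for lam in lambdas:
        fold_mse = []
        for held_out in folds:
            train_idx = np.setdiff1d(all_idx, held_out)
            w = ridge_fit(X[train_idx], y[train_idx], lam)
            residual = X[held_out] @ w - y[held_out]
            fold_mse.append(np.mean(residual ** 2))
        avg_mse.append(np.mean(fold_mse))
    avg_mse = np.array(avg_mse)
    best = int(np.argmin(avg_mse))  # first minimum, i.e. smallest lambda on ties
    return lambdas[best], avg_mse
```

Only the training set enters the loop. Once λ is chosen, refit on the full training set with that λ and evaluate once on the test set to get the MSE asked for in part (a).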

