Provide the summary statistics for all the variables from the dataset. Explain some of the key aspects of the dataset.

data mining

Description

Using SAS, build a k-nearest neighbor (k-NN) model to predict the quality of wine from several input factors (input files are provided):


1 - Provide the summary statistics for all the variables from the dataset. Explain some of the key aspects of the dataset.

2 - Review the SAS code in the file knn.sas and, for each SAS statement, explain what it does in a comment.

3 - Run k-NN with k = 1, 2, and 3. For each case, provide the code, explain the SAS output, and interpret the results.

4 - Which case (k = 1, 2, or 3) provides the best model? Explain why, using the output from #3.
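
For orientation, steps 1 and 3 might be sketched roughly as below. This is only a minimal sketch under stated assumptions, not the contents of knn.sas: the dataset name WORK.WINE, the target column QUALITY, the predictor names, the 70/30 split, and the macro name are all hypothetical placeholders. It relies on PROC DISCRIM with METHOD=NPAR and K=, which is one common way to run k-NN classification in SAS.

```sas
/* Assumptions (not from the assignment): the wine data are already loaded
   as WORK.WINE, with a target column QUALITY and numeric predictors.
   Every dataset and variable name below is hypothetical. */

/* Step 1: summary statistics for every numeric variable */
proc means data=work.wine n nmiss mean std min median max;
   var _numeric_;
run;

/* Sample 70% of the rows for training; OUTALL keeps all rows and
   adds a Selected flag (1 = sampled, 0 = not sampled) */
proc surveyselect data=work.wine out=work.split
                  samprate=0.7 outall seed=12345;
run;

data work.train work.test;
   set work.split;
   if Selected = 1 then output work.train;
   else output work.test;
   drop Selected;
run;

/* Step 3: nonparametric discriminant analysis with k nearest neighbors */
%macro run_knn(k);
   proc discrim data=work.train testdata=work.test
                method=npar k=&k
                testout=work.scored_k&k;
      class quality;
      var alcohol pH sulphates;   /* hypothetical predictor names */
   run;
%mend run_knn;

%run_knn(1)
%run_knn(2)
%run_knn(3)
```

For step 4, the error-rate summaries that PROC DISCRIM prints for the test data under each value of k could be compared, with the lowest test error suggesting the preferred k.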


Thank you.

