1. Determine the ideal k (where k is the number of clusters). First, apply k-means clustering with k set to each value from 1 to 10 and record the sum of squared errors (SSE) for each run. Using Excel, generate a line graph with the k values on the x-axis and the SSE for each k on the y-axis. Based on the line graph, select the k value at which the SSE curve levels off, that is, where increasing k further no longer reduces the SSE appreciably. Submit a copy of the line graph and state which k you selected.
Note: k-means applies only to numeric data, and DATE is NOT numeric, so do not include DATE as a clustering attribute.
2. Split the dataset records into 3 equal bins and generate 3 separate CSV files, one for each bin. (For simplicity, this can be done using Excel.) Submit all binned files.
3. For each bin, apply k-means clustering with k set to the value you selected in question 1. Describe your clustering outcomes. Provide screenshots.
4. What alternative analysis can you perform on this data and why?
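
For reference, the SSE-versus-k procedure in question 1 and the binning in question 2 can be sketched in plain Python. This is only an illustrative sketch, not the required method: the data below is synthetic (the real dataset is not reproduced here), the helper names `kmeans_sse` and `split_into_bins` are hypothetical, and a single k-means run with random initialization is assumed, so the SSE values from a dedicated tool may differ.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_sse(points, k, iters=100, seed=0):
    """Run basic (Lloyd's) k-means once and return the final SSE."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct records as starting centroids
    for _ in range(iters):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    # SSE: squared distance from each point to its nearest centroid, summed.
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)

def split_into_bins(rows, n_bins=3):
    """Split rows into n_bins contiguous bins whose sizes differ by at most one."""
    size, extra = divmod(len(rows), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)
        bins.append(rows[start:end])
        start = end
    return bins

# Synthetic numeric data (assumption): 30 points around each of three centers.
# In the assignment, these would be the numeric columns only (no DATE).
data_rng = random.Random(42)
centers = [(0, 0), (10, 10), (0, 10)]
data = [
    (cx + data_rng.gauss(0, 1), cy + data_rng.gauss(0, 1))
    for cx, cy in centers
    for _ in range(30)
]

# Question 1: SSE for k = 1..10; these pairs are what the line graph plots.
sse_by_k = {k: kmeans_sse(data, k) for k in range(1, 11)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))

# Question 2: three equal bins of records.
bins = split_into_bins(data, 3)
print([len(b) for b in bins])
```

The printed (k, SSE) pairs can be pasted into Excel to draw the line graph, and each bin could be written to its own file with the standard `csv` module. Because the bins here are contiguous slices, shuffling the records first may be preferable if the file is sorted.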