1. Determine the ideal k (where k is the number of clusters). First, apply k-means clustering with k set to each value from 1 to 10. Using Excel, generate a line graph with the k values on the x-axis and the SSE (sum of squared errors) for each k on the y-axis. Based on the line graph, select the k value at which the SSE stabilizes, that is, where the line flattens out. Submit a copy of the line graph and state which k you selected.
Note: k-means only applies to numeric data, and DATE is NOT numeric.
2. Split the dataset records into 3 equal bins and generate 3 separate CSV files, one for each bin. (For simplicity, this can be done using Excel.) Submit all binned files.
3. For each bin, apply k-means clustering with k set to the value you selected in question 1. Describe your clustering outcomes. Provide screenshots.
4. What alternative analysis can you perform on this data and why?
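For anyone who wants to sanity-check the Excel results, the SSE-versus-k calculation in question 1 and the three-way split in question 2 can be sketched in plain Python. This is only a minimal sketch under stated assumptions: the function names (`kmeans_sse`, `split_into_bins`), the fixed random seed, and the toy points are all hypothetical and not part of the assignment, and the real dataset would first need its non-numeric columns (such as DATE) removed. Results from a dedicated tool may differ slightly because k-means depends on the initial centroids.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length numeric tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_sse(points, k, iters=100, seed=0):
    """Run basic (Lloyd's) k-means on numeric tuples and return the final SSE."""
    rng = random.Random(seed)          # fixed seed (assumption) for repeatable runs
    centers = rng.sample(points, k)    # initial centroids: k rows picked at random
    for _ in range(iters):
        # Assignment step: put each point in the group of its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:     # converged: centroids stopped moving
            break
        centers = new_centers
    # SSE: squared distance from every point to its closest centroid.
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def split_into_bins(rows, n_bins=3):
    """Split rows into n_bins consecutive bins whose sizes differ by at most 1."""
    size, extra = divmod(len(rows), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)
        bins.append(rows[start:end])
        start = end
    return bins


# Toy numeric data (hypothetical, NOT the assignment dataset).
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# SSE for each k; these are the values to plot against k in the line graph.
# The toy data has only 6 points, so k stops at 6 here instead of 10.
sse_by_k = {k: kmeans_sse(data, k) for k in range(1, 7)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 3))

# Three bins, each of which could be written to its own CSV file.
bins = split_into_bins(data, 3)
```

The printed pairs can be pasted into Excel to draw the line graph. For question 3, the same `kmeans_sse` call could be applied to each entry of `bins` using the k chosen from that graph.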