The file Mall_customers.csv contains data on customers shopping in a mall: their gender, age, annual income and spending score. In class, we applied K-Means clustering using only two features: annual income and spending score.
Task 1: Extend the K-Means clustering on the above dataset by applying it to the following feature sets:
a) Only two features: Age and Spending Score.
b) Three features: Age, Annual Income and Spending Score.
Remember that K-Means clustering uses the Euclidean distance between points to compute the
“goodness of fit”, i.e. the within-cluster sum of squares, which is to be minimized. Hence it is important that the
feature values be brought to a common scale, using the same approach that you used to scale the
variables in the Multiple Linear Regression examples. Otherwise the feature with the largest numeric
range dominates the distance. This is particularly important when you are
dealing with 3 or more features.
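The scaling and clustering steps for Task 1 could be sketched roughly as follows. This is only an illustration under several assumptions: the column names ("Age", "Annual Income (k$)", "Spending Score (1-100)") may differ in your copy of the file, StandardScaler stands in for whatever scaling approach was used in the Multiple Linear Regression examples, the helper name cluster is hypothetical, and the few rows below are made-up placeholders so that the sketch runs without the CSV. The value k=2 is chosen only because the placeholder sample is tiny; pick k for the real data the way it was done in class.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up placeholder rows, NOT the real dataset. With the real file you
# would use something like: df = pd.read_csv("Mall_customers.csv")
# The column names below are assumptions about that file.
df = pd.DataFrame({
    "Age": [19, 21, 20, 23, 31, 22, 35, 64],
    "Annual Income (k$)": [15, 15, 16, 16, 17, 17, 18, 19],
    "Spending Score (1-100)": [39, 81, 6, 77, 40, 76, 6, 3],
})


def cluster(frame, columns, k):
    """Standardize the chosen columns, then fit K-Means with k clusters.

    Returns the cluster label of each row and the within-cluster
    sum of squares (scikit-learn calls this `inertia_`).
    """
    scaled = StandardScaler().fit_transform(frame[columns])
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scaled)
    return model.labels_, model.inertia_


# a) Two features: Age and Spending Score
labels_a, wcss_a = cluster(df, ["Age", "Spending Score (1-100)"], k=2)

# b) Three features: Age, Annual Income and Spending Score
labels_b, wcss_b = cluster(
    df, ["Age", "Annual Income (k$)", "Spending Score (1-100)"], k=2
)

print("a) labels:", labels_a, "WCSS:", round(wcss_a, 3))
print("b) labels:", labels_b, "WCSS:", round(wcss_b, 3))
```

Because the features are standardized before fitting, the reported within-cluster sum of squares is in scaled units, not in years or k$.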
Task 2: Now extend the K-Means clustering to include Gender as well.
Remember that Gender is a categorical variable (not a continuous variable like Age, Annual
Income, etc.). Encode it numerically by replacing Male with 0 and Female with 1.
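One possible way to do the 0/1 encoding with pandas is sketched below. As before, the column names (including "Gender") are assumptions about the file, and the rows are made-up placeholders rather than real data. The sketch also scales all four columns together before clustering, which is a choice made here for illustration; the task itself does not say whether the encoded Gender column should be scaled.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up placeholder rows, NOT the real dataset; column names are assumed.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "Age": [19, 21, 20, 23, 31, 64],
    "Annual Income (k$)": [15, 15, 16, 16, 17, 19],
    "Spending Score (1-100)": [39, 81, 6, 77, 40, 3],
})

# Replace Male with 0 and Female with 1, as the task asks.
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})

# Any value other than "Male"/"Female" would have become NaN; fail early.
assert df["Gender"].notna().all(), "unexpected Gender value"

features = ["Gender", "Age", "Annual Income (k$)", "Spending Score (1-100)"]
scaled = StandardScaler().fit_transform(df[features])

# k=2 only because the placeholder sample is tiny.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)
print(df.assign(cluster=model.labels_))
```

Note that a 0/1 column only takes two values, so after scaling it can separate the points quite strongly along that axis; it is worth comparing the resulting clusters with those from Task 1.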