The CSV file Mall_customers.csv contains information about customers shopping in a mall: their gender, age, annual income and spending score.

Description

The CSV file Mall_customers.csv contains information about customers shopping in a mall: their gender, age, annual income and spending score. In class, we applied K-Means clustering using only two features: annual income and spending score.


Task 1: Extend the K-Means clustering on the above dataset by applying it to the following feature sets:

a) Only two features: Age and Spending Score. 

b) Three features: Age, Annual Income and Spending Score
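The two feature sets in Task 1 can be sketched as column selections passed to the same clustering routine. What follows is a minimal standard-library sketch of Lloyd's algorithm (the usual K-Means iteration), not the implementation used in class. The column names, the toy rows, and the helpers `select` and `kmeans` are assumptions made for illustration, and the scaling step described in the next paragraph is left out here to keep the sketch short.

```python
import math

# Toy rows for illustration only; they are NOT taken from Mall_customers.csv.
# The column names are assumed and may differ from the real file header.
rows = [
    {"Age": 19, "Annual Income": 15, "Spending Score": 79},
    {"Age": 52, "Annual Income": 60, "Spending Score": 20},
    {"Age": 21, "Annual Income": 16, "Spending Score": 81},
    {"Age": 55, "Annual Income": 62, "Spending Score": 15},
    {"Age": 23, "Annual Income": 17, "Spending Score": 77},
    {"Age": 58, "Annual Income": 65, "Spending Score": 18},
]


def select(rows, columns):
    """Build one numeric feature vector per row from the chosen columns."""
    return [[float(row[c]) for c in columns] for row in rows]


def kmeans(points, k, max_iterations=100):
    """Hypothetical helper: plain Lloyd's algorithm with Euclidean distance.

    Initialisation is deterministic (the first k points), which is a
    simplification; libraries normally use a smarter or random start.
    Returns (labels, centroids, within-cluster sum of squares).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    wcss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, wcss


# a) Age and Spending Score
labels_a, centroids_a, wcss_a = kmeans(select(rows, ["Age", "Spending Score"]), k=2)

# b) Age, Annual Income and Spending Score
labels_b, centroids_b, wcss_b = kmeans(
    select(rows, ["Age", "Annual Income", "Spending Score"]), k=2
)
```

Only the list of column names changes between a) and b); the clustering routine itself is the same for two or three features.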


Remember that K-Means clustering uses the Euclidean distance between points to compute the “goodness of fit”, i.e. the within-cluster sum of squares to be minimized. Without scaling, the feature with the largest numeric range dominates that distance. Hence it is important that the feature values be brought to a common scale, using the same approach that you used to scale the variables in the Multiple Linear Regression examples. This is particularly important when you are dealing with 3 or more features.
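To see why scale matters for Euclidean distance, here is a small sketch using z-score standardization (subtract the column mean, divide by the column standard deviation). The assignment does not say which scaling method was used in the Multiple Linear Regression examples, so z-scoring is an assumption here, as are the helper name `standardize` and the toy values.

```python
import math
import statistics


def standardize(points):
    """Hypothetical helper: z-score each column (mean 0, population std 1)."""
    columns = list(zip(*points))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(p, means, stdevs)]
        for p in points
    ]


# Toy (age, income) pairs for illustration only; NOT from Mall_customers.csv.
# Income is given in raw currency units here to exaggerate the scale gap.
points = [
    [20.0, 30000.0],
    [60.0, 31000.0],  # very different age, similar income
    [22.0, 90000.0],  # similar age, very different income
]

# Unscaled: the income column swamps the age column in the distance.
raw_d01 = math.dist(points[0], points[1])
raw_d02 = math.dist(points[0], points[2])

# Scaled: both columns contribute on a comparable footing.
scaled = standardize(points)
scaled_d01 = math.dist(scaled[0], scaled[1])
scaled_d02 = math.dist(scaled[0], scaled[2])

print(f"raw:    d01={raw_d01:.1f}  d02={raw_d02:.1f}")
print(f"scaled: d01={scaled_d01:.3f}  d02={scaled_d02:.3f}")
```

On the raw values the 40-year age gap barely registers next to the income gap; after standardization the two distances are of similar size, so neither feature decides the clustering on its own.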


Task 2: Now extend the K-Means clustering to include Gender as well.


Remember that Gender is a categorical variable (not a continuous variable like Age, Annual Income, etc.). Replace the value Male with 0 and Female with 1.
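The replacement can be sketched as a simple lookup applied before the feature vectors are built. The column names, the toy rows, and the helper `encode_gender` are assumptions for illustration; the real file may spell the header or the values differently.

```python
# Mapping stated in the task: Male -> 0, Female -> 1.
GENDER_CODES = {"Male": 0, "Female": 1}


def encode_gender(value):
    """Hypothetical helper: map a gender string to its numeric code.

    Whitespace and letter case are normalised first; any other value
    raises an error instead of being silently mapped.
    """
    key = value.strip().capitalize()
    if key not in GENDER_CODES:
        raise ValueError(f"unexpected Gender value: {value!r}")
    return GENDER_CODES[key]


# Toy rows for illustration only; NOT taken from Mall_customers.csv.
rows = [
    {"Gender": "Male", "Age": 19, "Annual Income": 15, "Spending Score": 39},
    {"Gender": "Female", "Age": 31, "Annual Income": 40, "Spending Score": 72},
]

# Four-feature vectors: encoded Gender followed by the numeric columns.
features = [
    [
        encode_gender(row["Gender"]),
        row["Age"],
        row["Annual Income"],
        row["Spending Score"],
    ]
    for row in rows
]
```

The encoded column then sits alongside the other features as a fourth coordinate, and would go through the same scaling step as the other features before clustering.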

