Recall the data cleaning assignments we did on the
investments data. We showed that customers who had negative service encounters
took out a lot more money than customers who had positive service encounters.
One main limitation of the previous work was that it was carried out across the
entire sample of data without looking to see if different segments of customers
responded differently to bad vs. good service. The executives at the
investments firm have asked you to reanalyze the data in greater detail. Specifically,
the executives want to know if there are certain customer segments who are at
higher risk of taking out more money. In order to answer this question, you
will perform a cluster analysis on the data, and calculate the average changes
in dollars (Chg good service – Chg bad service) within each segment to
see if there are certain groups of customers the firm needs to make sure always
have good service encounters.
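The per-segment comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the `segment` column is a hypothetical cluster label (the assignment does not name one), the "good"/"bad" coding of categ is assumed, and the numbers are invented toy values, not taken from hw3_data.txt.

```r
# Toy data: 'segment' is a hypothetical cluster label, and the "good"/"bad"
# coding of categ is an assumption; all values are made up for illustration.
toy <- data.frame(
  segment = c(1, 1, 1, 1, 2, 2, 2, 2),
  categ   = c("good", "bad", "good", "bad", "good", "bad", "good", "bad"),
  Inv_Chg = c(100, -300, 200, -500, 50, -50, 150, -150)
)

# Mean Inv_Chg for every segment-by-service combination
# (rows = segments, columns = service categories)
means <- tapply(toy$Inv_Chg, list(toy$segment, toy$categ), mean)

# Chg good service - Chg bad service, within each segment
gap <- means[, "good"] - means[, "bad"]
print(gap)
```

In this toy example the gap is 550 for segment 1 and 200 for segment 2, so segment 1 would be the group that loses the most investment dollars when service is bad.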
You must provide all of your R code in a script file,
and we must be able to replicate your results in order for you to receive full
credit for this assignment. Please upload both your answers and the R script
file. The dataset "hw3_data.txt" contains 1759 observations and 10
variables. The most important variables are defined below:
“categ”: Describes whether a customer had a good
or a bad customer service experience. 860 customers had a bad service
experience (answered a 1 or 2 on the customer satisfaction survey), and 899
customers had a good service experience (answered a 4 or 5 on the survey).
Customers answering a “3” (average service experience) were omitted from this
“Inv_Chg”: This variable is the primary variable
of interest to the firm. It represents the change in investment dollars for each
customer 1 month before the service encounter vs. 3 months after the service
encounter: Inv_Chg = Inv_3M_Aft – Inv_1M_Bef.
“Inv_1M_Bef”: This variable represents the total
investment dollars customers had with the firm 1 month before they had a
service encounter and survey.
“cust_age”: This variable represents each
customer’s age in years at the time of the survey.
“cust_tenure”: This variable represents how long
each customer has been with the firm (measured in years) at the time of the survey.
“tottrans”: Total monthly transactions the
customer had with the firm at the time of the survey.
“cust_id”: Customer ID number: unique ID field
used to identify each customer.
The firm has asked you to perform a segmentation (i.e.,
cluster analysis) on the following four variables: 1) Inv_1M_Bef, 2) cust_age, 3)
cust_tenure, 4) tottrans to see if you can identify segments of customers who
are especially likely to take out a lot of money should a bad service encounter
occur. If you can identify segments of customers likely to take out a lot of
money, then this can be used to proactively identify future customers who may also
be especially at risk of disengaging after a bad customer service experience.
The firm believes these four variables are especially important for clustering
Part A: Standardize Data (20 Points)
Standardize the four variables: Inv_1M_Bef,
cust_age, cust_tenure, tottrans to a mean of 0 and a standard deviation of 1.
Don’t replace the original data. Instead, create a new data matrix
called “X.scaled” that contains the four variables that have been standardized
so that they are all scaled the same way. Be sure to show your work in R.
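One possible way to approach Part A is base R's scale(), which centers each column and divides it by its sample standard deviation. The sketch below runs on a made-up stand-in data frame so that it is self-contained. With the real file, the data would be loaded instead, for example with read.table("hw3_data.txt", header = TRUE); that call assumes a whitespace-delimited file with a header row, which the assignment does not state.

```r
# Stand-in for the assignment data; the values are randomly generated for
# illustration only and are NOT from hw3_data.txt.
set.seed(1)
dat <- data.frame(
  Inv_1M_Bef  = runif(20, 1000, 50000),
  cust_age    = sample(25:80, 20, replace = TRUE),
  cust_tenure = sample(1:30, 20, replace = TRUE),
  tottrans    = sample(0:40, 20, replace = TRUE)
)

# The four clustering variables named in the assignment
vars <- c("Inv_1M_Bef", "cust_age", "cust_tenure", "tottrans")

# scale() returns a new matrix with each column centered to mean 0 and
# divided by its sample SD; the original data frame 'dat' is not modified.
X.scaled <- scale(dat[, vars])

# Sanity checks: column means should be ~0 and column SDs should be 1
print(round(colMeans(X.scaled), 10))
print(apply(X.scaled, 2, sd))
```

Because scale() returns a separate matrix, the original columns stay available for later steps such as summarizing each segment in its original units.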