12. a) Here we have used K-Means clustering with 2 no of clusters.
Cluster | Size | Average Distance |
Cluster 1 | 17 | 37.39795319 |
Cluster 2 | 4 | 33.73175476 |
Total | 21 | 36.69962968 |
From the above we can see cluster 2 is more dense than cluster 1 as the distance between the points is less. And size of cluster 1 is more than cluster 2, this specifies that we have a large group of more similar firms.
b)
Cluster | Market_Cap | Beta | PE_Ratio | ROE | ROA | Asset_Turnover | Leverage | Rev_Growth | Net_Profit_Margin | |
Cluster 1 | 34.27117647 | 0.536470588 | 26.2235294 | 21.41176 | 8.823529 | 0.641176471 | 0.671764706 | 12.15647059 | 14.78235294 | |
Cluster 2 | 157.0175 | 0.48 | 22.225 | 44.425 | 17.7 | 0.95 | 0.22 | 18.5325 | 19.575 |
Above are the centroids of the selected variables.
c) From the pivot table we can analyze that most of the data belongs to NYSE and most moderate buyers are from US.
d) We can name the clusters on the basis of market capital as it is most prominent factor.
15.2
Ans- As we can see in the sheet pivot(analyze) the net profit in the NYSE is highest for most locations. In this we also analyze that hold has the highest profit. So we can suggest the person on the basis of net profit that it is better to register in the NYSE. As we have the most important factor as profit we have analyzed the Net Profit.
15.4
a)
b) Not normalized data will have a clustering as below
c)
Cluster 2 metrics are leading in progressive way except non-flight bonus transactions.
These are the segment of customers, who are associated with EastWest Airlines since long time.
Cluster 3 metrics depicts that the flight miles and flight transactions in last 12 months is zero and their
non-flight bonus transactions are leading than the other clusters.
cluster 1 metrics depicts that the level of spending is average to cluster 2 and 3.
Centroids
Balance 73547.288828 115682.00 89401.333
Qual_miles 144.071142 295.50 0.000
cc1_miles 2.057114 3.75 3.000
cc2_miles 1.014529 1.00 1.000
cc3_miles 1.009519 3.00 2.000
Bonus_miles 17064.150802 74597.50 47920.000
Bonus_trans 11.576653 32.75 17.000
Flight_miles_12mo 460.035822 825.00 0.000
Flight_trans_12 1.373246 2.75 0.000
Days_since_enroll 4117.809118 5058.75 3863.333
d) K-Means clustering
e) For n_clusters = 2 The average silhouette_score is : 0.3639788167896175
For n_clusters = 3 The average silhouette_score is : 0.34038142310682834
For n_clusters = 4 The average silhouette_score is : 0.33811895596476305
For n_clusters = 5 The average silhouette_score is : 0.34885887029453005
For n_clusters = 6 The average silhouette_score is : 0.3665724589453308
For n_clusters = 7 The average silhouette_score is : 0.26773101005827354
For n_clusters = 8 The average silhouette_score is : 0.28284236665851537
For n_clusters = 9 The average silhouette_score is : 0.2708765675709241
From the Scores we choose no of clusters = 6. For best explanation.
f)
- KMeans clustering is sensitive to outliers.Since produced min. sil. score. whereas hierarchical gives max. in this case.
- Hierarchical clustering given total cluster number=5, but when i comapred both kmean & Hierarchical clustering, the hierarchical clustering contains approx 99% data in cluster 1 only. So this will not useful to analyze diff. customer. So i continued with KMeans.