K-Means Clustering Example: Ohio Counties
In this example, I group and cluster counties in Ohio, and I show how the number of clusters can significantly change the grouping.

Why cluster?
- It separates data into homogeneous clusters.
- It can make supervised learning easier, because we can model each cluster separately instead of the entire data set.
- It helps when you need to analyze a large set of data.

Examples of clustering:
- Linnaean classification.
- The periodic table of the elements.
- Market segmentation.

K-Means at a glance:
- It is well suited to large datasets.
- Each record is assigned to a cluster based on distance.
- It uses a pre-specified number of clusters.
- It is not computationally expensive.

In the example, I read Ohio election data, but use only the attributes of each of Ohio's 88 counties. The number of clusters helps determine how the counties are grouped and which attributes characterize each group. I scale the data so that each attribute has a mean of 0 and a standard deviation of 1. I then call the KMeans function and specify the number of clusters, experimenting with different numbers of clusters to see how that changes the results. We can evaluate the results in a cluster map. I also show how to use a centroid map to see, attribute by attribute, what makes each cluster distinct.
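The scale-then-cluster workflow described above can be sketched roughly as follows. This is a minimal illustration, not the original analysis: it assumes scikit-learn's `KMeans` and `StandardScaler` (the source does not name the library), and it substitutes randomly generated numbers for the Ohio county data, so the four attributes and the helper `cluster_counties` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 88 rows (one per county) and 4 made-up
# numeric attributes. The real county attributes are not reproduced here.
rng = np.random.default_rng(0)
X = rng.normal(
    loc=[50.0, 30.0, 10.0, 5.0],
    scale=[10.0, 5.0, 2.0, 1.0],
    size=(88, 4),
)

# Standardize each attribute to mean 0 and standard deviation 1 so that
# no single attribute dominates the distance calculation.
X_scaled = StandardScaler().fit_transform(X)


def cluster_counties(data, k, seed=0):
    """Fit K-Means with k clusters; return per-row labels and centroids."""
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = model.fit_predict(data)
    return labels, model.cluster_centers_


# Try several cluster counts and compare how the rows are grouped.
results = {}
for k in (2, 3, 5):
    labels, centroids = cluster_counties(X_scaled, k)
    results[k] = (labels, centroids)
    sizes = np.bincount(labels, minlength=k)
    print(f"k={k}: cluster sizes {sizes.tolist()}")
```

Because the data is standardized, each centroid coordinate is the cluster's average z-score for that attribute, so values far from 0 point to the attributes that set a cluster apart. That is the kind of information a centroid map displays.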