Clustering Algorithm for mixed datatypes - K-Prototypes

48.9K views
Apr 18, 2020
11:47

#datascience #machinelearning #ml Methods based on k-means are efficient for processing large data sets, but they are limited to numeric data. K-means minimizes a cost function defined on the Euclidean distance between data points and cluster means, and because that cost is minimized by computing means, the method cannot handle categorical attributes. This is where K-Prototypes shines. When applied to purely numeric data, the algorithm is identical to k-means. For categorical attributes, the algorithm uses a simple matching dissimilarity measure (a count of mismatched categories), replaces cluster means with modes, and uses a frequency-based method to update the modes during clustering so as to minimize the clustering cost function.
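To make the description concrete, here is a minimal sketch in plain Python of the two pieces it mentions: a mixed dissimilarity (squared Euclidean distance on numeric attributes plus simple matching on categorical ones) and a prototype update (means for numeric attributes, most frequent category for categorical ones). The function names, the data layout, and the `gamma` weight that balances the two parts are assumptions made for illustration; the description itself does not specify them, and this is not the code used in the video.

```python
from collections import Counter
from statistics import mean


def mixed_dissimilarity(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Dissimilarity between two records with numeric and categorical parts.

    gamma is an assumed weight balancing the categorical part against
    the numeric part; it is not mentioned in the description.
    """
    # Numeric part: squared Euclidean distance, as in k-means.
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    # Categorical part: simple matching, i.e. the number of mismatches.
    categorical_part = sum(x != y for x, y in zip(cat_a, cat_b))
    return numeric_part + gamma * categorical_part


def update_prototype(members):
    """Recompute a cluster prototype from its (numeric, categorical) members."""
    numeric_rows = [numeric for numeric, _ in members]
    categorical_rows = [categorical for _, categorical in members]
    # Numeric attributes: column-wise mean.
    numeric_proto = [mean(column) for column in zip(*numeric_rows)]
    # Categorical attributes: column-wise mode (most frequent category).
    categorical_proto = [
        Counter(column).most_common(1)[0][0] for column in zip(*categorical_rows)
    ]
    return numeric_proto, categorical_proto


# Hypothetical toy cluster: one numeric and one categorical attribute.
cluster = [([1.0], ["red"]), ([3.0], ["red"]), ([5.0], ["blue"])]
prototype = update_prototype(cluster)
distance = mixed_dissimilarity([1.0, 2.0], ["a", "b"], [1.0, 4.0], ["a", "c"], gamma=0.5)
```

With no categorical attributes the second term vanishes and the dissimilarity reduces to the k-means one, which matches the statement that the algorithm is identical to k-means on numeric data.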

