#datascience #machinelearning #ml
K-means-based methods are efficient for processing large data sets, but they are limited to numeric data. K-means minimizes a cost function defined by the (squared) Euclidean distance between each data point and the mean of its cluster. Minimizing this cost requires computing means, and a mean is undefined for categorical values such as colors or product types, so the method cannot be applied directly to categorical data.
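To make the cost concrete, here is a minimal pure-Python sketch of the k-means cost and the mean update it depends on. The function names `kmeans_cost` and `update_means` are illustrative only and do not come from any library. The update step averages coordinates, which is exactly the operation that has no meaning for categorical values.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_euclidean(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two numeric points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cost(points: List[Point], means: List[Point], labels: List[int]) -> float:
    """Sum of squared distances from each point to the mean of its assigned cluster."""
    return sum(squared_euclidean(p, means[c]) for p, c in zip(points, labels))


def update_means(points: List[Point], labels: List[int], k: int) -> List[List[float]]:
    """Recompute each cluster mean by averaging coordinates (numeric data only)."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for j, x in enumerate(p):
            sums[c][j] += x
    # Assumes no cluster is empty; a real implementation must handle that case.
    return [[s / counts[c] for s in sums[c]] for c in range(k)]


points = [[1.0, 2.0], [1.0, 4.0], [10.0, 2.0], [10.0, 4.0]]
labels = [0, 0, 1, 1]
means = update_means(points, labels, k=2)
print(means)                               # [[1.0, 3.0], [10.0, 3.0]]
print(kmeans_cost(points, means, labels))  # 4.0
```

Trying to call `update_means` on a value like `"red"` would fail, which is the limitation described above.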
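This is where K-Prototypes shines. On purely numeric data the algorithm is identical to k-means. For categorical attributes it uses a simple matching dissimilarity measure (the number of attributes on which two records differ), replaces the cluster means with modes, and uses a frequency-based method to update the modes during clustering so as to minimize the cost function. For records that mix both kinds of attributes, the two dissimilarities are added together, with the categorical part weighted by a parameter γ so that neither type dominates.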
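The following is a compact pure-Python sketch of the idea, assuming each record is split into a numeric part and a categorical part. The dissimilarity is the squared Euclidean distance on the numeric attributes plus `gamma` times the number of categorical mismatches; prototypes are updated with means for numeric attributes and modes (most frequent values) for categorical ones. The helper names, the value of `gamma`, and the use of fixed initial prototypes are assumptions for illustration, not a reference implementation. It also assumes the numeric attributes are on comparable scales.

```python
from collections import Counter
from typing import List, Tuple

# A record is (numeric attributes, categorical attributes).
Record = Tuple[List[float], List[str]]


def dissimilarity(x: Record, proto: Record, gamma: float) -> float:
    """Squared Euclidean distance on numeric parts + gamma * mismatches on categorical parts."""
    num = sum((a - b) ** 2 for a, b in zip(x[0], proto[0]))
    cat = sum(1 for a, b in zip(x[1], proto[1]) if a != b)
    return num + gamma * cat


def update_prototype(members: List[Record]) -> Record:
    """Mean of each numeric attribute, mode of each categorical attribute."""
    n = len(members)
    means = [sum(m[0][j] for m in members) / n for j in range(len(members[0][0]))]
    modes = [Counter(m[1][j] for m in members).most_common(1)[0][0]
             for j in range(len(members[0][1]))]
    return means, modes


def k_prototypes(data: List[Record], init: List[Record], gamma: float,
                 max_iter: int = 10) -> Tuple[List[int], List[Record]]:
    """Alternate assignment and prototype updates until the labels stop changing."""
    protos = list(init)  # copy so the caller's list is not modified
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign each record to its least-dissimilar prototype.
        new_labels = [min(range(len(protos)),
                          key=lambda c: dissimilarity(x, protos[c], gamma))
                      for x in data]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(protos)):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # keep the old prototype if a cluster empties out
                protos[c] = update_prototype(members)
    return labels, protos


data = [
    ([1.0, 2.0], ["red", "small"]),
    ([1.5, 1.8], ["red", "small"]),
    ([8.0, 8.0], ["blue", "large"]),
    ([9.0, 8.5], ["blue", "large"]),
    ([1.2, 2.2], ["blue", "small"]),
]
labels, protos = k_prototypes(data, [data[0], data[2]], gamma=1.0)
print(labels)     # [0, 0, 1, 1, 0]
print(protos[1])  # ([8.5, 8.25], ['blue', 'large'])
```

The last record is labeled `blue` but still joins the first cluster, because its numeric values are close and only one categorical attribute mismatches. Raising `gamma` gives the categorical attributes more influence on such decisions. For real workloads, the third-party `kmodes` Python package provides a `KPrototypes` class.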
Clustering Algorithm for mixed datatypes - K-Prototypes | NatokHD