Latent class analysis is the most common model used for model-based clustering of multivariate categorical responses. Selecting the variables most relevant to clustering is an important task that can considerably affect the quality of the clustering.
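As a rough illustration of the latent class model itself, and not of the methods in the talk, the sketch below fits a latent class model to integer-coded categorical data by EM under the local independence assumption. Within each class, the variables are treated as independent categorical draws. The function name `lca_em`, the random Dirichlet initialization, the small smoothing constant and the stopping rule are all assumptions made for this example.

```python
import numpy as np

def lca_em(X, G, n_iter=200, seed=0, tol=1e-8):
    """Hypothetical sketch: fit a G-class latent class model to X (n x p, codes 0..K_j-1) by EM."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    K = X.max(axis=0) + 1                                  # number of levels per variable
    tau = np.full(G, 1.0 / G)                              # class weights
    theta = [rng.dirichlet(np.ones(k), size=G) for k in K] # theta[j]: G x K_j category probabilities
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: log p(x_i, z_i = g), using local independence within each class.
        log_r = np.tile(np.log(tau), (n, 1))
        for j in range(p):
            log_r += np.log(theta[j][:, X[:, j]]).T
        m = log_r.max(axis=1, keepdims=True)
        loglik = float(np.sum(m[:, 0] + np.log(np.exp(log_r - m).sum(axis=1))))
        resp = np.exp(log_r - m)
        resp /= resp.sum(axis=1, keepdims=True)            # posterior class membership probabilities
        # M-step: weighted proportions (tiny constant avoids log(0) in the next E-step).
        tau = resp.sum(axis=0) / n
        for j in range(p):
            counts = resp.T @ np.eye(K[j])[X[:, j]] + 1e-10  # G x K_j expected counts
            theta[j] = counts / counts.sum(axis=1, keepdims=True)
        if loglik - prev < tol:
            break
        prev = loglik
    return tau, theta, resp, loglik
```

Class labels are only identified up to permutation, so fitted classes should be compared with known groups up to relabelling.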
We outline two approaches for model-based clustering and variable selection for multivariate categorical data.
The first method is Bayesian: clustering and variable selection are carried out simultaneously by MCMC using a collapsed Gibbs sampler, and post-hoc procedures for parameter and uncertainty estimation are outlined.
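The idea of a collapsed Gibbs sampler can be sketched for a fixed number of classes G. With symmetric Dirichlet priors on the class weights and on the category probabilities, those parameters integrate out, leaving closed-form conditionals for each label z_i and for each inclusion indicator gamma_j (a variable either has class-specific category probabilities or one shared set). This is a simplified, hypothetical sketch: the hyperparameters `alpha`, `beta` and `rho`, the warm-up period, and holding G fixed are assumptions, and the method in the talk may treat the number of clusters and the post-hoc estimation differently.

```python
import math
import numpy as np

def dirmult_logml(counts, beta):
    """Log marginal likelihood of category counts under a symmetric Dirichlet(beta) prior."""
    K = len(counts)
    return (math.lgamma(K * beta) - math.lgamma(counts.sum() + K * beta)
            + sum(math.lgamma(c + beta) - math.lgamma(beta) for c in counts))

def sigmoid(d):
    """Numerically stable logistic function."""
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)

def collapsed_gibbs_lca(X, G, n_sweeps=100, warmup=10, alpha=1.0, beta=0.5, rho=0.5, seed=0):
    """Hypothetical sketch: sample labels z and inclusion indicators gamma for a fixed G."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    K = X.max(axis=0) + 1
    z = rng.integers(0, G, size=n)
    gamma = np.ones(p, dtype=bool)                  # start with every variable clustering
    counts = [np.zeros((G, k)) for k in K]          # counts[j][g, k]: class-g obs with X[:, j] == k
    for j in range(p):
        np.add.at(counts[j], (z, X[:, j]), 1)
    ng = np.bincount(z, minlength=G).astype(float)
    incl_freq = np.zeros(p)
    for sweep in range(n_sweeps):
        # 1) Resample each label with weights and category probabilities integrated out.
        for i in range(n):
            ng[z[i]] -= 1
            for j in range(p):
                counts[j][z[i], X[i, j]] -= 1
            logp = np.log(ng + alpha)
            for j in np.flatnonzero(gamma):         # excluded variables do not depend on z
                logp += np.log(counts[j][:, X[i, j]] + beta) - np.log(ng + K[j] * beta)
            prob = np.exp(logp - logp.max())
            z[i] = rng.choice(G, p=prob / prob.sum())
            ng[z[i]] += 1
            for j in range(p):
                counts[j][z[i], X[i, j]] += 1
        if sweep < warmup:                          # keep all variables in during a short warm-up
            continue
        # 2) Resample each indicator: class-specific vs. shared category probabilities.
        for j in range(p):
            log_in = math.log(rho) + sum(dirmult_logml(counts[j][g], beta) for g in range(G))
            log_out = math.log(1 - rho) + dirmult_logml(counts[j].sum(axis=0), beta)
            gamma[j] = rng.random() < sigmoid(log_in - log_out)
        incl_freq += gamma
    return z, gamma, incl_freq / (n_sweeps - warmup)
```

Averaging the indicators over the post-warm-up sweeps (`incl_freq`) gives a Monte Carlo estimate of each variable's posterior probability of being relevant for clustering.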
The second method performs variable selection by stepwise model selection, using a model that avoids a local independence assumption made in competing approaches.
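The stepwise search itself can be sketched generically. Here `score` is a hypothetical callable that rates a candidate set of clustering variables (higher is better). In an approach like the one described it would come from comparing BIC values of competing models, but that model-specific part is not implemented here. At each step the search applies the single addition or removal that most improves the score, and it stops when no move helps.

```python
def stepwise_select(variables, score, initial=frozenset(), max_steps=100):
    """Hypothetical sketch: greedy add/remove search maximizing score(S) over variable subsets S."""
    variables = list(variables)
    current = frozenset(initial)
    best = score(current)
    for _ in range(max_steps):
        # Candidate moves: add one excluded variable, or remove one included variable.
        candidates = [current | {v} for v in variables if v not in current]
        candidates += [current - {v} for v in current]
        if not candidates:
            break
        new_score, new_set = max(((score(s), s) for s in candidates), key=lambda t: t[0])
        if new_score <= best:      # no single move improves the score: stop
            break
        current, best = new_set, new_score
    return current, best
```

A greedy search like this evaluates O(p) candidate models per step instead of all 2^p subsets, at the cost of possibly stopping at a local optimum.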
The methods are illustrated on simulated and real data and are shown to give improved clustering performance compared with competing methods.
The talk is based on http://arxiv.org/pdf/1402.6928 and http://arxiv.org/pdf/1512.03350v1.pdf.
Prof. Brendan Murphy - Model-based clustering for multivariate categorical data