Definition
Model-based clustering is a statistical approach to data clustering. The observed (multivariate) data are assumed to have been generated from a finite mixture of component models. Each component model is a probability distribution, typically a parametric multivariate distribution. For example, in a multivariate Gaussian mixture model, each component is a multivariate Gaussian distribution. The component responsible for generating a particular observation determines the cluster to which the observation belongs. However, both the component that generated each observation and the parameters of the component distributions are unknown. The key learning task is to determine the component responsible for generating each observation, which in turn gives the clustering of the data. Ideally, observations generated from the same component are inferred to belong to the same cluster. In addition to inferring the component assignment of observations, most popular learning approaches...
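As an illustration of the learning task described above, the following is a minimal sketch rather than a definitive implementation. It assumes a univariate Gaussian mixture (the entry discusses the multivariate case) and assumes the parameters are fit with the EM algorithm, which is the subject of several items in the recommended reading. The function names (`gaussian_pdf`, `responsibilities`, `fit_gmm_1d`, `assign_clusters`), the quantile-based initialization, and the synthetic data are all hypothetical choices made for this example.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def responsibilities(data, weights, means, variances):
    """Posterior probability that each component generated each observation."""
    resp = []
    for x in data:
        joint = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        if total == 0.0:
            # Numerical underflow: fall back to a uniform posterior.
            resp.append([1.0 / len(joint)] * len(joint))
        else:
            resp.append([j / total for j in joint])
    return resp


def fit_gmm_1d(data, k, n_iter=50):
    """Fit a k-component 1-D Gaussian mixture with a fixed number of EM steps."""
    n = len(data)
    ordered = sorted(data)
    # Assumed heuristic: start the means at evenly spaced sample quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(data) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in data) / n, 1e-6)
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: soft assignment of observations to components.
        resp = responsibilities(data, weights, means, variances)
        # M-step: re-estimate each component from its weighted observations.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances


def assign_clusters(data, weights, means, variances):
    """Cluster label = index of the component with the highest posterior."""
    resp = responsibilities(data, weights, means, variances)
    return [max(range(len(r)), key=lambda j: r[j]) for r in resp]


# Synthetic data: two well-separated components (values chosen for illustration).
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(200)]
        + [rng.gauss(10.0, 1.0) for _ in range(200)])
weights, means, variances = fit_gmm_1d(data, k=2)
labels = assign_clusters(data, weights, means, variances)
print("estimated means:", means)
print("estimated weights:", weights)
```

The hard labels returned by `assign_clusters` correspond to the clustering described in the definition, while the soft posteriors from `responsibilities` express how uncertain each assignment is. A fixed iteration count is used here for brevity; a practical implementation would more likely monitor the log-likelihood for convergence.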
Recommended Reading
Banerjee, A., Merugu, S., Dhillon, I., & Ghosh, J. (2005). Clustering with Bregman divergences. Journal of Machine Learning Research, 6, 1705–1749.
Bilmes, J. (1997). A gentle tutorial on the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report ICSI-TR-97-02, University of California, Berkeley.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.
Dasgupta, S. (1999). Learning mixtures of Gaussians. IEEE Symposium on foundations of Computer Science (FOCS). Washington, DC: IEEE Press.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1), 1–38.
Kannan, R., Salmasian, H., & Vempala, S. (2005). The spectral method for general mixture models. Conference on Learning Theory (COLT).
McLachlan, G. J., & Krishnan, T. (1996). The EM algorithm and extensions. New York: Wiley-Interscience.
McLachlan, G. J., & Peel, D. (2000). Finite mixture models. Wiley series in probability and mathematical statistics: Applied probability and statistics section. New York: Wiley.
Neal, R. M., & Hinton, G. E. (1998). A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan (Ed.), Learning in graphical models (pp. 355–368). Cambridge, MA: MIT Press.
Redner, R., & Walker, H. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26(2), 195–239.
Copyright information
© 2011 Springer Science+Business Media, LLC
About this entry
Cite this entry
Banerjee, A., Shan, H. (2011). Model-Based Clustering. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_554
DOI: https://doi.org/10.1007/978-0-387-30164-8_554
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30768-8
Online ISBN: 978-0-387-30164-8
eBook Packages: Computer Science; Reference Module Computer Science and Engineering