Model-Based Clustering

Encyclopedia of Machine Learning
Model-based clustering is a statistical approach to data clustering. The observed (multivariate) data is assumed to have been generated from a finite mixture of component models. Each component model is a probability distribution, typically a parametric multivariate distribution. For example, in a multivariate Gaussian mixture model, each component is a multivariate Gaussian distribution. The component responsible for generating a particular observation determines the cluster to which the observation belongs. However, the component generating each observation as well as the parameters for each of the component distributions are unknown. The key learning task is to determine the component responsible for generating each observation, which in turn gives the clustering of the data. Ideally, observations generated from the same component are inferred to belong to the same cluster. In addition to inferring the component assignment of observations, most popular learning approaches...
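The idea can be illustrated with a small sketch. The paragraph above does not name a specific learning algorithm, so the example assumes the EM algorithm (the subject of several items under Recommended Reading) and, for brevity, a univariate Gaussian mixture rather than a multivariate one. The function names `gaussian_pdf`, `fit_mixture`, and `assign_clusters`, the initialisation scheme, and the synthetic data are all illustrative choices, not part of the entry.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_mixture(data, k, n_iter=100, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture with EM (illustrative sketch)."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialisation: place the means at evenly spaced quantiles.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: posterior probability that component j generated each x.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint)
            if total == 0.0:
                # Numerical underflow: fall back to uniform responsibilities.
                resp.append([1.0 / k] * k)
            else:
                resp.append([p / total for p in joint])

        # M-step: re-estimate mixing weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an effectively empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)  # guard against collapse

    return weights, means, variances


def assign_clusters(data, weights, means, variances):
    """Assign each observation to its most probable component."""
    labels = []
    for x in data:
        joint = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        labels.append(max(range(len(joint)), key=joint.__getitem__))
    return labels


# Synthetic data drawn from two well-separated Gaussians (an assumption
# made only so that the example has something to cluster).
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(200)]
        + [rng.gauss(8.0, 1.0) for _ in range(200)])

weights, means, variances = fit_mixture(data, k=2)
labels = assign_clusters(data, weights, means, variances)
```

Here the inferred component of each observation plays the role of its cluster label. The E-step computes soft (probabilistic) assignments, and `assign_clusters` turns them into a hard clustering by picking the most probable component.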

Recommended Reading

  • Banerjee, A., Merugu, S., Dhillon, I., & Ghosh, J. (2005). Clustering with Bregman divergences. Journal of Machine Learning Research, 6, 1705–1749.

  • Bilmes, J. (1997). A gentle tutorial on the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report ICSI-TR-97-02, University of Berkeley.

  • Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.

  • Dasgupta, S. (1999). Learning mixtures of Gaussians. IEEE Symposium on Foundations of Computer Science (FOCS). Washington, DC: IEEE Press.

  • Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1), 1–38.

  • Kannan, R., Salmasian, H., & Vempala, S. (2005). The spectral method for general mixture models. Conference on Learning Theory (COLT).

  • McLachlan, G. J., & Krishnan, T. (1996). The EM algorithm and extensions. New York: Wiley-Interscience.

  • McLachlan, G. J., & Peel, D. (2000). Finite mixture models. Wiley series in probability and mathematical statistics: Applied probability and statistics section. New York: Wiley.

  • Neal, R. M., & Hinton, G. E. (1998). A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan (Ed.), Learning in graphical models (pp. 355–368). Cambridge, MA: MIT Press.

  • Redner, R., & Walker, H. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26(2), 195–239.

© 2011 Springer Science+Business Media, LLC

Banerjee, A., Shan, H. (2011). Model-Based Clustering. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA.
