Statistics and Computing, Volume 10, Issue 1, pp 63–72

Model selection for probabilistic clustering using cross-validated likelihood

  • Padhraic Smyth

DOI: 10.1023/A:1008940618127

Cite this article as:
Smyth, P. Statistics and Computing (2000) 10: 63. doi:10.1023/A:1008940618127


Cross-validated likelihood is investigated as a tool for automatically determining the appropriate number of components (given the data) in finite mixture modeling, particularly in the context of model-based probabilistic clustering. The conceptual framework for the cross-validation approach to model selection is straightforward in the sense that models are judged directly on their estimated out-of-sample predictive performance. Cross-validation, penalized likelihood, and McLachlan's bootstrap method are each applied to two data sets, and the results from all three methods are in close agreement. The second data set involves a well-known clustering problem from the atmospheric science literature using historical records of upper-atmosphere geopotential height in the Northern Hemisphere. Cross-validated likelihood provides an interpretable and objective solution to the atmospheric clustering problem. The clusters found are in agreement with prior analyses of the same data based on non-probabilistic clustering techniques.
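The idea of judging each candidate number of components by its held-out log-likelihood can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the article's implementation: the data are synthetic, the components are one-dimensional Gaussians fitted by a basic EM loop, the train/test partitions are repeated random half splits, and the helper names (`fit_em`, `cv_loglik`) are hypothetical.

```python
import math
import random


def log_gauss(x, mu, var):
    """Log density of a 1-D Gaussian N(mu, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def logsumexp(vals):
    """Numerically stable log(sum(exp(v)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def component_logs(x, weights, means, variances):
    """Per-component log(weight * density) for one point."""
    return [math.log(w) + log_gauss(x, m, v)
            for w, m, v in zip(weights, means, variances)]


def fit_em(data, k, n_iter=50, min_var=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM (illustrative only)."""
    s = sorted(data)
    n = len(s)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [s[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(s) / n
    total_var = max(sum((x - centre) ** 2 for x in s) / n, min_var)
    variances = [total_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior membership probabilities for every point.
        resp = []
        for x in data:
            logs = component_logs(x, weights, means, variances)
            norm = logsumexp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-10)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor avoids degenerate spikes
    return weights, means, variances


def cv_loglik(data, k, n_splits=5, test_frac=0.5, seed=0):
    """Mean held-out log-likelihood per point over random train/test splits.

    The random half-split scheme is an assumption of this sketch.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_splits):
        shuffled = list(data)
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_frac)
        test, train = shuffled[:n_test], shuffled[n_test:]
        params = fit_em(train, k)  # fit on the training part only
        total += sum(logsumexp(component_logs(x, *params))
                     for x in test) / n_test
    return total / n_splits


# Synthetic data (hypothetical): two well-separated clusters.
data_rng = random.Random(42)
data = ([data_rng.gauss(0.0, 1.0) for _ in range(100)]
        + [data_rng.gauss(10.0, 1.0) for _ in range(100)])

# Score each candidate number of components and keep the best one.
scores = {k: cv_loglik(data, k) for k in range(1, 5)}
best_k = max(scores, key=scores.get)
for k in sorted(scores):
    print(f"k={k}: held-out log-likelihood per point = {scores[k]:.3f}")
print("selected k:", best_k)
```

Because every candidate model is scored on data it was not fitted to, adding components stops paying off once the extra flexibility only fits noise. On data like these, the scores for values of k above the true number of clusters can be very close to one another, so the spread across splits is worth inspecting alongside the mean.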

Copyright information

© Kluwer Academic Publishers 2000

Authors and Affiliations

  • Padhraic Smyth (1, 2)
  1. Information and Computer Science, University of California, Irvine
  2. Jet Propulsion Laboratory 126-347, California Institute of Technology, Pasadena
