A new Bayesian approach for determining the number of components in a finite mixture


This article evaluates a new Bayesian approach to determining the number of components in a finite mixture. Through simulation studies, we assess the approach on mixtures of normals and on latent class mixtures of Bernoulli responses. For normal mixtures we use a “gold standard” set of population models based on a well-known “testbed” data set, the galaxy recession velocity data set of Roeder (J Am Stat Assoc 85:617–624, 1990). For Bernoulli latent class mixtures we consider the models for psychiatric diagnosis of Berkhof et al. (Stat Sin 13:423–442, 2003). The new approach compares models with different numbers of components through their posterior deviance distributions, computed under non-informative or diffuse priors. Simulations show that even large numbers of closely spaced normal components can be identified with sufficiently large samples, while for latent classes with Bernoulli responses identification is more complex, though it again improves with increasing sample size.
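The idea of comparing models through their posterior deviance distributions can be illustrated with a minimal sketch. It assumes that posterior draws of the mixture parameters have already been obtained by some sampler, which is not shown here. The function names `mixture_deviance` and `prob_smaller_deviance` are hypothetical, and the pairwise comparison of draws is only one simple way to summarise two deviance distributions; it is not necessarily the procedure used in the article.

```python
import numpy as np

def mixture_deviance(y, weights, means, sds):
    """Deviance, -2 * log-likelihood, of a normal mixture at ONE parameter draw."""
    y = np.asarray(y, dtype=float)[:, None]      # shape (n, 1)
    weights = np.asarray(weights, dtype=float)   # shape (K,)
    means = np.asarray(means, dtype=float)       # shape (K,)
    sds = np.asarray(sds, dtype=float)           # shape (K,)
    # Log of each weighted component density, shape (n, K).
    log_comp = (np.log(weights)
                - 0.5 * np.log(2.0 * np.pi) - np.log(sds)
                - 0.5 * ((y - means) / sds) ** 2)
    # Numerically stable log-sum-exp over the K components.
    m = log_comp.max(axis=1, keepdims=True)
    log_lik = (m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))).sum()
    return -2.0 * log_lik

def prob_smaller_deviance(dev_a, dev_b):
    """Fraction of all pairs of draws in which model A has the smaller deviance.

    dev_a and dev_b are deviance draws for two models (assumed independent).
    """
    dev_a = np.asarray(dev_a, dtype=float)[:, None]
    dev_b = np.asarray(dev_b, dtype=float)[None, :]
    return float((dev_a < dev_b).mean())
```

In use, `mixture_deviance` would be evaluated once per posterior draw for each candidate number of components, giving one set of deviance draws per model. A value of `prob_smaller_deviance` close to 1 then suggests that the first model's deviance distribution lies mostly below the second's.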



  1. Full details can be found in their paper.

  2. This may seem counter-intuitive, since the ML estimate of the saturated model always has the smallest frequentist deviance. The single observation in each “class”, however, gives a very diffuse likelihood for each \(p_{ij}\), and this leads to a very diffuse and large deviance distribution.


We are grateful to the Australian Research Council for research support under project DP120102902, which supported Duy Vu for the period of this research (2012–2015) and visits by Brian Francis from the University of Lancaster.

