
Order selection in finite mixtures of linear regressions

Literature review and a simulation study

  • Regular Article
  • Statistical Papers

Abstract

Finite mixture models can adequately model population heterogeneity when this heterogeneity arises from a finite number of relatively homogeneous clusters; market segmentation is one example of such a situation. Order selection in mixture models, i.e. selecting the correct number of components, is however a problem that has not been satisfactorily resolved. Existing simulation results in the literature do not completely agree with each other, and the performance of different selection methods appears to be affected by the type of model and the parameter values. Furthermore, most existing results are based on simulations in which the true generating model is identical to one of the models in the candidate set. To partly fill this gap we carried out a relatively large simulation study for finite mixture models of normal linear regressions. We included several types of model (mis)specification to study the robustness of 18 order selection methods, and we compared the performance of these methods based on unpenalized and penalized estimates of the model parameters. The results indicate that order selection based on penalized estimates greatly improves the success rates of all order selection methods. The most successful methods were \(MDL2\), \(MRC\), \(MRC_k\), \(ICL\)-\(BIC\), \(ICL\), \(CAIC\), \(BIC\) and \(CLC\), but no single method was consistently good or best for all types of model (mis)specification.
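
Most of the compared criteria are simple functions of the maximized log-likelihood, the number of parameters, the sample size and (for classification-based criteria) the posterior membership probabilities. As a minimal illustration (the function and variable names are ours, and the entropy term for \(ICL\)-\(BIC\) follows the usual definition):

```python
import numpy as np

def order_selection_criteria(loglik, n_params, n_obs, post_probs):
    """Illustrative versions of a few classical order selection criteria.

    loglik     : maximized log-likelihood of the fitted mixture
    n_params   : number of free parameters
    n_obs      : sample size
    post_probs : (n_obs, K) posterior component membership probabilities
    """
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * np.log(n_obs)
    caic = -2.0 * loglik + n_params * (np.log(n_obs) + 1.0)
    # Classification entropy: zero for a perfectly separated clustering.
    p = np.clip(post_probs, 1e-12, 1.0)
    entropy = -np.sum(p * np.log(p))
    icl_bic = bic + 2.0 * entropy
    return {"AIC": aic, "BIC": bic, "CAIC": caic, "ICL-BIC": icl_bic}

# Toy example: 5 observations, 2 components.
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.5, 0.5], [0.1, 0.9], [0.05, 0.95]])
crit = order_selection_criteria(loglik=-12.3, n_params=7, n_obs=5, post_probs=probs)
```

In a study of this kind such values would be computed for each candidate number of components, and the order minimizing the criterion retained.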


Notes

  1. This can be readily extended to the multivariate case.

  2. It is possible to generalize (4) by including explanatory variables to model the mixture proportions, for instance through a logistic regression model. If these explanatory variables differ from the variables that model the component means, they can be ignored for order selection, since the marginal model is a mixture model with the same number of components (Bandeen-Roche et al. 1997).

  3. For more on this topic, see for instance McLachlan and Peel (2000, Sect. 6.4) or Garel (2007).

  4. Bootstrapping the likelihood ratio test may however be very useful if one has enough time and/or computing power. Nylund et al. (2007) presented very favorable results from their simulation study.

  5. The most widely known criterion of this type is probably \(ICOMP\) (Bozdogan 1993), which is defined as \(-2LL\left(\hat{\varvec{\varPsi}}\right)+n_p\log\left[n_p^{-1}\,\mathrm{trace}\left(\mathcal{I}^{-1}\right)\right]-\log\left(|\mathcal{I}^{-1}|\right)\), where \(\mathcal{I}\) denotes the expected information matrix, \(n_p\) is the number of parameters and \(|\cdot|\) is the determinant.

  6. Akaike himself actually called it ‘An information criterion’ (Burnham and Anderson 2002).

  7. The Kullback–Leibler divergence between distributions \(f\) and \(g\) is defined as \(I(f,g)=\int f(x)\log f(x)\,dx-\int f(x)\log g(x)\,dx\) and represents the information lost when \(f\) is approximated by \(g\) (Kullback and Leibler 1951; Burnham and Anderson 2002).

  8. The symmetric Kullback–Leibler divergence \(J(f,g)\) between \(f\) and \(g\) is defined as \(J(f,g)=I(f,g)+I(g,f)\).

  9. Burnham and Anderson (2002) argue that in most realistic situations the true model will not be in the set of candidate models, and show by simulation that when it is, \(AIC\) nevertheless selects the true model with high probability.

  10. It is not possible to set skewness independently from kurtosis (Headrick 2002).

  11. The complete collection of tables with success rates for each combination of the experimental settings can be found in the supplementary materials.
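
The generalization mentioned in note 2, modeling the mixture proportions with explanatory variables through a multinomial logistic (softmax) model, can be sketched as follows; the function name and toy values are ours:

```python
import numpy as np

def mixture_proportions(z, gamma):
    """Softmax model for mixture proportions.

    z     : (n, d) matrix of concomitant explanatory variables (incl. intercept)
    gamma : (d, K) coefficients, one column per mixture component
    Returns an (n, K) matrix whose rows are the proportions pi_k(z_i).
    """
    eta = z @ gamma
    eta -= eta.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    w = np.exp(eta)
    return w / w.sum(axis=1, keepdims=True)

# Toy data: intercept plus one covariate, two components.
z = np.column_stack([np.ones(4), np.linspace(-1.0, 1.0, 4)])
gamma = np.array([[0.2, -0.2], [1.0, -1.0]])
pi = mixture_proportions(z, gamma)
```

Each row of `pi` sums to one, so the proportions remain a valid mixing distribution for every covariate value.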
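
The parametric bootstrap of the likelihood ratio test mentioned in note 4 can be sketched as below; for brevity the example uses univariate Gaussian mixtures via scikit-learn rather than mixtures of regressions, and the function name is ours:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def bootstrap_lrt(y, k0=1, n_boot=19, seed=0):
    """Parametric bootstrap p-value for H0: k0 components vs H1: k0 + 1."""
    y = np.asarray(y).reshape(-1, 1)

    def fit(data, k):
        return GaussianMixture(n_components=k, n_init=3, random_state=seed).fit(data)

    m0, m1 = fit(y, k0), fit(y, k0 + 1)
    # score() returns the mean log-likelihood per observation.
    lrt_obs = 2.0 * (m1.score(y) - m0.score(y)) * len(y)
    exceed = 0
    for _ in range(n_boot):
        yb, _ = m0.sample(len(y))            # simulate under the fitted null model
        b0, b1 = fit(yb, k0), fit(yb, k0 + 1)
        exceed += 2.0 * (b1.score(yb) - b0.score(yb)) * len(yb) >= lrt_obs
    return (exceed + 1) / (n_boot + 1)       # add-one rule avoids p = 0

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(-3, 1, 150), rng.normal(3, 1, 150)])
p_value = bootstrap_lrt(y, k0=1, n_boot=19)
```

With clearly separated components the observed statistic dwarfs the bootstrap replicates, so the p-value hits its lower bound of \(1/(n_{boot}+1)\); the cost is refitting both models for every bootstrap sample, which is why note 4 flags computing time.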
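
The \(ICOMP\) criterion of note 5 is straightforward to evaluate once an estimate of the expected information matrix is available; a minimal sketch (the toy matrix and log-likelihood are made up):

```python
import numpy as np

def icomp(loglik, info_matrix):
    """ICOMP as in note 5: -2*LL + n_p*log(trace(I^-1)/n_p) - log|I^-1|."""
    inv_info = np.linalg.inv(info_matrix)
    n_p = info_matrix.shape[0]
    return (-2.0 * loglik
            + n_p * np.log(np.trace(inv_info) / n_p)
            - np.log(np.linalg.det(inv_info)))

# With an identity information matrix the complexity penalty vanishes,
# so ICOMP reduces to -2 * loglik.
I_toy = np.diag([4.0, 1.0, 0.25])
val = icomp(loglik=-50.0, info_matrix=I_toy)
```

By the arithmetic-geometric mean inequality the penalty is nonnegative and equals zero exactly when \(\mathcal{I}^{-1}\) is a scaled identity, so the criterion penalizes unevenly spread (correlated or poorly scaled) parameter estimates.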
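
The divergences defined in notes 7 and 8 can be checked numerically; the sketch below integrates the definitions with scipy and compares \(I(f,g)\) for two unit-variance normal densities against the known closed form \((\mu_f-\mu_g)^2/2\):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def kl_divergence(f, g, lo=-20.0, hi=20.0):
    """I(f, g) = int f log f - int f log g, by numerical quadrature."""
    val, _ = quad(lambda x: f(x) * (np.log(f(x)) - np.log(g(x))), lo, hi)
    return val

def symmetric_kl(f, g):
    """J(f, g) = I(f, g) + I(g, f), Kullback's symmetric divergence."""
    return kl_divergence(f, g) + kl_divergence(g, f)

f = norm(0.0, 1.0).pdf
g = norm(1.0, 1.0).pdf
I_fg = kl_divergence(f, g)   # closed form for N(0,1) vs N(1,1): 0.5
J_fg = symmetric_kl(f, g)
```

Note the asymmetry of \(I\) in general; here the two directions coincide only because the variances are equal.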

References

  • Abbi R, El-Darzi E, Vasilakis C, Millard P (2008) Analysis of stopping criteria for the EM algorithm in the context of patient grouping according to length of stay. In: 4th International IEEE Conference on Intelligent Systems, Varna, Bulgaria, pp 9–14

  • Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6):716–723

  • Andrews RL, Currim IS (2003a) A comparison of segment retention criteria for finite mixture logit models. J Mark Res 40(2):235–243

  • Andrews RL, Currim IS (2003b) Retention of latent segments in regression-based marketing models. Int J Res Mark 20(4):315–321

  • Bandeen-Roche K, Miglioretti DL, Zeger SL, Rathouz PJ (1997) Latent variable regression for multiple discrete outcomes. J Am Stat Assoc 92(440):1375–1386

  • Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49(3):803

  • Bhansali RJ, Downham DY (1977) Some properties of the order of an autoregressive model selected by a generalization of Akaike’s EPF criterion. Biometrika 64(3):547

  • Biernacki C, Govaert G (1997) Using the classification likelihood to choose the number of clusters. Comput Sci Stat 29(2):451–457

  • Biernacki C, Celeux G, Govaert G (1998) Assessing a mixture model for clustering with the integrated classification likelihood. Tech. Rep. 3521, INRIA Rhône-Alpes

  • Biernacki C, Celeux G, Govaert G (1999) An improvement of the NEC criterion for assessing the number of clusters in a mixture model. Pattern Recognit Lett 20(3):267–272

  • Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725

  • Biernacki C, Celeux G, Govaert G (2003) Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput Stat Data Anal 41(3–4):561–575

  • Böhning D, Dietz E, Schaub R, Schlattmann P, Lindsay BG (1994) The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Ann Inst Stat Math 46(2):373–388

  • Bozdogan H (1987) Model selection and Akaike’s information criterion (AIC): the general theory and its analytical extensions. Psychometrika 52(3):345–370

  • Bozdogan H (1993) Choosing the number of component clusters in the mixture-model using a new informational complexity criterion of the inverse-Fisher information matrix. In: Opitz O, Lausen B, Klar R (eds) Information and classification. Springer, Heidelberg, pp 40–54

  • Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach, 2nd edn. Springer, Berlin

  • Cavanaugh JE (1999) A large-sample model selection criterion based on Kullback’s symmetric divergence. Stat Probab Lett 42(4):333–343

  • Cavanaugh JE (2004) Criteria for linear model selection based on Kullback’s symmetric divergence. Aust N Z J Stat 46(2):257–274

  • Celeux G, Soromenho G (1996) An entropy criterion for assessing the number of clusters in a mixture model. J Classif 13(2):195–212

  • Chen J, Tan X (2009) Inference for multivariate normal mixtures. J Multivar Anal 100(7):1367–1383

  • Chen J, Tan X, Zhang R (2008) Inference for normal mixtures in mean and variance. Stat Sinica 18(2):443–465

  • Ciuperca G, Ridolfi A, Idier J (2003) Penalized maximum likelihood estimator for normal mixtures. Scand J Stat 30(1):45–59

  • Cutler A, Windham MP (1994) Information-based validity functionals for mixture analysis. In: Bozdogan H (ed) Proceedings of the First US/Japan Conference for Mixture Analysis. Kluwer, Amsterdam, pp 149–170

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc Ser B 39(1):1–38

  • DeSarbo WS, Cron WL (1988) A maximum likelihood methodology for clusterwise linear regression. J Classif 5(2):249–282

  • Dias JG (2007) Performance evaluation of information criteria for the naive-Bayes model in the case of latent class analysis: a Monte Carlo study. J Korean Stat Soc 36(3):435–445

  • Falk M (1999) A simple approach to the generation of uniformly distributed random variables with prescribed correlations. Commun Stat Simul Comput 28(3):785–791

  • Fleishman AI (1978) A method for simulating non-normal distributions. Psychometrika 43(4):521–532

  • Fonseca JRS, Cardoso MGMS (2007) Mixture-model cluster analysis using information theoretical criteria. Intell Data Anal 11(2):155–173

  • Garel B (2007) Recent asymptotic results in testing for mixtures. Comput Stat Data Anal 51(11):5295–5304

  • Ghosh JK, Sen PK (1985) On the asymptotic performance of the log likelihood ratio statistic for the mixture model and related results. In: Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, vol 2. Wadsworth, Monterey, pp 789–806

  • Hafidi B, Mkhadri A (2010) The Kullback information criterion for mixture regression models. Stat Probab Lett 80(9–10):807–815

  • Hannan EJ, Quinn BG (1979) The determination of the order of an autoregression. J Royal Stat Soc Ser B 41(2):190–195

  • Hathaway RJ (1985) A constrained formulation of maximum-likelihood estimation for normal mixture distributions. Ann Stat 13(2):795–800

  • Hathaway RJ (1986) Another interpretation of the EM algorithm for mixture distributions. Stat Probab Lett 4(2):53–56

  • Hawkins DS, Allen DM, Stromberg AJ (2001) Determining the number of components in mixtures of linear models. Comput Stat Data Anal 38(1):15–48

  • Headrick TC (2002) Fast fifth-order polynomial transforms for generating univariate and multivariate nonnormal distributions. Comput Stat Data Anal 40(4):685–711

  • Hurvich CM, Tsai CL (1989) Regression and time series model selection in small samples. Biometrika 76(2):297

  • James W, Stein C (1961) Estimation with quadratic loss. In: Neyman J (ed) Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, vol 1. University of California Press, California, pp 361–379

  • Jedidi K, Jagpal HS, DeSarbo WS (1997) Finite-mixture structural equation models for response-based segmentation and unobserved heterogeneity. Mark Sci 16(1):39–59

  • Karlis D, Xekalaki E (2003) Choosing initial values for the EM algorithm for finite mixtures. Comput Stat Data Anal 41(3–4):577–590

  • Konishi S, Kitagawa G (1996) Generalized information criteria in model selection. Biometrika 83(4):875–890

  • Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86

  • Liang Z, Jaszczak RJ, Coleman RE (1992) Parameter estimation of finite mixtures using the EM algorithm and information criteria with application to medical image processing. IEEE Trans Nucl Sci 39(4):1126–1133

  • Lindstrom MJ, Bates DM (1988) Newton–Raphson and EM algorithms for linear mixed models for repeated-measures data. J Am Stat Assoc 83(404):1014–1022

  • Lubke GH, Neale MC (2006) Distinguishing between latent classes and continuous factors: resolution by maximum likelihood? Multivar Behav Res 41(4):499–532

  • Marron JS, Wand MP (1992) Exact mean integrated squared error. Ann Stat 20(2):712–736

  • McLachlan GJ (1987) On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. J Royal Stat Soc Ser C 36(3):318–324

  • McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions, 2nd edn. Wiley, London

  • McLachlan GJ, Ng SK (2000) A comparison of some information criteria for the number of components in a mixture model. Tech. Rep., University of Queensland, Brisbane

  • McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, London

  • Naik PA, Shi P, Tsai CL (2007) Extending the Akaike information criterion to mixture regression models. J Am Stat Assoc 102(477):244–254

  • Nylund KL, Asparouhov T, Muthén BO (2007) Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study. Struct Equ Model 14(4):535–569

  • Oliveira-Brochado A, Martins FV (2008) Determining the number of market segments using an experimental design. FEP Working Papers 263, Universidade do Porto, Faculdade de Economia do Porto. http://ideas.repec.org/p/por/fepwps/263.html

  • Quandt RE (1972) A new approach to estimating switching regressions. J Am Stat Assoc 67(338):306–310

  • Quandt RE, Ramsey JB (1978) Estimating mixtures of normal distributions and switching regressions vectors. J Am Stat Assoc 73(364):730–738

  • Rissanen J (1986) Stochastic complexity and modeling. Ann Stat 14(3):1080–1100

  • Sarstedt M (2008) Market segmentation with mixture regression models: understanding measures that guide model selection. J Target Meas Anal Mark 16(3):228–246

  • Schlattmann P (2009) Medical applications of finite mixture models. Statistics for Biology and Health. Springer, Berlin

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464

  • Sclove SL (1987) Application of model-selection criteria to some problems in multivariate analysis. Psychometrika 52(3):333–343

  • Seidel W, Sevcikova H (2004) Types of likelihood maxima in mixture models and their implication on the performance of tests. Ann Inst Stat Math 41(4):85–654

  • Seidel W, Mosler K, Alker M (2000a) A cautionary note on likelihood ratio tests in mixture models. Ann Inst Stat Math 52(3):481–487

  • Seidel W, Mosler K, Alker M (2000b) Likelihood ratio tests based on subglobal optimization: a power comparison in exponential mixture models. Stat Pap 41(1):85–98

  • Steele RJ, Raftery AE (2009) Performance of Bayesian model selection criteria for Gaussian mixture models. Tech. Rep., University of Washington

  • Titterington DM, Smith AFM, Makov UE (1985) Statistical analysis of finite mixture distributions, vol 42. Wiley, London

  • Tofighi D, Enders CK (2008) Identifying the correct number of classes in growth mixture models. In: Hancock GR, Samuelsen KM (eds) Advances in latent variable mixture models. Information Age Publishing, Charlotte, pp 317–341

  • Viele K, Tong B (2002) Modeling with mixtures of linear regressions. Stat Comput 12:315–330

  • Wedel M, Kamakura WA (1999) Market segmentation: concepts and methodological foundations. Kluwer, Berlin

  • Wong CS, Li WK (2000) On a mixture autoregressive model. J Royal Stat Soc Ser B 62(1):95–115

  • Yang CC (2006) Evaluating latent class analysis models in qualitative phenotype identification. Comput Stat Data Anal 50(4):1090–1104

  • Yang CC, Yang CC (2007) Separating latent classes by information criteria. J Classif 24:183–203

  • Yang Y (2005) Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation. Biometrika 92(4):937–950

Author information

Correspondence to Nicolas Depraetere.

Electronic supplementary material

Supplementary material 1 (pdf 157 KB)

Cite this article

Depraetere, N., Vandebroek, M. Order selection in finite mixtures of linear regressions. Stat Papers 55, 871–911 (2014). https://doi.org/10.1007/s00362-013-0534-x
