Abstract
Finite mixture models can adequately model population heterogeneity when this heterogeneity arises from a finite number of relatively homogeneous clusters. An example of such a situation is market segmentation. Order selection in mixture models, i.e., selecting the correct number of components, is however a problem that has not been satisfactorily resolved. Existing simulation results in the literature do not fully agree with each other. Moreover, the performance of different selection methods appears to be affected by the type of model and the parameter values. Furthermore, most existing results are based on simulations in which the true generating model is identical to one of the models in the candidate set. To partly fill this gap, we carried out a (relatively) large simulation study for finite mixture models of normal linear regressions. We included several types of model (mis)specification to study the robustness of 18 order selection methods. Furthermore, we compared the performance of these selection methods based on unpenalized and penalized estimates of the model parameters. The results indicate that order selection based on penalized estimates greatly improves the success rates of all order selection methods. The most successful methods were \(MDL2\), \(MRC\), \(MRC_k\), \(ICL\)–\(BIC\), \(ICL\), \(CAIC\), \(BIC\) and \(CLC\), but no single method was consistently good or best for all types of model (mis)specification.
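The criteria compared in the abstract all work by scoring fitted mixtures of increasing order and picking the minimizer. As a minimal generic illustration (a sketch only, using plain EM rather than the penalized estimation studied in the paper, with assumed initialization details), BIC-based order selection for a univariate normal mixture might look like:

```python
import numpy as np

def mixture_bic(y, k, n_iter=300, seed=0):
    """Fit a k-component univariate normal mixture by EM and return its BIC.

    Toy sketch: no restarts and no penalty on the variances, so unlike the
    penalized estimates studied in the paper it can drift to degenerate
    solutions when k is too large.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    w = np.full(k, 1.0 / k)                       # mixture proportions
    mu = np.sort(rng.choice(y, size=k, replace=False))
    sd = np.full(k, y.std())

    def component_dens():
        z = (y[:, None] - mu) / sd
        return w * np.exp(-0.5 * z ** 2) / (sd * np.sqrt(2.0 * np.pi))

    for _ in range(n_iter):
        dens = component_dens()
        resp = dens / dens.sum(axis=1, keepdims=True)   # E-step
        nk = resp.sum(axis=0)                           # M-step
        w = nk / n
        mu = (resp * y[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (y[:, None] - mu) ** 2).sum(axis=0) / nk)

    loglik = float(np.log(component_dens().sum(axis=1)).sum())
    n_params = 3 * k - 1                          # k means, k sds, k-1 proportions
    return -2.0 * loglik + n_params * np.log(n)   # BIC: smaller is better
```

Evaluating `mixture_bic` over a grid of candidate orders and selecting the minimizing `k` is the generic recipe; on well-separated two-component data the two-component fit should win.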
Notes
This can be readily extended to the multivariate case.
It is possible to generalize (4) by including explanatory variables to model the mixture proportions, for instance via a logistic regression model. If these explanatory variables differ from the variables that model the component means, they can be ignored for order selection, as the marginal model is a mixture model with the same number of components (Bandeen-Roche et al. 1997).
Bootstrapping the likelihood ratio test may, however, be very useful if one has enough time and/or computing power. Nylund et al. (2007) presented very favorable results from their simulation study.
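The parametric bootstrap of the likelihood ratio test mentioned in this note (McLachlan 1987) can be sketched generically. In the skeleton below, `fit_loglik` and `simulate_h0` are hypothetical hooks the user must supply, not functions from the paper:

```python
import random

def bootstrap_lrt_pvalue(data, fit_loglik, simulate_h0, k0, n_boot=99, seed=0):
    """Parametric-bootstrap p-value for H0: k0 components vs H1: k0 + 1.

    fit_loglik(data, k) -> maximized log-likelihood of a k-component fit
    simulate_h0(n, rng) -> dataset of size n drawn from the fitted H0 model
    Both are placeholder hooks; supplying them requires a mixture fitter.
    """
    rng = random.Random(seed)
    observed = 2.0 * (fit_loglik(data, k0 + 1) - fit_loglik(data, k0))
    n_exceed = 0
    for _ in range(n_boot):
        boot = simulate_h0(len(data), rng)
        stat = 2.0 * (fit_loglik(boot, k0 + 1) - fit_loglik(boot, k0))
        if stat >= observed:
            n_exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return (n_exceed + 1) / (n_boot + 1)
```

The cost is one extra mixture fit per bootstrap replicate at each candidate order, which is why the note stresses computing power.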
The most widely known criterion of this type is probably \(ICOMP\) (Bozdogan 1993), which is defined as \(-2LL\left(\hat{\varvec{\varPsi}}\right)+n_p\log\left[n_p^{-1}\,\mathrm{trace}\left(\mathcal{I}^{-1}\right)\right]-\log\left(|\mathcal{I}^{-1}|\right)\), where \(\mathcal{I}\) denotes the expected information matrix, \(n_p\) is the number of parameters and \(|\cdot|\) is the determinant.
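The \(ICOMP\) formula above translates directly into code. The function below is a sketch that assumes the expected information matrix has already been computed by the user:

```python
import numpy as np

def icomp(loglik, info):
    """ICOMP: -2*LL + n_p * log(trace(I^{-1}) / n_p) - log|I^{-1}|.

    loglik : maximized log-likelihood LL(Psi_hat)
    info   : expected information matrix I (n_p x n_p, positive definite)
    """
    inv_info = np.linalg.inv(np.asarray(info, dtype=float))
    n_p = inv_info.shape[0]
    _, logdet_inv = np.linalg.slogdet(inv_info)  # log|I^{-1}|, numerically stable
    return -2.0 * loglik + n_p * np.log(np.trace(inv_info) / n_p) - logdet_inv
```

The complexity penalty vanishes when the inverse information matrix is a multiple of the identity, i.e. when the parameter estimates are uncorrelated with equal variances, and grows as the matrix departs from that shape.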
Akaike himself actually called it ‘An information criterion’ (Burnham and Anderson 2002).
The symmetric Kullback–Leibler divergence \(J(f,g)\) between f and g is defined as \(J(f,g)=I(f,g)+I(g,f)\).
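For discrete distributions this definition is straightforward to compute. The sketch below assumes \(f\) and \(g\) are probability vectors on a common support:

```python
import math

def kl_divergence(f, g):
    """Directed Kullback-Leibler divergence I(f, g) for discrete distributions."""
    return sum(p * math.log(p / q) for p, q in zip(f, g) if p > 0.0)

def symmetric_kl(f, g):
    """Symmetric divergence J(f, g) = I(f, g) + I(g, f)."""
    return kl_divergence(f, g) + kl_divergence(g, f)
```

Unlike the directed divergence \(I(f,g)\), the symmetrized \(J(f,g)\) gives the same value whichever of the two densities plays the role of the truth.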
Burnham and Anderson (2002) argue that in most realistic situations the true model cannot be in the set of candidate models, and show by simulation that, when it is, \(AIC\) also selects the true model with high probability.
It is not possible to set skewness independently from kurtosis (Headrick 2002).
The complete collection of tables with success rates for each combination of the experimental settings can be found in the supplementary materials.
References
Abbi R, El-Darzi E, Vasilakis C, Millard P (2008) Analysis of stopping criteria for the EM algorithm in the context of patient grouping according to length of stay. In: 4th International IEEE Conference on Intelligent Systems, Varna, Bulgaria. IEEE, pp 9–14
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6):716–723
Andrews RL, Currim IS (2003a) A comparison of segment retention criteria for finite mixture logit models. J Mark Res 40(2):235–243
Andrews RL, Currim IS (2003b) Retention of latent segments in regression-based marketing models. Int J Res Mark 20(4):315–321
Bandeen-Roche K, Miglioretti DL, Zeger SL, Rathouz PJ (1997) Latent variable regression for multiple discrete outcomes. J Am Stat Assoc 92(440):1375–1386
Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49(3):803–821
Bhansali RJ, Downham DY (1977) Some properties of the order of an autoregressive model selected by a generalization of Akaike’s FPE criterion. Biometrika 64(3):547–551
Biernacki C, Govaert G (1997) Using the classification likelihood to choose the number of clusters. Comput Sci Stat 29(2):451–457
Biernacki C, Celeux G, Govaert G (1998) Assessing a mixture model for clustering with the integrated classification likelihood. Tech. Rep. 3521, INRIA Rhône-Alpes
Biernacki C, Celeux G, Govaert G (1999) An improvement of the NEC criterion for assessing the number of clusters in a mixture model. Pattern Recognit Lett 20(3):267–272
Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725
Biernacki C, Celeux G, Govaert G (2003) Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput Stat Data Anal 41(3–4):561–575
Böhning D, Dietz E, Schaub R, Schlattmann P, Lindsay BG (1994) The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Ann Inst Stat Math 46(2):373–388
Bozdogan H (1987) Model selection and Akaike’s information criterion (AIC): the general theory and its analytical extensions. Psychometrika 52(3):345–370
Bozdogan H (1993) Choosing the number of component clusters in the mixture-model using a new informational complexity criterion of the Inverse-Fisher information matrix. In: Opitz O, Lausen B, Klar R (eds) Information and classification. Springer, Heidelberg, pp 40–54
Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach, 2nd edn. Springer, Berlin
Cavanaugh JE (1999) A large-sample model selection criterion based on Kullback’s symmetric divergence. Stat Probab Lett 42(4):333–343
Cavanaugh JE (2004) Criteria for linear model selection based on Kullback’s symmetric divergence. Aust N Z J Stat 46(2):257–274
Celeux G, Soromenho G (1996) An entropy criterion for assessing the number of clusters in a mixture model. J Classif 13(2):195–212
Chen J, Tan X (2009) Inference for multivariate normal mixtures. J Multivar Anal 100(7):1367–1383
Chen J, Tan X, Zhang R (2008) Inference for normal mixtures in mean and variance. Stat Sinica 18(2):443–465
Ciuperca G, Ridolfi A, Idier J (2003) Penalized maximum likelihood estimator for normal mixtures. Scand J Stat 30(1):45–59
Cutler A, Windham MP (1994) Information-based validity functionals for mixture analysis. In: Bozdogan H (ed) Proceedings of the First US/Japan Conference for Mixture Analysis. Kluwer, Amsterdam, pp 149–170
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc Ser B 39(1):1–38
Desarbo WS, Cron WL (1988) A maximum likelihood methodology for clusterwise linear regression. J Classif 5(2):249–282
Dias JG (2007) Performance evaluation of information criteria for the naive-Bayes model in the case of latent class analysis: a Monte Carlo study. J Korean Stat Soc 36(3):435–445
Falk M (1999) A simple approach to the generation of uniformly distributed random variables with prescribed correlations. Commun Stat Simul Comput 28(3):785–791
Fleishman AI (1978) A method for simulating non-normal distributions. Psychometrika 43(4):521–532
Fonseca JRS, Cardoso MGMS (2007) Mixture-model cluster analysis using information theoretical criteria. Intell Data Anal 11(2):155–173
Garel B (2007) Recent asymptotic results in testing for mixtures. Comput Stat Data Anal 51(11):5295–5304
Ghosh JK, Sen PK (1985) On the asymptotic performance of the log likelihood ratio statistic for the mixture model and related results. In: Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, vol 2. Wadsworth, Monterey, pp 789–806
Hafidi B, Mkhadri A (2010) The Kullback information criterion for mixture regression models. Stat Probab Lett 80(9–10):807–815
Hannan EJ, Quinn BG (1979) The determination of the order of an autoregression. J Royal Stat Soc Ser B 41(2):190–195
Hathaway RJ (1985) A constrained formulation of maximum-likelihood estimation for normal mixture distributions. Ann Stat 13(2):795–800
Hathaway RJ (1986) Another interpretation of the EM algorithm for mixture distributions. Stat Probab Lett 4(2):53–56
Hawkins DS, Allen DM, Stromberg AJ (2001) Determining the number of components in mixtures of linear models. Comput Stat Data Anal 38(1):15–48
Headrick TC (2002) Fast fifth-order polynomial transforms for generating univariate and multivariate nonnormal distributions. Comput Stat Data Anal 40(4):685–711
Hurvich CM, Tsai CL (1989) Regression and time series model selection in small samples. Biometrika 76(2):297–307
James W, Stein C (1961) Estimation with quadratic loss. In: Neyman J (ed) Proceedings Fourth Berkeley Symposium on Mathematical Statistics and Probability, vol 1. University of California Press, California, pp 361–379
Jedidi K, Jagpal HS, DeSarbo WS (1997) Finite-mixture structural equation models for response-based segmentation and unobserved heterogeneity. Mark Sci 16(1):39–59
Karlis D, Xekalaki E (2003) Choosing initial values for the EM algorithm for finite mixtures. Comput Stat Data Anal 41(3–4):577–590
Konishi S, Kitagawa G (1996) Generalized information criteria in model selection. Biometrika 83(4):875–890
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
Liang Z, Jaszczak RJ, Coleman RE (1992) Parameter estimation of finite mixtures using the EM algorithm and information criteria with application to medical image processing. IEEE Trans Nucl Sci 39(4):1126–1133
Lindstrom MJ, Bates DM (1988) Newton–Raphson and EM algorithms for linear mixed models for repeated-measures data. J Am Stat Assoc 83(404):1014–1022
Lubke GH, Neale MC (2006) Distinguishing between latent classes and continuous factors: resolution by maximum likelihood? Multivar Behav Res 41(4):499–532
Marron JS, Wand MP (1992) Exact mean integrated squared error. Ann Stat 20(2):712–736
McLachlan GJ (1987) On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. J Royal Stat Soc Ser C 36(3):318–324
McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions, 2nd edn. Wiley, London
McLachlan GJ, Ng SK (2000) A comparison of some information criteria for the number of components in a mixture model. Tech. Rep., University of Queensland, Brisbane
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, London
Naik PA, Shi P, Tsai CL (2007) Extending the Akaike information criterion to mixture regression models. J Am Stat Assoc 102(477):244–254
Nylund KL, Asparouhov T, Muthén BO (2007) Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study. Struct Equ Model 14(4):535–569
Oliveira-Brochado A, Martins FV (2008) Determining the number of market segments using an experimental design. FEP Working Papers 263, Universidade do Porto, Faculdade de Economia do Porto, http://ideas.repec.org/p/por/fepwps/263.html
Quandt RE (1972) A new approach to estimating switching regressions. J Am Stat Assoc 67(338):306–310
Quandt RE, Ramsey JB (1978) Estimating mixtures of normal distributions and switching regressions vectors. J Am Stat Assoc 73(364):730–738
Rissanen J (1986) Stochastic complexity and modeling. Ann Stat 14(3):1080–1100
Sarstedt M (2008) Market segmentation with mixture regression models: understanding measures that guide model selection. J Target Meas Anal Mark 16(3):228–246
Schlattmann P (2009) Medical applications of finite mixture models. Statistics for Biology and Health, Springer, Berlin
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Sclove SL (1987) Application of model-selection criteria to some problems in multivariate analysis. Psychometrika 52(3):333–343
Seidel W, Sevcikova H (2004) Types of likelihood maxima in mixture models and their implication on the performance of tests. Ann Inst Stat Math 56(4):631–654
Seidel W, Mosler K, Alker M (2000a) A cautionary note on likelihood ratio tests in mixture models. Ann Inst Stat Math 52(3):481–487
Seidel W, Mosler K, Alker M (2000b) Likelihood ratio tests based on subglobal optimization: a power comparison in exponential mixture models. Stat Pap 41(1):85–98
Steele RJ, Raftery AE (2009) Performance of Bayesian model selection criteria for Gaussian mixture models. Tech. Rep., University of Washington
Titterington DM, Smith AFM, Makov UE (1985) Statistical analysis of finite mixture distributions, vol 42. Wiley, London
Tofighi D, Enders CK (2008) Identifying the correct number of classes in growth mixture models. In: Hancock GR, Samuelsen KM (eds) Advances in latent variable mixture models. Information Age Publishing Inc., Charlotte, pp 317–341
Viele K, Tong B (2002) Modeling with mixtures of linear regressions. Stat Comput 12:315–330
Wedel M, Kamakura WA (1999) Market segmentation: concepts and methodological foundations. Kluwer, Berlin
Wong CS, Li WK (2000) On a mixture autoregressive model. J Royal Stat Soc Ser B 62(1):95–115
Yang CC (2006) Evaluating latent class analysis models in qualitative phenotype identification. Comput Stat Data Anal 50(4):1090–1104
Yang CC, Yang CC (2007) Separating latent classes by information criteria. J Classif 24:183–203
Yang Y (2005) Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation. Biometrika 92(4):937–950
Cite this article
Depraetere, N., Vandebroek, M. Order selection in finite mixtures of linear regressions. Stat Papers 55, 871–911 (2014). https://doi.org/10.1007/s00362-013-0534-x