Abstract
Finite mixture models can adequately model population heterogeneity when this heterogeneity arises from a finite number of relatively homogeneous clusters. An example of such a situation is market segmentation. Order selection in mixture models, i.e., selecting the correct number of components, is however a problem that has not been satisfactorily resolved. Existing simulation results in the literature do not fully agree with each other. Moreover, the performance of different selection methods appears to be affected by the type of model and the parameter values. Furthermore, most existing results are based on simulations in which the true generating model is identical to one of the models in the candidate set. To partly fill this gap, we carried out a (relatively) large simulation study for finite mixture models of normal linear regressions. We included several types of model (mis)specification to study the robustness of 18 order selection methods. Furthermore, we compared the performance of these selection methods based on unpenalized and penalized estimates of the model parameters. The results indicate that order selection based on penalized estimates greatly improves the success rates of all order selection methods. The most successful methods were \(MDL2\), \(MRC\), \(MRC_k\), \(ICL\)–\(BIC\), \(ICL\), \(CAIC\), \(BIC\) and \(CLC\), but no single method was consistently good or best for all types of model (mis)specification.
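The criteria compared in the abstract all work by scoring fitted mixtures of increasing order and picking the minimizer. As a minimal generic illustration (a sketch only, using plain EM rather than the penalized estimation studied in the paper, with assumed initialization details), BIC-based order selection for a univariate normal mixture might look like:

```python
import numpy as np

def mixture_bic(y, k, n_iter=300, seed=0):
    """Fit a k-component univariate normal mixture by EM and return its BIC.

    Toy sketch: no restarts and no penalty on the variances, so unlike the
    penalized estimates studied in the paper it can drift to degenerate
    solutions when k is too large.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    w = np.full(k, 1.0 / k)                       # mixture proportions
    mu = np.sort(rng.choice(y, size=k, replace=False))
    sd = np.full(k, y.std())

    def component_dens():
        z = (y[:, None] - mu) / sd
        return w * np.exp(-0.5 * z ** 2) / (sd * np.sqrt(2.0 * np.pi))

    for _ in range(n_iter):
        dens = component_dens()
        resp = dens / dens.sum(axis=1, keepdims=True)   # E-step
        nk = resp.sum(axis=0)                           # M-step
        w = nk / n
        mu = (resp * y[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (y[:, None] - mu) ** 2).sum(axis=0) / nk)

    loglik = float(np.log(component_dens().sum(axis=1)).sum())
    n_params = 3 * k - 1                          # k means, k sds, k-1 proportions
    return -2.0 * loglik + n_params * np.log(n)   # BIC: smaller is better
```

Evaluating `mixture_bic` over a grid of candidate orders and selecting the minimizing `k` is the generic recipe; on well-separated two-component data the two-component fit should win.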
Notes
This can be readily extended to the multivariate case.
It is possible to generalize (4) by including explanatory variables to model the mixture proportions, for instance via a logistic regression model. If these explanatory variables differ from the variables that model the component means, they can be ignored for order selection, as the marginal model is a mixture model with the same number of components (Bandeen-Roche et al. 1997).
Bootstrapping the likelihood ratio test may, however, be very useful if one has enough time and/or computing power. Nylund et al. (2007) presented very favorable results from their simulation study.
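The parametric bootstrap of the likelihood ratio test mentioned in this note (McLachlan 1987) can be sketched generically. In the skeleton below, `fit_loglik` and `simulate_h0` are hypothetical hooks the user must supply, not functions from the paper:

```python
import random

def bootstrap_lrt_pvalue(data, fit_loglik, simulate_h0, k0, n_boot=99, seed=0):
    """Parametric-bootstrap p-value for H0: k0 components vs H1: k0 + 1.

    fit_loglik(data, k) -> maximized log-likelihood of a k-component fit
    simulate_h0(n, rng) -> dataset of size n drawn from the fitted H0 model
    Both are placeholder hooks; supplying them requires a mixture fitter.
    """
    rng = random.Random(seed)
    observed = 2.0 * (fit_loglik(data, k0 + 1) - fit_loglik(data, k0))
    n_exceed = 0
    for _ in range(n_boot):
        boot = simulate_h0(len(data), rng)
        stat = 2.0 * (fit_loglik(boot, k0 + 1) - fit_loglik(boot, k0))
        if stat >= observed:
            n_exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return (n_exceed + 1) / (n_boot + 1)
```

The cost is one extra mixture fit per bootstrap replicate at each candidate order, which is why the note stresses computing power.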
The most widely known criterion of this type is probably \(ICOMP\) (Bozdogan 1993), which is defined as \(-2LL\left(\hat{\varvec{\varPsi}}\right)+n_p\log\left[n_p^{-1}\,\mathrm{trace}\left(\mathcal{I}^{-1}\right)\right]-\log\left(|\mathcal{I}^{-1}|\right)\), where \(\mathcal{I}\) denotes the expected information matrix, \(n_p\) is the number of parameters and \(|\cdot|\) is the determinant.
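The \(ICOMP\) formula above translates directly into code. The function below is a sketch that assumes the expected information matrix has already been computed by the user:

```python
import numpy as np

def icomp(loglik, info):
    """ICOMP: -2*LL + n_p * log(trace(I^{-1}) / n_p) - log|I^{-1}|.

    loglik : maximized log-likelihood LL(Psi_hat)
    info   : expected information matrix I (n_p x n_p, positive definite)
    """
    inv_info = np.linalg.inv(np.asarray(info, dtype=float))
    n_p = inv_info.shape[0]
    _, logdet_inv = np.linalg.slogdet(inv_info)  # log|I^{-1}|, numerically stable
    return -2.0 * loglik + n_p * np.log(np.trace(inv_info) / n_p) - logdet_inv
```

The complexity penalty vanishes when the inverse information matrix is a multiple of the identity, i.e. when the parameter estimates are uncorrelated with equal variances, and grows as the matrix departs from that shape.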
Akaike himself actually called it ‘An information criterion’ (Burnham and Anderson 2002).
The symmetric Kullback–Leibler divergence \(J(f,g)\) between f and g is defined as \(J(f,g)=I(f,g)+I(g,f)\).
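For discrete distributions this definition is straightforward to compute. The sketch below assumes \(f\) and \(g\) are probability vectors on a common support:

```python
import math

def kl_divergence(f, g):
    """Directed Kullback-Leibler divergence I(f, g) for discrete distributions."""
    return sum(p * math.log(p / q) for p, q in zip(f, g) if p > 0.0)

def symmetric_kl(f, g):
    """Symmetric divergence J(f, g) = I(f, g) + I(g, f)."""
    return kl_divergence(f, g) + kl_divergence(g, f)
```

Unlike the directed divergence \(I(f,g)\), the symmetrized \(J(f,g)\) gives the same value whichever of the two densities plays the role of the truth.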
Burnham and Anderson (2002) argue that in most realistic situations the true model cannot be in the set of candidate models, and show by simulation that, when it is, \(AIC\) also selects the true model with high probability.
It is not possible to set skewness independently from kurtosis (Headrick 2002).
The complete collection of tables with success rates for each combination of the experimental settings can be found in the supplementary materials.
References
Abbi R, El-Darzi E, Vasilakis C, Millard P (2008) Analysis of stopping criteria for the EM algorithm in the context of patient grouping according to length of stay. In: 4th International IEEE Conference on Intelligent Systems, Varna, Bulgaria. IEEE, pp 9–14
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19(6):716–723
Andrews RL, Currim IS (2003a) A comparison of segment retention criteria for finite mixture logit models. J Mark Res 40(2):235–243
Andrews RL, Currim IS (2003b) Retention of latent segments in regression-based marketing models. Int J Res Mark 20(4):315–321
Bandeen-Roche K, Miglioretti DL, Zeger SL, Rathouz PJ (1997) Latent variable regression for multiple discrete outcomes. J Am Stat Assoc 92(440):1375–1386
Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49(3):803–821
Bhansali RJ, Downham DY (1977) Some properties of the order of an autoregressive model selected by a generalization of Akaike’s FPE criterion. Biometrika 64(3):547–551
Biernacki C, Govaert G (1997) Using the classification likelihood to choose the number of clusters. Comput Sci Stat 29(2):451–457
Biernacki C, Celeux G, Govaert G (1998) Assessing a mixture model for clustering with the integrated classification likelihood. Tech. Rep. 3521, INRIA Rhône-Alpes
Biernacki C, Celeux G, Govaert G (1999) An improvement of the NEC criterion for assessing the number of clusters in a mixture model. Pattern Recognit Lett 20(3):267–272
Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern Anal Mach Intell 22(7):719–725
Biernacki C, Celeux G, Govaert G (2003) Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput Stat Data Anal 41(3–4):561–575
Böhning D, Dietz E, Schaub R, Schlattmann P, Lindsay BG (1994) The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Ann Inst Stat Math 46(2):373–388
Bozdogan H (1987) Model selection and Akaike’s information criterion (AIC): the general theory and its analytical extensions. Psychometrika 52(3):345–370
Bozdogan H (1993) Choosing the number of component clusters in the mixture-model using a new informational complexity criterion of the Inverse-Fisher information matrix. In: Opitz O, Lausen B, Klar R (eds) Information and classification. Springer, Heidelberg, pp 40–54
Burnham KP, Anderson DR (2002) Model selection and multimodel inference: a practical information-theoretic approach, 2nd edn. Springer, Berlin
Cavanaugh JE (1999) A large-sample model selection criterion based on Kullback’s symmetric divergence. Stat Probab Lett 42(4):333–343
Cavanaugh JE (2004) Criteria for linear model selection based on Kullback’s symmetric divergence. Aust N Z J Stat 46(2):257–274
Celeux G, Soromenho G (1996) An entropy criterion for assessing the number of clusters in a mixture model. J Classif 13(2):195–212
Chen J, Tan X (2009) Inference for multivariate normal mixtures. J Multivar Anal 100(7):1367–1383
Chen J, Tan X, Zhang R (2008) Inference for normal mixtures in mean and variance. Stat Sinica 18(2):443–465
Ciuperca G, Ridolfi A, Idier J (2003) Penalized maximum likelihood estimator for normal mixtures. Scand J Stat 30(1):45–59
Cutler A, Windham MP (1994) Information-based validity functionals for mixture analysis. In: Bozdogan H (ed) Proceedings of the First US/Japan Conference for Mixture Analysis. Kluwer, Amsterdam, pp 149–170
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc Ser B 39(1):1–38
Desarbo WS, Cron WL (1988) A maximum likelihood methodology for clusterwise linear regression. J Classif 5(2):249–282
Dias JG (2007) Performance evaluation of information criteria for the naive-Bayes model in the case of latent class analysis: a Monte Carlo study. J Korean Stat Soc 36(3):435–445
Falk M (1999) A simple approach to the generation of uniformly distributed random variables with prescribed correlations. Commun Stat Simul Comput 28(3):785–791
Fleishman AI (1978) A method for simulating non-normal distributions. Psychometrika 43(4):521–532
Fonseca JRS, Cardoso MGMS (2007) Mixture-model cluster analysis using information theoretical criteria. Intell Data Anal 11(2):155–173
Garel B (2007) Recent asymptotic results in testing for mixtures. Comput Stat Data Anal 51(11):5295–5304
Ghosh JK, Sen PK (1985) On the asymptotic performance of the log likelihood ratio statistic for the mixture model and related results. In: Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, vol 2. Wadsworth, Monterey, pp 789–806
Hafidi B, Mkhadri A (2010) The Kullback information criterion for mixture regression models. Stat Probab Lett 80(9–10):807–815
Hannan EJ, Quinn BG (1979) The determination of the order of an autoregression. J Royal Stat Soc Ser B 41(2):190–195
Hathaway RJ (1985) A constrained formulation of maximum-likelihood estimation for normal mixture distributions. Ann Stat 13(2):795–800
Hathaway RJ (1986) Another interpretation of the EM algorithm for mixture distributions. Stat Probab Lett 4(2):53–56
Hawkins DS, Allen DM, Stromberg AJ (2001) Determining the number of components in mixtures of linear models. Comput Stat Data Anal 38(1):15–48
Headrick TC (2002) Fast fifth-order polynomial transforms for generating univariate and multivariate nonnormal distributions. Comput Stat Data Anal 40(4):685–711
Hurvich CM, Tsai CL (1989) Regression and time series model selection in small samples. Biometrika 76(2):297–307
James W, Stein C (1961) Estimation with quadratic loss. In: Neyman J (ed) Proceedings Fourth Berkeley Symposium on Mathematical Statistics and Probability, vol 1. University of California Press, California, pp 361–379
Jedidi K, Jagpal HS, DeSarbo WS (1997) Finite-mixture structural equation models for response-based segmentation and unobserved heterogeneity. Mark Sci 16(1):39–59
Karlis D, Xekalaki E (2003) Choosing initial values for the EM algorithm for finite mixtures. Comput Stat Data Anal 41(3–4):577–590
Konishi S, Kitagawa G (1996) Generalized information criteria in model selection. Biometrika 83(4):875–890
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
Liang Z, Jaszczak RJ, Coleman RE (1992) Parameter estimation of finite mixtures using the EM algorithm and information criteria with application to medical image processing. IEEE Trans Nucl Sci 39(4):1126–1133
Lindstrom MJ, Bates DM (1988) Newton–Raphson and EM algorithms for linear mixed models for repeated-measures data. J Am Stat Assoc 83(404):1014–1022
Lubke GH, Neale MC (2006) Distinguishing between latent classes and continuous factors: resolution by maximum likelihood? Multivar Behav Res 41(4):499–532
Marron JS, Wand MP (1992) Exact mean integrated squared error. Ann Stat 20(2):712–736
McLachlan GJ (1987) On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. J Royal Stat Soc Ser C 36(3):318–324
McLachlan GJ, Krishnan T (2008) The EM algorithm and extensions, 2nd edn. Wiley, London
McLachlan GJ, Ng SK (2000) A comparison of some information criteria for the number of components in a mixture model. Tech. Rep., University of Queensland, Brisbane
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, London
Naik PA, Shi P, Tsai CL (2007) Extending the Akaike information criterion to mixture regression models. J Am Stat Assoc 102(477):244–254
Nylund KL, Asparouhov T, Muthén BO (2007) Deciding on the number of classes in latent class analysis and growth mixture modeling: a Monte Carlo simulation study. Struct Equ Model 14(4):535–569
Oliveira-Brochado A, Martins FV (2008) Determining the number of market segments using an experimental design. FEP Working Papers 263, Universidade do Porto, Faculdade de Economia do Porto, http://ideas.repec.org/p/por/fepwps/263.html
Quandt RE (1972) A new approach to estimating switching regressions. J Am Stat Assoc 67(338):306–310
Quandt RE, Ramsey JB (1978) Estimating mixtures of normal distributions and switching regressions vectors. J Am Stat Assoc 73(364):730–738
Rissanen J (1986) Stochastic complexity and modeling. Ann Stat 14(3):1080–1100
Sarstedt M (2008) Market segmentation with mixture regression models: understanding measures that guide model selection. J Target Meas Anal Mark 16(3):228–246
Schlattmann P (2009) Medical applications of finite mixture models. Statistics for Biology and Health, Springer, Berlin
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Sclove SL (1987) Application of model-selection criteria to some problems in multivariate analysis. Psychometrika 52(3):333–343
Seidel W, Sevcikova H (2004) Types of likelihood maxima in mixture models and their implication on the performance of tests. Ann Inst Stat Math 56(4):631–654
Seidel W, Mosler K, Alker M (2000a) A cautionary note on likelihood ratio tests in mixture models. Ann Inst Stat Math 52(3):481–487
Seidel W, Mosler K, Alker M (2000b) Likelihood ratio tests based on subglobal optimization: a power comparison in exponential mixture models. Stat Pap 41(1):85–98
Steele RJ, Raftery AE (2009) Performance of Bayesian model selection criteria for Gaussian mixture models. Tech. Rep., University of Washington
Titterington DM, Smith AFM, Makov UE (1985) Statistical analysis of finite mixture distributions, vol 42. Wiley, London
Tofighi D, Enders CK (2008) Identifying the correct number of classes in growth mixture models. In: Hancock GR, Samuelsen KM (eds) Advances in latent variable mixture models. Information Age Publishing Inc., Charlotte, pp 317–341
Viele K, Tong B (2002) Modeling with mixtures of linear regressions. Stat Comput 12:315–330
Wedel M, Kamakura WA (1999) Market segmentation: concepts and methodological foundations. Kluwer, Berlin
Wong CS, Li WK (2000) On a mixture autoregressive model. J Royal Stat Soc Ser B 62(1):95–115
Yang CC (2006) Evaluating latent class analysis models in qualitative phenotype identification. Comput Stat Data Anal 50(4):1090–1104
Yang CC, Yang CC (2007) Separating latent classes by information criteria. J Classif 24:183–203
Yang Y (2005) Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation. Biometrika 92(4):937–950
Cite this article
Depraetere, N., Vandebroek, M. Order selection in finite mixtures of linear regressions. Stat Papers 55, 871–911 (2014). https://doi.org/10.1007/s00362-013-0534-x