Mixture model averaging for clustering

  • Regular Article
Advances in Data Analysis and Classification

Abstract

In mixture model-based clustering applications, it is common to fit several models from a family and report clustering results from only the ‘best’ one. In such circumstances, selection of this best model is achieved using a model selection criterion, most often the Bayesian information criterion. Rather than throw away all but the best model, we average multiple models that are in some sense close to the best one, thereby producing a weighted average of clustering results. Two (weighted) averaging approaches are considered: averaging component membership probabilities and averaging models. In both cases, Occam’s window is used to determine closeness to the best model and weights are computed within a Bayesian model averaging paradigm. In some cases, we need to merge components before averaging; we introduce a method for merging mixture components based on the adjusted Rand index. The effectiveness of our model-based clustering averaging approaches is illustrated using a family of Gaussian mixture models on real and simulated data.
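
As a rough illustration of the first averaging approach described above (averaging component membership probabilities), the following sketch computes Occam's window and approximate Bayesian model averaging weights from BIC values. It is not the authors' implementation. The function names `occams_window_weights` and `average_memberships`, the window ratio `c = 20`, and the convention that larger BIC is better (BIC = 2 log L - p log n) are assumptions made for the example. The sketch also assumes every retained model has the same number of components with labels already aligned, so the merging step mentioned in the abstract is not shown.

```python
import numpy as np


def occams_window_weights(bic, c=20.0):
    """Indices and normalised weights of the models inside Occam's window.

    Assumption: larger BIC is better, so exp(BIC / 2) approximates the
    posterior model probability up to a constant under equal model priors.
    A model is kept when the best model is at most c times more probable.
    """
    bic = np.asarray(bic, dtype=float)
    delta = bic.max() - bic                      # distance from the best model, >= 0
    keep = np.flatnonzero(delta <= 2.0 * np.log(c))
    w = np.exp(-0.5 * delta[keep])               # unnormalised approximate posteriors
    return keep, w / w.sum()


def average_memberships(z_list, weights):
    """Weighted average of n x G membership probability matrices.

    Assumption: all matrices share the same G and the same column labelling.
    """
    z = np.stack([np.asarray(z, dtype=float) for z in z_list])  # shape (M, n, G)
    return np.tensordot(weights, z, axes=1)                      # shape (n, G)


# Hypothetical BIC values for three fitted models; the third falls outside the window.
bic_values = [-1000.0, -1002.0, -1030.0]
keep, weights = occams_window_weights(bic_values)

# Hypothetical membership probabilities (2 observations, 2 components) per model.
z_models = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.5, 0.5], [0.5, 0.5]],
]
z_avg = average_memberships([z_models[i] for i in keep], weights)
labels = z_avg.argmax(axis=1)                    # hard clustering from averaged probabilities
```

Subtracting the best BIC before exponentiating keeps the weights numerically stable when BIC values are large in magnitude. Because each input matrix has rows summing to one and the weights sum to one, the averaged rows also sum to one.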


Acknowledgments

The authors gratefully acknowledge the very helpful comments and suggestions of an associate editor and three anonymous reviewers. The authors are grateful to Professor Adrian Raftery and other members of the University of Washington Working Group on Model-Based Clustering for their comments and suggestions on an earlier version of this work.

Author information

Corresponding author

Correspondence to Paul D. McNicholas.

Additional information

This work was supported by an Ontario Graduate Scholarship, an Early Researcher Award from the Ontario Ministry of Research and Innovation, a grant-in-aid from Compusense Inc., and a Collaborative Research and Development Grant from the Natural Sciences and Engineering Research Council of Canada.

About this article

Cite this article

Wei, Y., McNicholas, P.D. Mixture model averaging for clustering. Adv Data Anal Classif 9, 197–217 (2015). https://doi.org/10.1007/s11634-014-0182-6

  • DOI: https://doi.org/10.1007/s11634-014-0182-6
