Exponential family mixed membership models for soft clustering of multivariate data

  • Regular Article
  • Advances in Data Analysis and Classification

Abstract

For several years, model-based clustering methods have successfully tackled many of the challenges posed by data analysts. However, as the scope of data analysis has evolved, some problems may lie beyond the standard mixture model framework. One such problem arises when observations in a dataset come from overlapping clusters, in which different clusters possess similar parameters for multiple variables. In this setting, mixed membership models, a soft clustering approach in which observations are not restricted to membership of a single cluster, have proved to be an effective tool. In this paper, a method for fitting mixed membership models to data generated by a member of an exponential family is outlined. The method is applied to count data obtained from an ultra-running competition and compared with a standard mixture model approach.
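To make the contrast between the two approaches concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the usual mixed membership formulation in which each observation has its own Dirichlet-distributed membership vector and each variable is assigned to a profile separately, and it takes the Poisson distribution as the exponential family member for count data. All function names, the rate table `rates` and the Dirichlet parameter `alpha` are hypothetical.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw a membership vector by normalising independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_poisson(rate, rng):
    """Draw a Poisson count with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_mixed_membership(rates, alpha, rng):
    """Mixed membership: every variable picks its own profile.

    rates[g][m] is the (assumed) Poisson rate of variable m under profile g.
    """
    n_profiles, n_variables = len(rates), len(rates[0])
    tau = sample_dirichlet(alpha, rng)  # observation-level memberships
    z = rng.choices(range(n_profiles), weights=tau, k=n_variables)
    x = [sample_poisson(rates[z[m]][m], rng) for m in range(n_variables)]
    return tau, z, x


def sample_mixture(rates, weights, rng):
    """Standard mixture: one cluster label is shared by all variables."""
    g = rng.choices(range(len(rates)), weights=weights, k=1)[0]
    x = [sample_poisson(rate, rng) for rate in rates[g]]
    return g, x


if __name__ == "__main__":
    rng = random.Random(1)
    rates = [[1.0, 2.0, 3.0], [8.0, 9.0, 10.0]]  # two profiles, three variables
    print(sample_mixed_membership(rates, [0.5, 0.5], rng))
    print(sample_mixture(rates, [0.5, 0.5], rng))
```

In the mixture sampler a single label governs every variable of an observation, whereas in the mixed membership sampler the labels in `z` can differ from variable to variable, which is what allows an observation to belong partially to several profiles.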

Notes

  1. Note that these examples use different terminology to describe their methods: latent Dirichlet allocation (Blei et al. 2003), latent process decomposition (Rogers et al. 2005) and grade of membership (Erosheva et al. 2007; Gormley and Murphy 2009). Each of these models allocates individual observations to multiple components in a similar fashion, which we refer to in general as a mixed membership model (Erosheva et al. 2004).

  2. A version of this data is available at http://mathsci.ucd.ie/~brendan/data/24H.xlsx.

References

  • Abramowitz M, Stegun IA (1965) Handbook of mathematical functions: with formulas, graphs, and mathematical tables, 1st edn. Dover Publications, USA

  • Airoldi EM, Blei D, Erosheva E, Fienberg SE (2014) Introduction to mixed membership models and methods. In: Airoldi EM, Blei D, Erosheva E, Fienberg SE (eds) Handbook of mixed membership models, Chap. 1. Chapman & Hall/CRC, Boca Raton

  • Airoldi EM, Fienberg SE, Joutard C, Love T (2006) Discovering latent patterns with hierarchical Bayesian mixed-membership models. Technical report, Carnegie Mellon University, School of Computer Science, Machine Learning Department. Report no CMU-06-101. http://ra.adm.cs.cmu.edu/anon/ml/CMU-ML-06-101.pdf

  • Airoldi EM, Fienberg SE, Joutard C, Love T (2007) Discovering latent patterns with hierarchical Bayesian mixed-membership models. In: Poncelet P, Teisseire M, Masseglia F (eds) Data mining patterns: New methods and applications, Chap. 11. Idea Group Inc., Calgary

  • Baudry JP, Raftery AE, Celeux G, Lo K, Gottardo R (2010) Combining mixture components for clustering. J Comput Gr Stat 19(2):332–353

  • Beal M (2003) Variational algorithms for approximate Bayesian inference. Ph.D. dissertation. University College London

  • Bensmail H, Celeux G, Raftery AE, Robert C (1997) Inference in model-based cluster analysis. Stat Comput 7:1–10

  • Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. Pattern Anal Mach Intell IEEE Trans 22(7):719–725. doi:10.1109/34.865189

  • Bishop CM (2006) Pattern recognition and machine learning. Springer, Secaucus

  • Blei DM, Lafferty JD (2006) Dynamic topic models. In: Cohen W, Moore A (eds) Proceedings of the 23rd international machine learning conference. http://icml.cc/2016/awards/dtm.pdf. http://dl.acm.org/citation.cfm?id=1143859

  • Blei DM, Lafferty JD (2007) A correlated topic model of science. Ann Appl Stat 1(1):17–35

  • Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM Algorithm. J R Stat Soc Ser B (Methodol) 39(1):1–38. doi:10.2307/2984875

  • Erosheva EA, Fienberg SE, Joutard C (2007) Describing disability through individual-level mixture models for multivariate binary data. Ann Appl Stat 1(2):502–537

  • Erosheva EA, Fienberg SE, Lafferty J (2004) Mixed-membership models of scientific publications. Proc Natl Acad Sci USA 101:5220–5227

  • Everitt BS, Hand DJ (1981) Finite mixture distributions. Chapman and Hall, London

  • Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97(458):611–631

  • Galyardt A (2014) Interpreting mixed membership models: Implications of Erosheva’s representation theorem. In: Airoldi EM, Blei D, Erosheva E, Fienberg SE (eds) Handbook of mixed membership models, Chap. 11. Chapman & Hall/CRC, London

  • Gormley C, Murphy TB (2009) A grade of membership model for rank data. Bayesian Anal 4(2):265–296

  • Hill MO (1973) Diversity and evenness: a unifying notation and its consequences. Ecology 54(2):427–432

  • Manrique-Vallier D (2014) Longitudinal mixed membership trajectory models for disability survey data. Ann Appl Stat 8(4):2268–2291

  • McLachlan G, Peel D (2002) Finite mixture models. Wiley, New York

  • Ormerod JT, Wand MP (2010) Explaining variational approximations. Am Stat 64(2):140–153

  • Rogers S, Girolami M, Campbell C, Breitling R (2005) The latent process decomposition of cDNA microarray datasets. IEEE/ACM Trans Comput Biol Bioinf 2:2005

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464

  • van den Boogaart KG, Tolosana-Delgado R (2008) Compositions: A unified R package to analyze compositional data. Comput Geosci 34(4):320–338

  • Vermunt JK, Magidson J (2002) Latent class cluster analysis. In: Hagenaars JA, McCutcheon A (eds) Applied latent class analysis. Cambridge University Press, Cambridge, pp 89–106

  • Wang C, Blei D (2013) Variational inference in nonconjugate models. J Mach Learn Res 14:1005–1031

  • White A, Chan J, Hayes C, Murphy TB (2012) Mixed membership models for exploring user roles in online fora. In: Ellison N, Shanahan JG, Tufekci Z (eds) Proceedings of the sixth international AAAI conference on weblogs and social media (ICWSM 2012), pp 599–602. http://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4638

Acknowledgments

This work is supported by Science Foundation Ireland under the Clique Strategic Research Cluster (08/SRC/I1407) and Insight Research Centre grant (SF1/12/RC/2289).

Author information

Corresponding author

Correspondence to Arthur White.

About this article

Cite this article

White, A., Murphy, T.B. Exponential family mixed membership models for soft clustering of multivariate data. Adv Data Anal Classif 10, 521–540 (2016). https://doi.org/10.1007/s11634-016-0267-5

  • DOI: https://doi.org/10.1007/s11634-016-0267-5
