Exponential family mixed membership models for soft clustering of multivariate data

White, Arthur; Murphy, Thomas Brendan

doi:10.1007/s11634-016-0267-5


Regular Article
Published: 09 August 2016

Volume 10, pages 521–540, (2016)

Advances in Data Analysis and Classification

Arthur White¹ &
Thomas Brendan Murphy²


Abstract

For several years, model-based clustering methods have successfully tackled many of the challenges faced by data analysts. However, as the scope of data analysis has evolved, some problems may lie beyond the standard mixture model framework. One such problem arises when observations in a dataset come from overlapping clusters, whereby different clusters possess similar parameters for multiple variables. In this setting, mixed membership models, a soft clustering approach whereby observations are not restricted to single cluster membership, have proved to be an effective tool. In this paper, a method for fitting mixed membership models to data generated by a member of an exponential family is outlined. The method is applied to count data obtained from an ultra running competition, and compared with a standard mixture model approach.
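The difference between a standard mixture model and a mixed membership model is easiest to see in how each one generates data. The sketch below simulates both, under assumptions that are ours rather than the paper's: Poisson components (one natural exponential-family choice for count data), a Dirichlet-distributed membership vector per observation in the grade-of-membership style, and hypothetical rates for G = 2 components and J = 3 variables. It only simulates rows; it does not reproduce the authors' fitting method.

```python
import math
import random

# Hypothetical illustration: G = 2 components, J = 3 count variables.
# rates[g][j] is an assumed Poisson rate for component g and variable j.


def sample_dirichlet(alpha, rng):
    """Draw a membership vector by normalising independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def mixture_row(pi, rates, rng):
    """Standard mixture model: a single label generates every variable."""
    g = rng.choices(range(len(pi)), weights=pi)[0]
    counts = [sample_poisson(lam, rng) for lam in rates[g]]
    return counts, [g] * len(counts)


def mixed_membership_row(alpha, rates, rng):
    """Mixed membership model: each variable draws its own label from tau_i."""
    tau = sample_dirichlet(alpha, rng)
    n_vars = len(rates[0])
    labels = [rng.choices(range(len(tau)), weights=tau)[0] for _ in range(n_vars)]
    counts = [sample_poisson(rates[g][j], rng) for j, g in enumerate(labels)]
    return counts, labels


if __name__ == "__main__":
    rng = random.Random(1)
    rates = [[2.0, 2.0, 8.0], [8.0, 2.0, 2.0]]
    print("mixture:         ", mixture_row([0.5, 0.5], rates, rng))
    print("mixed membership:", mixed_membership_row([0.5, 0.5], rates, rng))
```

In the mixture row every variable shares one label, so an observation is generated by a single cluster; in the mixed membership row the labels can differ across variables, which is what lets one observation borrow parameters from several overlapping clusters.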


Similar content being viewed by others

Clustering bivariate mixed-type data via the cluster-weighted model

Article 04 July 2015

Model selection and application to high-dimensional count data clustering

Article 13 November 2018

Advances in Robust Constrained Model Based Clustering

Notes

Note that these examples use different terminology to describe their methods: latent Dirichlet allocation (Blei et al. 2003), latent process decomposition (Rogers et al. 2005) and grade of membership (Erosheva et al. 2007; Gormley and Murphy 2009). Each of these models allocates individual observations to multiple components in a similar fashion; we refer to them in general as mixed membership models (Erosheva et al. 2004).
A version of this data is available at http://mathsci.ucd.ie/~brendan/data/24H.xlsx.
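The shared structure that these models use can be written out as follows. The notation is assumed here for illustration and is not taken from the paper: observation i has J variables, there are G components with natural parameters θ_gj, and f is an exponential-family density written in natural form. The mixed membership form follows the general grade-of-membership construction (Erosheva et al. 2004); the Poisson line is one example of an exponential-family member suitable for count data.

```latex
% Exponential-family density in natural form (assumed notation)
f(x \mid \theta) = h(x)\,\exp\{\theta^{\top} T(x) - A(\theta)\}
% Poisson example: \theta = \log\lambda,\ T(x) = x,\ A(\theta) = e^{\theta},\ h(x) = 1/x!
f(x \mid \theta) = \frac{1}{x!}\exp\{x\log\lambda - \lambda\} = \frac{\lambda^{x} e^{-\lambda}}{x!}

% Standard mixture model: one label per observation
p(\mathbf{x}_i) = \sum_{g=1}^{G} \pi_g \prod_{j=1}^{J} f(x_{ij} \mid \theta_{gj})

% Mixed membership model: membership vector \tau_i, one label per variable
p(\mathbf{x}_i \mid \boldsymbol{\tau}_i) = \prod_{j=1}^{J} \sum_{g=1}^{G} \tau_{ig}\, f(x_{ij} \mid \theta_{gj}),
\qquad \boldsymbol{\tau}_i \sim \mathrm{Dirichlet}(\boldsymbol{\delta})
```

The only change is the order of the sum and the product: the mixture model sums over components outside the product over variables, while the mixed membership model sums inside it, so each variable can be explained by a different component.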

References

Abramowitz M, Stegun IA (1965) Handbook of mathematical functions: with formulas, graphs, and mathematical tables, 1st edn. Dover Publications, USA
Airoldi EM, Blei D, Erosheva E, Fienberg SE (2014) Introduction to mixed membership models and methods. In: Airoldi EM, Blei D, Erosheva E, Fienberg SE (eds) Handbook of mixed membership models, Chap. 1. Chapman & Hall/CRC, Boca Raton
Airoldi EM, Fienberg SE, Joutard C, Love T (2006) Discovering latent patterns with hierarchical Bayesian mixed-membership models. Technical report, Carnegie Mellon University, School of Computer Science, Machine Learning Department. Report no CMU-06-101. http://ra.adm.cs.cmu.edu/anon/ml/CMU-ML-06-101.pdf
Airoldi EM, Fienberg SE, Joutard C, Love T (2007) Discovering latent patterns with hierarchical Bayesian mixed-membership models. In: Poncelet P, Teisseire M, Masseglia F (eds) Data mining patterns: New methods and applications, Chap. 11. Idea Group Inc., Calgary
Baudry JP, Raftery AE, Celeux G, Lo K, Gottardo R (2010) Combining mixture components for clustering. J Comput Gr Stat 19(2):332–353
Beal M (2003) Variational algorithms for approximate Bayesian inference. Ph.D. dissertation. University College London
Bensmail H, Celeux G, Raftery AE, Robert C (1997) Inference in model-based cluster analysis. Stat Comput 7:1–10
Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. Pattern Anal Mach Intell IEEE Trans 22(7):719–725. doi:10.1109/34.865189
Bishop CM (2006) Pattern recognition and machine learning. Springer, Secaucus
Blei DM, Lafferty JD (2006) Dynamic topic models. In: Cohen W, Moore A (eds) Proceedings of the 23rd international machine learning conference. http://icml.cc/2016/awards/dtm.pdf. http://dl.acm.org/citation.cfm?id=1143859
Blei DM, Lafferty JD (2007) A correlated topic model of science. Ann Appl Stat 1(1):17–35
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM Algorithm. J R Stat Soc Ser B (Methodol) 39(1):1–38. doi:10.2307/2984875
Erosheva EA, Fienberg SE, Joutard C (2007) Describing disability through individual-level mixture models for multivariate binary data. Ann Appl Stat 1(2):502–537
Erosheva EA, Fienberg SE, Lafferty J (2004) Mixed-membership models of scientific publications. Proc Natl Acad Sci USA 101:5220–5227
Everitt BS, Hand DJ (1981) Finite mixture distributions. Chapman and Hall, London
Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97(458):611–631
Galyardt A (2014) Interpreting mixed membership models: Implications of Erosheva’s representation theorem. In: Airoldi EM, Blei D, Erosheva E, Fienberg SE (eds) Handbook of mixed membership models, Chap. 11. Chapman & Hall/CRC, London
Gormley C, Murphy TB (2009) A grade of membership model for rank data. Bayesian Anal 4(2):265–296
Hill MO (1973) Diversity and evenness: a unifying notation and its consequences. Ecology 54(2):427–432
Manrique-Vallier D (2014) Longitudinal mixed membership trajectory models for disability survey data. Ann Appl Stat 8(4):2268–2291
McLachlan G, Peel D (2002) Finite mixture models. Wiley, New York
Ormerod JT, Wand MP (2010) Explaining variational approximations. Am Stat 64(2):140–153
Rogers S, Girolami M, Campbell C, Breitling R (2005) The latent process decomposition of cDNA microarray datasets. IEEE/ACM Trans Comput Biol Bioinf 2:2005
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
van den Boogaart KG, Tolosana-Delgado R (2008) Compositions: A unified r package to analyze compositional data. Comput Geosci 34(4):320–338
Vermunt JK, Magidson J (2002) Latent class cluster analysis. In: Hagenaars JA, McCutcheon A (eds) Applied latent class analysis. Cambridge University Press, Cambridge, pp 89–106
Wang C, Blei D (2013) Variational inference in nonconjugate models. J Mach Learn Res 14:1005–1031
White A, Chan J, Hayes C, Murphy TB (2012) Mixed membership models for exploring user roles in online fora. In: Ellison N, Shanahan JG, Tufekci Z (eds) Proceedings of the sixth international AAAI conference on weblogs and social media (ICWSM 2012), pp 599–602. http://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4638


Acknowledgments

This work is supported by Science Foundation Ireland under the Clique Strategic Research Cluster (08/SRC/I1407) and Insight Research Centre grant (SFI/12/RC/2289).

Author information

Authors and Affiliations

School of Computer Science and Statistics, Trinity College Dublin, The University of Dublin, Dublin 2, Ireland
Arthur White
School of Mathematics & Statistics and Insight Research Centre, University College Dublin, Belfield, Dublin 4, Ireland
Thomas Brendan Murphy


Corresponding author

Correspondence to Arthur White.


About this article

Cite this article

White, A., Murphy, T.B. Exponential family mixed membership models for soft clustering of multivariate data. Adv Data Anal Classif 10, 521–540 (2016). https://doi.org/10.1007/s11634-016-0267-5


Received: 14 May 2014
Revised: 28 July 2016
Accepted: 30 July 2016
Published: 09 August 2016
Issue Date: December 2016
DOI: https://doi.org/10.1007/s11634-016-0267-5

Keywords

Mathematics Subject Classification

62H30
