Mixture of latent trait analyzers for model-based clustering of categorical data

Gollini, Isabella; Murphy, Thomas Brendan

doi:10.1007/s11222-013-9389-1

Published: 03 April 2013

Volume 24, pages 569–588, (2014)

Statistics and Computing

Isabella Gollini¹ &
Thomas Brendan Murphy²

Abstract

Model-based clustering methods for continuous data are well established and commonly used in a wide range of applications. However, model-based clustering methods for categorical data are less standard. Latent class analysis is a commonly used method for model-based clustering of binary data and/or categorical data, but due to an assumed local independence structure there may not be a correspondence between the estimated latent classes and groups in the population of interest. The mixture of latent trait analyzers model extends latent class analysis by assuming a model for the categorical response variables that depends on both a categorical latent class and a continuous latent trait variable; the discrete latent class accommodates group structure and the continuous latent trait accommodates dependence within these groups. Fitting the mixture of latent trait analyzers model is potentially difficult because the likelihood function involves an integral that cannot be evaluated analytically. We develop a variational approach for fitting the mixture of latent trait models and this provides an efficient model fitting strategy. The mixture of latent trait analyzers model is demonstrated on the analysis of data from the National Long Term Care Survey (NLTCS) and voting in the U.S. Congress. The model is shown to yield intuitive clustering results and it gives a much better fit than either latent class analysis or latent trait analysis alone.
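As a rough illustration of the structure described in the abstract, the sketch below simulates data from a two-level latent variable model of this kind, restricted to binary responses. It assumes a logistic response function with group-specific intercepts and slopes and a standard normal latent trait. The abstract does not state this parameterisation, and the names `simulate_mlta`, `eta`, `b`, and `w` are illustrative rather than taken from the paper. It does not implement the variational fitting procedure.

```python
import math
import random


def simulate_mlta(n, eta, b, w, seed=0):
    """Simulate n binary response vectors from a mixture of latent trait models.

    Assumed (hypothetical) parameterisation:
      eta : list of G mixing proportions
      b   : G x M nested list of intercepts
      w   : G x M x D nested list of slopes on the latent trait
    Returns (labels, data): the latent class of each row and the 0/1 responses.
    """
    rng = random.Random(seed)
    n_groups = len(eta)
    n_items = len(b[0])
    n_dims = len(w[0][0])
    labels, data = [], []
    for _ in range(n):
        # Categorical latent class: accommodates group structure.
        g = rng.choices(range(n_groups), weights=eta)[0]
        # Continuous latent trait: induces dependence between items within a group.
        y = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
        row = []
        for m in range(n_items):
            linear = b[g][m] + sum(w[g][m][d] * y[d] for d in range(n_dims))
            prob = 1.0 / (1.0 + math.exp(-linear))
            row.append(1 if rng.random() < prob else 0)
        labels.append(g)
        data.append(row)
    return labels, data


if __name__ == "__main__":
    eta = [0.5, 0.5]
    b = [[-3.0, -3.0, -3.0], [3.0, 3.0, 3.0]]
    w = [[[0.5], [0.5], [0.5]], [[0.5], [0.5], [0.5]]]
    labels, data = simulate_mlta(5, eta, b, w, seed=1)
    for g, row in zip(labels, data):
        print(g, row)
```

Setting every slope in `w` to zero removes the within-group dependence, so the responses are independent given the class, which is the local independence structure of latent class analysis; using a single group leaves a latent trait model.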

Acknowledgements

We would like to thank the editor, associate editor and reviewers for their insightful comments and suggestions, which have greatly improved this paper. This research was supported by a Science Foundation Ireland Research Frontiers Programme Grant (06/RFP/M040) and Strategic Research Cluster Grant (08/SRC/I1407).

Author information

Authors and Affiliations

National Centre for Geocomputation, National University of Ireland, Maynooth, Ireland
Isabella Gollini
School of Mathematical Sciences, University College Dublin, Dublin, Ireland
Thomas Brendan Murphy

Corresponding author

Correspondence to Thomas Brendan Murphy.

About this article

Cite this article

Gollini, I., Murphy, T.B. Mixture of latent trait analyzers for model-based clustering of categorical data. Stat Comput 24, 569–588 (2014). https://doi.org/10.1007/s11222-013-9389-1

Received: 06 October 2011
Accepted: 22 February 2013
Published: 03 April 2013
Issue Date: July 2014
DOI: https://doi.org/10.1007/s11222-013-9389-1
