Volume 70, Issue 2, pp 325–345

Selecting the number of classes under latent class regression: a factor analytic analogue

  • Guan-Hua Huang


Recently, the regression extension of the latent class analysis model (RLCA) has received much attention in medical research. The basic RLCA model summarizes the shared features of multiple measured indicators as an underlying categorical variable, and it incorporates covariate information in modeling both latent class membership and the multiple indicators themselves. To reduce complexity and enhance interpretability, one usually fixes the number of classes in a given RLCA. Often, goodness of fit methods that compare various estimated models are used as the criterion for selecting the number of classes. In this paper, we propose a new method that is based on an analogous method used in factor analysis and does not require repeated fitting. Two ideas with application to many settings other than ours are synthesized in deriving the method: a connection between latent class models and factor analysis, and techniques of covariate marginalization and elimination. A Monte Carlo simulation study is presented to evaluate the behavior of the selection procedure and to compare it with alternative approaches. Data from a study of how measured visual impairments affect older persons’ functioning are used for illustration.
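For orientation, the conventional repeated-fitting approach that the abstract contrasts with the proposed method can be sketched as follows. This is not the paper's procedure. It is a minimal illustration that assumes a covariate-free latent class model with binary items, fitted separately for each candidate number of classes, and ranked by the standard AIC or BIC formulas. The function names are hypothetical, and the log-likelihood values are invented placeholders rather than results from any study.

```python
import math


def lca_param_count(n_classes, n_items):
    """Free parameters of a covariate-free latent class model with binary items
    (an assumption of this sketch; an RLCA with covariates has more parameters):
    (n_classes - 1) class probabilities + n_classes * n_items item probabilities."""
    return (n_classes - 1) + n_classes * n_items


def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2 p."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Schwarz (Bayesian) information criterion: -2 log L + p log n."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def select_n_classes(logliks, n_items, n_obs, criterion="bic"):
    """Return the candidate number of classes with the smallest criterion value.

    `logliks` maps each candidate number of classes to the maximized
    log-likelihood of the model fitted with that many classes."""
    scores = {}
    for n_classes, loglik in logliks.items():
        p = lca_param_count(n_classes, n_items)
        if criterion == "bic":
            scores[n_classes] = bic(loglik, p, n_obs)
        else:
            scores[n_classes] = aic(loglik, p)
    best = min(scores, key=scores.get)
    return best, scores


# Invented placeholder log-likelihoods, one per separately fitted model.
example_logliks = {1: -3200.0, 2: -3050.0, 3: -3020.0, 4: -3015.0}
best, scores = select_n_classes(example_logliks, n_items=5, n_obs=1000)
print(best, {k: round(v, 2) for k, v in scores.items()})
```

Each entry of `example_logliks` stands for one full model fit, which is the cost that a selection method without repeated fitting avoids.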


categorical data, factor analysis, finite mixture model, goodness of fit test, latent profile model, marginalization, residuals in generalized linear models, Monte Carlo simulation





Copyright information

© The Psychometric Society 2005

Authors and Affiliations

  1. Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan
