Categorical latent variable modeling utilizing fuzzy clustering generalized structured component analysis as an alternative to latent class analysis

Abstract

Latent class analysis (LCA) is increasingly used in education, psychology, the social and behavioral sciences, public health, and medicine. Under maximum likelihood (ML) estimation, however, it often suffers from identification problems because of the large number of parameters involved. Increasing the sample size, reducing sparseness, and strengthening the relationship between the observed and latent variables all add information and thus mitigate these problems. Even so, identification issues can still undermine the validity of ML parameter estimates, and satisfying the definition of identification does not guarantee that an ML solution exists. In this paper, generalized structured component analysis (GSCA), a component-based approach that uses optimal scaling and fuzzy clustering, is applied to avoid these identification issues and to obtain more stable solutions for modeling population heterogeneity from a set of categorical responses. We tested the proposed approach, component-based (CB) LCA, on real-world substance use data from Add Health; it reproduced the features yielded by conventional ML LCA and gave stable estimates without identification issues. Comparing ML LCA results from Mplus and from poLCA in R with CB LCA results from GSCA in R revealed a similar number of latent classes and similar posterior probabilities, with only minor discrepancies in individual class assignments when the posterior probabilities of membership were not distinct.
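To make the idea of fuzzy class membership concrete, the following is a minimal sketch of plain fuzzy c-means (Bezdek 1981) applied to binary item responses. It is not the fuzzy clusterwise GSCA procedure evaluated in this paper: it involves no optimal scaling and no structural model, and the function name `fuzzy_c_means`, the fuzzifier value `m=2.0`, and the simulated toy data are all assumptions made for illustration. The membership matrix `U` plays a role loosely analogous to posterior class probabilities in ML LCA.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means; returns memberships U (n x c) and centroids V (c x p)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each row is normalized to sum to 1.
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centroids are membership-weighted means of the observations.
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distances, floored to avoid division by zero.
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)
        # Membership update: u_ik is proportional to d_ik^(-2 / (m - 1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V

# Hypothetical toy data (not Add Health): six yes/no items answered by two
# simulated groups with high and low endorsement rates.
rng = np.random.default_rng(1)
high = rng.random((100, 6)) < 0.9
low = rng.random((100, 6)) < 0.1
X = np.vstack([high, low]).astype(float)
truth = np.repeat([0, 1], 100)

U, V = fuzzy_c_means(X, n_clusters=2)
hard = U.argmax(axis=1)  # modal (hard) assignment from fuzzy memberships
```

Rows of `U` with nearly equal entries correspond to respondents whose membership is not distinct, which is the situation in which the abstract reports minor classification discrepancies between methods.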

Author information

Corresponding author

Correspondence to Ji Hoon Ryoo.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Communicated by Heungsun Hwang

About this article

Cite this article

Ryoo, J.H., Park, S. & Kim, S. Categorical latent variable modeling utilizing fuzzy clustering generalized structured component analysis as an alternative to latent class analysis. Behaviormetrika 47, 291–306 (2020). https://doi.org/10.1007/s41237-019-00084-6

Keywords

  • Fuzzy clustering
  • Generalized structured component analysis
  • Latent class analysis
  • Optimal scaling