Advances in Data Analysis and Classification, Volume 7, Issue 3, pp 267–279

A Monte Carlo evaluation of three methods to detect local dependence in binary data latent class models

  • Daniel L. Oberski
  • Geert H. van Kollenburg
  • Jeroen K. Vermunt
Regular Article


Binary data latent class analysis is a form of model-based clustering applied in a wide range of fields. A central assumption of this model is that of conditional independence of responses given latent class membership, often referred to as the “local independence” assumption. The results of latent class analysis may be severely biased when this crucial assumption is violated; investigating the degree to which bivariate relationships between observed variables fit this hypothesis therefore provides vital information. This article evaluates three methods of doing so. The first is the commonly applied method of referring the so-called “bivariate residuals” to a Chi-square distribution. We also introduce two alternative methods that are novel to the investigation of local dependence in latent class analysis: bootstrapping the bivariate residuals, and the asymptotic score test or “modification index”. Our Monte Carlo simulation indicates that the latter two methods perform adequately, while the first method does not perform as intended.
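To make the idea of a bivariate residual concrete, the following is a minimal sketch and not the authors' implementation. It assumes the residual for a pair of binary items is the Pearson statistic comparing the observed 2×2 table with the table implied by a latent class model under local independence; the exact definition used in the article or in particular software may differ (for example, by a scaling factor). The function names and the parameter layout (`class_props`, `cond_probs`) are hypothetical, and the model parameters are taken as given rather than estimated.

```python
def expected_pair_table(class_props, cond_probs, j, k, n):
    """Model-implied 2x2 frequency table for binary items j and k.

    class_props[c]   : assumed P(class = c)
    cond_probs[c][j] : assumed P(item j = 1 | class = c)
    Under local independence:
        P(y_j, y_k) = sum_c P(c) * P(y_j | c) * P(y_k | c)
    """
    table = [[0.0, 0.0], [0.0, 0.0]]
    for c, pc in enumerate(class_props):
        pj, pk = cond_probs[c][j], cond_probs[c][k]
        for a in (0, 1):
            for b in (0, 1):
                prob_a = pj if a == 1 else 1.0 - pj
                prob_b = pk if b == 1 else 1.0 - pk
                table[a][b] += n * pc * prob_a * prob_b
    return table


def observed_pair_table(data, j, k):
    """Observed 2x2 frequency table for items j and k (rows are 0/1 lists)."""
    table = [[0, 0], [0, 0]]
    for row in data:
        table[row[j]][row[k]] += 1
    return table


def bivariate_residual(data, class_props, cond_probs, j, k):
    """Pearson statistic between observed and model-implied pair tables."""
    obs = observed_pair_table(data, j, k)
    exp = expected_pair_table(class_props, cond_probs, j, k, len(data))
    return sum(
        (obs[a][b] - exp[a][b]) ** 2 / exp[a][b]
        for a in (0, 1)
        for b in (0, 1)
    )


if __name__ == "__main__":
    # Toy data: two perfectly associated items, one-class model with
    # P(item = 1) = 0.5 for both items, so every expected cell count is 1.
    toy = [[0, 0], [0, 0], [1, 1], [1, 1]]
    print(bivariate_residual(toy, [1.0], [[0.5, 0.5]], 0, 1))  # prints 4.0
```

The sketch only computes the statistic. A bootstrap version, as evaluated in the article, would additionally require simulating data sets from the fitted model and re-estimating the model on each of them to obtain a reference distribution; that step is omitted here.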


Keywords

Conditional dependence; Latent variable models; Score test; Lagrange multiplier test; Modification index; Bivariate residuals

Mathematics Subject Classification (2010)

62F40 62F03 62F99 62H15 62H17 62H30 



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Daniel L. Oberski (1)
  • Geert H. van Kollenburg (1)
  • Jeroen K. Vermunt (1)

  1. Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands
