Computational Statistics, Volume 31, Issue 2, pp 425–449

Composite likelihood and maximum likelihood methods for joint latent class modeling of disease prevalence and high-dimensional semicontinuous biomarker data

  • Bo Zhang
  • Wei Liu (Email author)
  • Hui Zhang
  • Qihui Chen
  • Zhiwei Zhang
Original Paper


Abstract

Joint latent class modeling of disease prevalence and high-dimensional semicontinuous biomarker data has been proposed to study the relationship between diseases and their related biomarkers. However, statistical inference for the joint latent class modeling approach has proved very challenging because of the computational complexity of seeking maximum likelihood estimates. In this article, we propose a series of composite likelihoods for maximum composite likelihood estimation, as well as an enhanced Monte Carlo expectation–maximization (MCEM) algorithm for maximum likelihood estimation, in the context of joint latent class models. Theoretically, the maximum composite likelihood estimates are consistent and asymptotically normal. Numerically, we show that, compared with the MCEM algorithm that maximizes the full likelihood, the composite likelihood approach coupled with the quasi-Newton method not only substantially reduces the computational complexity and duration but also retains comparable estimation efficiency.
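The general idea of maximum composite likelihood estimation with a quasi-Newton optimizer can be illustrated on a much simpler problem than the one treated in the article. The sketch below is an assumption-laden toy, not the authors' joint latent class model or any of their composite likelihoods: it simulates a Gaussian random-intercept model, builds a pairwise (bivariate-margin) log-likelihood, and maximizes it with SciPy's L-BFGS-B routine. The function name `neg_pairwise_loglik`, the simulated data, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def neg_pairwise_loglik(theta, y):
    """Negative pairwise log-likelihood (averaged over clusters) for the toy
    model y[i, j] = mu + b_i + e_ij, with b_i ~ N(0, sb^2), e_ij ~ N(0, se^2).

    theta = (mu, log sb, log se); the log scale keeps both SDs positive.
    """
    mu, log_sb, log_se = theta
    var_b, var_e = np.exp(2.0 * log_sb), np.exp(2.0 * log_se)
    var, cov = var_b + var_e, var_b      # marginal variance, within-cluster covariance
    det = var * var - cov * cov          # determinant of each 2x2 covariance block
    r = y - mu                           # residuals from the common mean
    n, m = y.shape
    total = 0.0
    # Sum bivariate normal log-densities over all within-cluster pairs (j, k).
    for j in range(m - 1):
        for k in range(j + 1, m):
            rj, rk = r[:, j], r[:, k]
            quad = (var * (rj**2 + rk**2) - 2.0 * cov * rj * rk) / det
            total += np.sum(-np.log(2.0 * np.pi) - 0.5 * np.log(det) - 0.5 * quad)
    return -total / n


# Simulated data (hypothetical values, chosen only for illustration).
rng = np.random.default_rng(0)
n, m = 400, 5
mu_true, sb_true, se_true = 1.0, 1.0, 0.5
b = rng.normal(0.0, sb_true, size=(n, 1))
y = mu_true + b + rng.normal(0.0, se_true, size=(n, m))

# Quasi-Newton maximization of the pairwise likelihood; the gradient is
# approximated by finite differences inside scipy.optimize.minimize.
fit = minimize(neg_pairwise_loglik, x0=np.zeros(3), args=(y,), method="L-BFGS-B")
mu_hat, sb_hat, se_hat = fit.x[0], np.exp(fit.x[1]), np.exp(fit.x[2])
print("mu_hat:", mu_hat, "sb_hat:", sb_hat, "se_hat:", se_hat)
```

Each pairwise term involves only a two-dimensional density, so no integration over the random effect is needed. This is the kind of saving that motivates composite likelihoods when the full likelihood is expensive to evaluate. Standard errors would require a sandwich-type (Godambe) variance estimate, which this sketch omits.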


Keywords

Pseudo-likelihood; Expectation–maximization algorithm; Markov chain Monte Carlo; Shared latent class models; Two-part models



Acknowledgments

We sincerely thank two anonymous reviewers, the Associate Editor, and the Editors for their valuable comments, which have substantially improved this manuscript. The views expressed in this article are those of the authors and do not necessarily represent the views of the US Food and Drug Administration.



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Bo Zhang (1)
  • Wei Liu (2), Email author
  • Hui Zhang (3)
  • Qihui Chen (4)
  • Zhiwei Zhang (1)

  1. Division of Biostatistics, Office of Surveillance and Biometrics, Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, USA
  2. Department of Mathematics, Harbin Institute of Technology, Harbin, People’s Republic of China
  3. Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, USA
  4. Department of Applied Economics, College of Economics and Management, China Agricultural University, Beijing, People’s Republic of China
