# Composite likelihood and maximum likelihood methods for joint latent class modeling of disease prevalence and high-dimensional semicontinuous biomarker data

## Abstract

Joint latent class modeling of disease prevalence and high-dimensional semicontinuous biomarker data has been proposed to study the relationship between diseases and their related biomarkers. However, statistical inference for this approach has proved very challenging because of the computational complexity of finding maximum likelihood estimates. In this article, we propose a series of composite likelihoods for maximum composite likelihood estimation, as well as an enhanced Monte Carlo expectation–maximization (MCEM) algorithm for maximum likelihood estimation, in the context of joint latent class models. Theoretically, the maximum composite likelihood estimates are consistent and asymptotically normal. Numerically, we show that, compared with the MCEM algorithm that maximizes the full likelihood, the composite likelihood approach coupled with the quasi-Newton method not only substantially reduces computational complexity and run time but also retains comparable estimation efficiency.
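For orientation, the generic form of a composite likelihood and the standard asymptotic result for its maximizer can be sketched as follows. This is the textbook formulation, not the specific composite likelihoods proposed in the article, which are not given on this page. The symbols below are all assumptions introduced for illustration: $y_i$ is the data for subject $i$, $A_k$ are marginal or conditional events, $w_k$ are nonnegative weights, and $c\ell_i$ is subject $i$'s contribution to the composite log-likelihood.

```latex
% Generic weighted composite log-likelihood over n independent subjects
c\ell(\theta) \;=\; \sum_{i=1}^{n} c\ell_i(\theta)
            \;=\; \sum_{i=1}^{n} \sum_{k=1}^{K} w_k \,\log f\bigl(y_i \in A_k;\ \theta\bigr),
\qquad
\hat{\theta}_{CL} \;=\; \arg\max_{\theta}\; c\ell(\theta).

% Under standard regularity conditions, with true value \theta_0:
\sqrt{n}\,\bigl(\hat{\theta}_{CL} - \theta_0\bigr)
  \;\xrightarrow{\;d\;}\;
  N\!\bigl(0,\ H(\theta_0)^{-1} J(\theta_0)\, H(\theta_0)^{-1}\bigr),

% Sensitivity and variability matrices of the composite score
H(\theta) \;=\; E\bigl[-\nabla^2_{\theta}\, c\ell_i(\theta)\bigr],
\qquad
J(\theta) \;=\; \operatorname{Var}\bigl[\nabla_{\theta}\, c\ell_i(\theta)\bigr].
```

In general $H \neq J$ for a composite likelihood, so the sandwich form $H^{-1} J H^{-1}$ (the inverse Godambe information) replaces the inverse Fisher information. Any loss of efficiency relative to full maximum likelihood shows up as the difference between these two matrices.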

## Keywords

Pseudo-likelihood, Expectation–maximization algorithm, Markov chain Monte Carlo, Shared latent class models, Two-part models

## Notes

### Acknowledgments

We sincerely thank two anonymous reviewers, the Associate Editor, and the Editors for their valuable comments, which have substantially improved this manuscript. The views expressed in this article are those of the authors and do not necessarily represent the views of the US Food and Drug Administration.
