Risk Factor Identification in Heterogeneous Disease Progression with L1-Regularized Multi-state Models

  • Research Article
Journal of Healthcare Informatics Research

Abstract

Multi-state models (MSMs) are a useful tool for analyzing longitudinal data and modeling disease progression across multiple time points. Although regularization approaches to variable selection are widely used, extending them to MSMs remains largely unexplored. In this paper, we develop the L1-regularized multi-state model (L1MSTATE) framework, which performs parameter estimation and variable selection simultaneously. We solve the regularized optimization problem by deriving a computationally efficient one-step coordinate descent algorithm. In extensive simulation studies, L1MSTATE identified risk factors more accurately than existing regularized multi-state models. It also outperformed the unregularized multi-state model (MSTATE) at identifying important risk factors when sample sizes were small. The predictive power of L1MSTATE for transition probabilities, compared with MSTATE, was demonstrated on the Europe Blood and Marrow Transplantation (EBMT) dataset. L1MSTATE is implemented in the open-access R package ‘L1mstate’.
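
To illustrate how an L1 penalty combined with coordinate descent yields estimation and variable selection in a single pass, the following is a minimal sketch for an ordinary least-squares loss. It is not the L1MSTATE algorithm, which targets a multi-state partial likelihood that this page does not reproduce. The function names `soft_threshold` and `lasso_coordinate_descent`, the parameter `lam`, and the toy data are all assumptions made for illustration.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Illustrative only: a squared-error loss stands in for the
    multi-state partial likelihood used in the paper.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            col_sq = sum(v * v for v in col) / n
            if col_sq == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of column j with the partial residual
            # (the residual with coordinate j's own contribution added back).
            rho = sum(col[i] * residual[i] for i in range(n)) / n + col_sq * beta[j]
            new_b = soft_threshold(rho, lam) / col_sq
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                beta[j] = new_b
    return beta


# Hypothetical toy data: the response depends only on the first covariate.
X_toy = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y_toy = [2.0, 2.0, -2.0, -2.0]
beta_hat = lasso_coordinate_descent(X_toy, y_toy, lam=0.5)
print(beta_hat)
```

On this toy design the coefficient of the irrelevant second covariate is set exactly to zero, while the first is shrunk toward zero by the penalty. That shrink-and-select behaviour is the general mechanism by which an L1 penalty performs variable selection.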

Funding

This work was partially supported by the National Science Foundation (NSF)–Division of Communication and Computing Foundations (CCF) awards #1718513, #1715027, #1714136, and the JDRF award #2-SRA-2018-513-S-B.

Author information

Corresponding author

Correspondence to Xuan Dang.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dang, X., Huang, S. & Qian, X. Risk Factor Identification in Heterogeneous Disease Progression with L1-Regularized Multi-state Models. J Healthc Inform Res 5, 20–53 (2021). https://doi.org/10.1007/s41666-020-00085-1

  • DOI: https://doi.org/10.1007/s41666-020-00085-1