Abstract
The multi-state model (MSM) is a useful tool for analyzing longitudinal data to model disease progression across multiple time points. Although regularization approaches to variable selection have been widely used, extending them to MSMs remains largely unexplored. In this paper, we develop an L1-regularized multi-state model (L1MSTATE) framework that performs parameter estimation and variable selection simultaneously. The regularized optimization problem is solved with a one-step coordinate descent algorithm that we derive and that is computationally efficient. In extensive simulation studies, L1MSTATE identified risk factors more accurately than existing regularized multi-state models. It also outperformed the unregularized multi-state model (MSTATE) in identifying important risk factors when sample sizes were small. The ability of L1MSTATE to predict transition probabilities, compared with that of MSTATE, was demonstrated on the European Blood and Marrow Transplantation (EBMT) dataset. L1MSTATE is implemented in the open-access R package ‘L1mstate’.
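The abstract names the estimation strategy without giving details. As a rough illustration only (this is not the L1mstate implementation), the Python sketch below assumes a common setup: each transition of the MSM is a stratum of a Cox model with its own baseline hazard, covariates are expanded into transition-specific columns, there is no delayed entry (so it covers a clock-reset time scale or transitions out of the initial state), there are no tied event times, and the objective is (1/n) times the negative log partial likelihood plus λ Σ|β_j|. Each coordinate receives one Newton step followed by soft-thresholding. All function names (`soft_threshold`, `coord_grad_hess`, `l1_stratified_cox`, `simulate`) and the toy data are hypothetical.

```python
import math
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def coord_grad_hess(rows, eta, j):
    """First and second derivative w.r.t. beta_j of the negative log partial
    likelihood divided by n, with one baseline hazard per transition.

    `rows` must be sorted by transition, then by decreasing time, so the
    running sums at an event cover exactly its risk set (assumes no ties).
    """
    grad = hess = 0.0
    s0 = s1 = s2 = 0.0
    current = None
    for (k, _t, d, x), e in zip(rows, eta):
        if k != current:  # new transition: start a fresh risk set
            current, s0, s1, s2 = k, 0.0, 0.0, 0.0
        w = math.exp(e)
        s0 += w
        s1 += w * x[j]
        s2 += w * x[j] ** 2
        if d:  # an observed transition adds to the score and the information
            mean = s1 / s0
            grad -= x[j] - mean
            hess += s2 / s0 - mean ** 2
    n = len(rows)
    return grad / n, hess / n


def l1_stratified_cox(rows, p, lam, max_sweeps=500, tol=1e-8):
    """Minimize (1/n) * negative log partial likelihood + lam * sum|beta_j|
    by cyclic coordinate descent: one Newton step per coordinate, then
    soft-thresholding (no line search). Rows are (transition, time, status, x)."""
    rows = sorted(rows, key=lambda r: (r[0], -r[1]))
    beta = [0.0] * p
    eta = [0.0] * len(rows)  # current linear predictors x_i . beta
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            g, h = coord_grad_hess(rows, eta, j)
            if h <= 0.0:
                continue
            new = soft_threshold(h * beta[j] - g, lam) / h
            delta = new - beta[j]
            if delta != 0.0:
                beta[j] = new
                eta = [e + delta * r[3][j] for e, r in zip(eta, rows)]
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def simulate(n_subjects, beta_true, seed=1):
    """Hypothetical toy data in long format: each subject leaves state 1 via
    transition 1 or 2 (or is censored) and contributes one row per transition;
    covariates z are expanded into transition-specific columns."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_subjects):
        z = [rng.gauss(0, 1), rng.gauss(0, 1)]
        xs = {1: z + [0.0, 0.0], 2: [0.0, 0.0] + z}
        latent = {k: rng.expovariate(math.exp(sum(b * v for b, v in zip(beta_true, x))))
                  for k, x in xs.items()}
        cens = rng.expovariate(0.3)
        first = min(latent, key=latent.get)
        time = min(latent[first], cens)
        for k, x in xs.items():
            rows.append((k, time, int(k == first and latent[k] <= cens), x))
    return rows


rows = simulate(800, beta_true=[1.0, 0.0, 0.0, -1.0])
beta_dense = l1_stratified_cox(rows, p=4, lam=0.002)
beta_sparse = l1_stratified_cox(rows, p=4, lam=0.05)
beta_null = l1_stratified_cox(rows, p=4, lam=10.0)
print([round(b, 2) for b in beta_dense], [round(b, 2) for b in beta_sparse])
```

With a very large `lam` every coefficient stays exactly at zero; in this toy setting a moderate `lam` keeps the two true risk factors while setting the null coefficients exactly to zero, which is the variable-selection behaviour an L1 penalty provides. A practical fit would add a step-halving safeguard to the Newton step and choose λ by cross-validation.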
Funding
This work was partially supported by the National Science Foundation (NSF) Division of Computing and Communication Foundations (CCF) awards #1718513, #1715027, and #1714136, and by JDRF award #2-SRA-2018-513-S-B.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Dang, X., Huang, S. & Qian, X. Risk Factor Identification in Heterogeneous Disease Progression with L1-Regularized Multi-state Models. J Healthc Inform Res 5, 20–53 (2021). https://doi.org/10.1007/s41666-020-00085-1