Abstract
Observational cohort studies of individuals with chronic disease provide information on rates of disease progression, the effects of fixed and time-varying risk factors, and the extent of heterogeneity in the course of disease. Analysis of such data is often facilitated by multistate models, in which intensity functions govern transitions between disease states. We discuss modeling and analysis issues for these models when individuals are observed only intermittently. Frameworks for dealing with heterogeneity and measurement error are discussed, including random effect models, finite mixture models, and hidden Markov models. Because cohorts are often assembled for convenience, we also discuss ways of addressing outcome-dependent sampling or observation of individuals. Data on the progression of joint damage in psoriatic arthritis and of retinopathy in diabetes are analyzed to illustrate these issues and the related methodology.
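As a hedged illustration of what "intensity functions" and "intermittent observation" mean in practice (a sketch, not the authors' code), the example below assumes a time-homogeneous progressive three-state Markov model 1 → 2 → 3 with constant intensities q12 and q23. Over a gap of length t the transition probability matrix is P(t) = exp(Qt), so an individual seen only at visit times contributes the product of P(t) entries linking consecutive observed states. Passing a misclassification matrix turns the same recursion into the forward algorithm of a hidden Markov model. The function names `intensity_matrix` and `panel_loglik`, the intensity values, and the 10% error rates are hypothetical.

```python
import numpy as np
from scipy.linalg import expm


def intensity_matrix(q12, q23):
    """Transition intensity matrix Q for a progressive model 1 -> 2 -> 3.

    States are indexed 0, 1, 2 here; state 2 is absorbing. Constant
    intensities (time homogeneity) are an illustrative assumption.
    """
    return np.array([[-q12, q12, 0.0],
                     [0.0, -q23, q23],
                     [0.0, 0.0, 0.0]])


def panel_loglik(Q, times, obs, emission=None, init=None):
    """Log-likelihood of one individual's intermittently observed states.

    Q        : intensity matrix; P(t) = expm(Q * t) for a time-homogeneous model.
    times    : increasing visit times t_0 < t_1 < ... < t_m.
    obs      : observed state index at each visit.
    emission : optional misclassification matrix, emission[r, s] =
               Pr(observe s | true state r). None means error-free
               observation (identity matrix).
    init     : distribution of the true state at t_0. None puts all mass on
               the first observed state, so the likelihood is conditional on it.
    """
    K = Q.shape[0]
    E = np.eye(K) if emission is None else np.asarray(emission, dtype=float)
    if init is None:
        init = np.zeros(K)
        init[obs[0]] = 1.0
    # Forward recursion: alpha[r] is proportional to
    # Pr(observations so far, true current state = r).
    alpha = np.asarray(init, dtype=float) * E[:, obs[0]]
    loglik = 0.0
    for k in range(1, len(times)):
        c = alpha.sum()              # rescale at each step to avoid underflow
        loglik += np.log(c)
        alpha = alpha / c
        P = expm(Q * (times[k] - times[k - 1]))
        alpha = (alpha @ P) * E[:, obs[k]]
    return loglik + np.log(alpha.sum())


# Toy record (illustrative values, not study data): visits at times 0, 1, 3
# with observed states 1, 2, 3 (coded 0, 1, 2).
Q = intensity_matrix(0.5, 0.2)
times, obs = [0.0, 1.0, 3.0], [0, 1, 2]
print(panel_loglik(Q, times, obs))

# Same record under a hidden Markov model allowing hypothetical 10%
# misclassification between states 1 and 2.
E = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.9, 0.0],
              [0.0, 0.0, 1.0]])
print(panel_loglik(Q, times, obs, emission=E))
```

Summing such contributions over individuals and maximizing over the intensities (for example with `scipy.optimize.minimize` on log-intensities) would yield estimates of q12 and q23; covariates, random effects, and mixtures are omitted from this sketch.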
Acknowledgements
We thank the Editor and Jianwen Cai for inviting this article in connection with the Symposium on Emerging Methodological Issues in Population-Based Chronic Disease Research, in honor of Ross Prentice (October 20–21, 2011). This research was supported by grants from the Natural Sciences and Engineering Research Council of Canada and the Canadian Institutes of Health Research. Richard Cook is a Canada Research Chair in Statistical Methods for Health Research. The authors thank Dr. Dafna Gladman and Dr. Vinod Chandran for collaboration and helpful discussions regarding the research at the Centre for Prognosis Studies in Rheumatic Disease at the University of Toronto. The Diabetes Control and Complications Trial was sponsored by the Division of Diabetes, Endocrinology, and Metabolic Diseases of the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, through cooperative agreements and a research grant. Additional support was provided by the National Heart, Lung, and Blood Institute, the National Eye Institute, and the National Center for Research Resources.
About this article
Cook, R.J., Lawless, J.F. Statistical Issues in Modeling Chronic Disease in Cohort Studies. Stat Biosci 6, 127–161 (2014). https://doi.org/10.1007/s12561-013-9087-8