Statistical Issues in Modeling Chronic Disease in Cohort Studies

Statistics in Biosciences

Abstract

Observational cohort studies of individuals with chronic disease provide information on rates of disease progression, the effect of fixed and time-varying risk factors, and the extent of heterogeneity in the course of disease. Analysis of this information is often facilitated by the use of multistate models with intensity functions governing transition between disease states. We discuss modeling and analysis issues for such models when individuals are observed intermittently. Frameworks for dealing with heterogeneity and measurement error are discussed, including random effect models, finite mixture models, and hidden Markov models. Cohorts are often defined by convenience, and ways of addressing outcome-dependent sampling or observation of individuals are also discussed. Data on progression of joint damage in psoriatic arthritis and retinopathy in diabetes are analysed to illustrate these issues and related methodology.
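
As a rough illustration of the intermittent-observation setting described in the abstract, and not the authors' own code, the sketch below assumes a time-homogeneous Markov multistate model. Under that assumption the transition probabilities over a gap of length t are the entries of exp(Qt), where Q is the intensity matrix, and one individual's likelihood contribution is the product of these probabilities over successive visits. The three-state progressive structure, the intensity values, the visit history, and the function names `mat_exp` and `panel_log_likelihood` are all hypothetical choices made for illustration.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, terms=20, squarings=10):
    """Approximate exp(q * t) by scaling and squaring with a Taylor series."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term holds a^k / k! after this step
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def panel_log_likelihood(q, visits):
    """Log-likelihood for one individual seen only at discrete visits.

    visits is a time-ordered list of (time, state) pairs. Each pair of
    consecutive visits contributes log P[s0][s1](t1 - t0); the exact
    transition times between visits are not needed. Assumes every observed
    transition has positive probability under q.
    """
    ll = 0.0
    for (t0, s0), (t1, s1) in zip(visits, visits[1:]):
        p = mat_exp(q, t1 - t0)
        ll += math.log(p[s0][s1])
    return ll


# Hypothetical progressive model 0 -> 1 -> 2 with state 2 absorbing.
Q = [[-0.2, 0.2, 0.0],
     [0.0, -0.1, 0.1],
     [0.0, 0.0, 0.0]]

P = mat_exp(Q, 2.0)  # transition probabilities over a two-year gap
visits = [(0.0, 0), (1.5, 0), (4.0, 1), (6.0, 2)]  # hypothetical history
loglik = panel_log_likelihood(Q, visits)
```

In practice the intensities would be estimated by maximizing the sum of such contributions over individuals, and the frameworks listed in the abstract (random effects, mixtures, hidden Markov models) each modify this basic likelihood; none of that is attempted here.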

Acknowledgements

We thank the Editor and Jianwen Cai for inviting this article in connection with the Symposium on Emerging Methodological Issues in Population-Based Chronic Disease Research, in honor of Ross Prentice (October 20–21, 2011). This research was supported by grants from the Natural Sciences and Engineering Research Council of Canada and the Canadian Institutes of Health Research. Richard Cook is a Canada Research Chair in Statistical Methods for Health Research. The authors thank Dr. Dafna Gladman and Dr. Vinod Chandran for collaboration and helpful discussions regarding the research at the Centre for Prognosis Studies in Rheumatic Disease at the University of Toronto. The Diabetes Control and Complications Trial was sponsored by the Division of Diabetes, Endocrinology, and Metabolic Diseases of the National Institute of Diabetes and Digestive and Kidney Diseases, the National Institutes of Health, through cooperative agreements and a research grant. Additional support was provided by the National Heart, Lung, and Blood Institute, the National Eye Institute, and the National Center for Research Resources.

Author information

Corresponding author

Correspondence to Jerald F. Lawless.

About this article

Cite this article

Cook, R.J., Lawless, J.F. Statistical Issues in Modeling Chronic Disease in Cohort Studies. Stat Biosci 6, 127–161 (2014). https://doi.org/10.1007/s12561-013-9087-8

  • DOI: https://doi.org/10.1007/s12561-013-9087-8
