Using the landmark method for creating prediction models in large datasets derived from electronic health records

Health Care Management Science

Abstract

With the integration of electronic health records (EHRs), health data have become abundant and easily accessible. The EHR has the potential to provide important healthcare information to researchers by creating study cohorts. However, accessing this information comes with three major issues: 1) predictor variables often change over time, 2) patients have varying lengths of follow-up within the EHR, and 3) the size of the EHR data can be computationally challenging. Landmark analyses complement EHR data well and help to alleviate these three issues. We present two examples that use patient birthdays as landmark times for creating dynamic datasets for predicting clinical outcomes. The use of landmark times helps to solve these three issues by incorporating information that changes over time, by creating unbiased reference points that are not related to a patient’s exposure within the EHR, and by reducing the size of a dataset compared to a true time-varying analysis. These techniques are shown using two example cohort studies from the Cleveland Clinic that included 4.5 million and 17,787 unique patients, respectively.
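
The idea of using birthdays as landmark times can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the one-year prediction horizon, the use of the most recent measurement before each birthday, and the sample values are all assumptions made for illustration. Each birthday that falls within a patient's EHR follow-up yields one row, so a long measurement history collapses to at most one row per patient-year.

```python
from datetime import date, timedelta


def birthday_in_year(birth, year):
    """Birthday falling in `year`; Feb 29 maps to Feb 28 in non-leap years."""
    try:
        return birth.replace(year=year)
    except ValueError:
        return date(year, 2, 28)


def landmark_rows(birth, first_visit, last_visit, measurements,
                  event_date=None, horizon_days=365):
    """Build one row per birthday (landmark) inside the follow-up window.

    measurements: list of (date, value) pairs for one predictor.
    The horizon and the carry-forward rule are illustrative assumptions.
    """
    rows = []
    ordered = sorted(measurements)  # ascending by measurement date
    for year in range(first_visit.year, last_visit.year + 1):
        landmark = birthday_in_year(birth, year)
        if landmark < first_visit or landmark > last_visit:
            continue  # birthday falls outside observed follow-up
        if event_date is not None and event_date <= landmark:
            break  # outcome already occurred; patient is no longer at risk
        prior = [value for day, value in ordered if day <= landmark]
        if not prior:
            continue  # no predictor value recorded before this landmark
        # Follow-up ends at the horizon or the last visit, whichever is first.
        end = min(last_visit, landmark + timedelta(days=horizon_days))
        event = event_date is not None and event_date <= end
        stop = event_date if event else end
        rows.append({
            "landmark": landmark,
            "value": prior[-1],  # most recent value at or before the landmark
            "event": event,
            "time_days": (stop - landmark).days,
        })
    return rows


# Hypothetical patient with three lab values and one outcome event.
rows = landmark_rows(
    birth=date(1950, 6, 15),
    first_visit=date(2010, 1, 10),
    last_visit=date(2013, 3, 1),
    measurements=[(date(2010, 2, 1), 7.2),
                  (date(2011, 9, 20), 8.1),
                  (date(2012, 5, 30), 6.9)],
    event_date=date(2012, 11, 1),
)
for row in rows:
    print(row)
```

Rows built this way could then be stacked across patients and passed to a survival model. The landmark date depends only on the date of birth, not on when the patient happened to visit, which is the property the abstract describes as an unbiased reference point.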

Acknowledgments

The authors would like to acknowledge the writing assistance of Stephanie S. Kocian, M.A.T., M.S.

Conflicts of interest

The authors report no potential conflicts of interest.

Author information

Corresponding author

Correspondence to Brian J. Wells.

Cite this article

Wells, B.J., Chagin, K.M., Li, L. et al. Using the landmark method for creating prediction models in large datasets derived from electronic health records. Health Care Manag Sci 18, 86–92 (2015). https://doi.org/10.1007/s10729-014-9281-3

  • DOI: https://doi.org/10.1007/s10729-014-9281-3
