Abstract
With the integration of electronic health records (EHRs), health data have become abundant and easily accessible. EHRs can provide important healthcare information to researchers through the creation of study cohorts. However, using this information raises three major issues: 1) predictor variables often change over time, 2) patients have varying lengths of follow-up within the EHR, and 3) the size of EHR data can be computationally challenging. Landmark analyses complement EHR data well and help to alleviate these three issues. We present two examples that use patient birthdays as landmark times for creating dynamic datasets for predicting clinical outcomes. Landmark times help to address these three issues by incorporating information that changes over time, by creating unbiased reference points that are unrelated to a patient’s exposure within the EHR, and by reducing the size of the dataset compared with a true time-varying analysis. These techniques are demonstrated using two example cohort studies from the Cleveland Clinic that included 4.5 million and 17,787 unique patients, respectively.
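To make the idea of birthday landmarks concrete, the following is a minimal sketch of how one row per birthday might be built for a single patient. It is not the authors' implementation: the record layout, the field names (`dob`, `first_visit`, `last_visit`, `measurements`, `event_date`), and the helper functions are all hypothetical, and the sketch assumes each landmark row carries the most recent measurement on or before the birthday plus the follow-up time from that birthday.

```python
from datetime import date


def birthday_in_year(dob, year):
    """Return the patient's birthday in a given year (Feb 29 falls back to Feb 28)."""
    try:
        return dob.replace(year=year)
    except ValueError:
        return date(year, 2, 28)


def birthdays_between(dob, start, end):
    """Return every birthday that falls within [start, end], in order."""
    result = []
    for year in range(start.year, end.year + 1):
        b = birthday_in_year(dob, year)
        if start <= b <= end:
            result.append(b)
    return result


def landmark_rows(patient):
    """Build one row per birthday landmark for a single (hypothetical) patient record."""
    rows = []
    event = patient["event_date"]
    # Follow-up ends at the event if one was observed, otherwise at the last visit.
    end_of_follow_up = event if event is not None else patient["last_visit"]
    for lm in birthdays_between(patient["dob"], patient["first_visit"], patient["last_visit"]):
        if lm >= end_of_follow_up:
            break  # no longer at risk at this landmark
        # Only measurements taken on or before the landmark may be used.
        prior = [(d, v) for d, v in patient["measurements"] if d <= lm]
        if not prior:
            continue  # nothing measured yet at this landmark
        _, latest_value = max(prior, key=lambda p: p[0])
        rows.append({
            "id": patient["id"],
            "landmark": lm,
            "age": lm.year - patient["dob"].year,
            "value": latest_value,
            "days_to_end": (end_of_follow_up - lm).days,
            "event": 1 if event is not None else 0,
        })
    return rows


# Illustrative, made-up patient record.
patient = {
    "id": 1,
    "dob": date(1950, 6, 15),
    "first_visit": date(2005, 1, 10),
    "last_visit": date(2008, 3, 1),
    "measurements": [(date(2005, 2, 1), 7.1), (date(2006, 9, 1), 8.0)],
    "event_date": date(2007, 12, 1),
}
rows = landmark_rows(patient)
```

Each patient contributes at most one row per year of follow-up, which is how such a layout could stay smaller than a dataset with one row per measurement change. Because birthdays are unrelated to visit timing, the landmark dates do not depend on how often the patient was seen.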
Acknowledgments
The authors would like to acknowledge the writing assistance of Stephanie S. Kocian, M.A.T., M.S.
Conflicts of interest
The authors report no potential conflicts of interest.
Cite this article
Wells, B.J., Chagin, K.M., Li, L. et al. Using the landmark method for creating prediction models in large datasets derived from electronic health records. Health Care Manag Sci 18, 86–92 (2015). https://doi.org/10.1007/s10729-014-9281-3
DOI: https://doi.org/10.1007/s10729-014-9281-3