Abstract
Objectives
To determine sources of error in data electronically extracted from electronic health records.
Study design
Categorical and continuous variables related to early-onset neonatal hypoglycemia were preselected and electronically extracted from the records of 100 randomly selected neonates among 3479 births with laboratory-proven early-onset hypoglycemia. The extraction language was written by an information technologist, and the extracted data were validated by blinded manual chart review. Agreement was assessed with the Kappa coefficient for categorical variables and with percent validity for continuous variables.
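The two agreement measures named above can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes Kappa means Cohen's kappa between the electronic extraction and the manual chart review, and that percent validity means the share of extracted continuous values matching the chart value. The function names, the `tolerance` parameter, and the sample values are hypothetical.

```python
from collections import Counter


def cohen_kappa(extracted, manual):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(extracted) != len(manual) or not extracted:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(extracted)
    # Observed agreement: fraction of records where both sources match.
    p_observed = sum(e == m for e, m in zip(extracted, manual)) / n
    # Chance agreement expected from each source's marginal label frequencies.
    freq_e, freq_m = Counter(extracted), Counter(manual)
    p_expected = sum(freq_e[label] * freq_m[label] for label in freq_e) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both sources used one identical label for every record
    return (p_observed - p_expected) / (1 - p_expected)


def percent_validity(extracted, manual, tolerance=0.0):
    """Percentage of continuous values matching chart review within a tolerance.

    The tolerance is an assumption of this sketch, not something the study specifies.
    """
    if len(extracted) != len(manual) or not extracted:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(abs(e - m) <= tolerance for e, m in zip(extracted, manual))
    return 100.0 * matches / len(extracted)


# Hypothetical records: 1 = diagnosis flagged, 0 = not flagged.
flag_extracted = [1, 1, 0, 0]
flag_manual = [1, 0, 0, 0]
print(cohen_kappa(flag_extracted, flag_manual))  # 0.5

# Hypothetical continuous values (for example, gestational age in weeks).
value_extracted = [3.2, 2.9, 40.0, 38.5]
value_manual = [3.2, 2.9, 40.0, 39.0]
print(percent_validity(value_extracted, value_manual))  # 75.0
```

Kappa corrects raw agreement for the agreement expected by chance, which is why a variable can match the chart in most records and still score poorly when one category dominates.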
Results
8/23 (35%) categorical variables had acceptable Kappa (0.81–1.00); 5/23 (22%) had fair-to-slight agreement (Kappa < 0.40). Notably, “hypoglycemia” had poor agreement (Kappa 0.16). In contrast, 6/8 continuous variables had validity ≥ 94%. After the extraction language was revised, 6/9 variables were corrected and inter-rater validation improved. However, “hypoglycemia” was not corrected and remained an issue.
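The Kappa ranges quoted in these results resemble the conventional Landis and Koch scale. The sketch below maps a Kappa value to that scale, assuming the abstract's cut-offs follow it; the abstract itself names only "acceptable" (0.81–1) and "fair-slight" (< 0.40), so the remaining band labels and the function name `agreement_band` are assumptions.

```python
def agreement_band(kappa):
    """Map a kappa value to a Landis and Koch style agreement label."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0.0:
        return "poor"
    # Upper bounds of each band, checked in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


print(agreement_band(0.16))  # slight
print(agreement_band(0.81))  # almost perfect
```

On this scale the Kappa of 0.16 reported for "hypoglycemia" falls in the "slight" band, which the abstract describes as poor agreement.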
Conclusions
Data extraction without validation procedures, especially for categorical variables that rely on International Classification of Diseases-9 (ICD-9) codes, often results in incorrect data identification. Electronic data extraction must incorporate built-in validation processes.
Acknowledgements
Dr. Scheid was a trainee in Neonatal-Perinatal Medicine at UT Southwestern Medical School. This study was presented in part as an oral presentation at the annual meeting of the Pediatric Academic Societies in San Francisco, CA, May 2017.
Funding:
George L. MacGregor Professorship in Pediatrics awarded to Dr. Rosenfeld
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Scheid, L.M., Brown, L.S., Clark, C. et al. Data electronically extracted from the electronic health record require validation. J Perinatol 39, 468–474 (2019). https://doi.org/10.1038/s41372-018-0311-8
DOI: https://doi.org/10.1038/s41372-018-0311-8
- Springer Nature America, Inc.