Data electronically extracted from the electronic health record require validation

  • Article
Journal of Perinatology

Abstract

Objectives

To determine sources of error in data electronically extracted from electronic health records.

Study design

Categorical and continuous variables related to early-onset neonatal hypoglycemia were preselected and electronically extracted from records of 100 randomly selected neonates within 3479 births with laboratory-proven early-onset hypoglycemia. The extraction language was written by an information technologist, and the extracted data were validated by blinded manual chart review. Categorical variables were assessed with the Kappa coefficient and continuous variables with percent validity.
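
As an illustration of the two agreement measures named above, the following is a minimal sketch and not the authors' code. It assumes that "Kappa" means Cohen's kappa between the electronic extract and the manual chart review, and that "percent validity" means the percentage of extracted continuous values that match the chart review within a chosen tolerance. The function names, the tolerance parameter, and the toy data are hypothetical.

```python
from collections import Counter


def cohens_kappa(extracted, reviewed):
    """Cohen's kappa between two equal-length lists of categorical labels."""
    if len(extracted) != len(reviewed) or not extracted:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(extracted)
    # Observed agreement: fraction of records where both sources agree.
    p_o = sum(a == b for a, b in zip(extracted, reviewed)) / n
    # Chance agreement: computed from each source's marginal label counts.
    freq_e = Counter(extracted)
    freq_r = Counter(reviewed)
    p_e = sum(freq_e[label] * freq_r[label] for label in freq_e) / (n * n)
    if p_e == 1.0:
        # Both sources use one identical label throughout; kappa is
        # undefined (0/0), so report perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def percent_validity(extracted, reviewed, tolerance=0.0):
    """Percent of extracted values within `tolerance` of the reviewed values.

    This definition is an assumption; the abstract does not define the term.
    """
    if len(extracted) != len(reviewed) or not extracted:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(abs(a - b) <= tolerance for a, b in zip(extracted, reviewed))
    return 100.0 * matches / len(extracted)


# Toy data (hypothetical): 1 = diagnosis present, 0 = absent.
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))                      # 0.5
print(percent_validity([3.2, 2.9, 4.0, 3.5], [3.2, 2.9, 4.1, 3.5]))  # 75.0
```

Kappa discounts the agreement expected by chance, which is why a categorical variable can show high raw agreement and still have a low Kappa when one category dominates.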

Results

8/23 (35%) categorical variables had acceptable Kappa (0.81–1.00); 5/23 (22%) had fair-to-slight agreement (Kappa < 0.40). Notably, “hypoglycemia” had poor agreement (Kappa 0.16). In contrast, 6/8 continuous variables had validity ≥ 94%. After the extraction language was corrected, 6/9 variables were corrected and inter-rater validation improved. However, “hypoglycemia” was not corrected and remained an issue.
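
The Kappa ranges quoted above can be turned into a simple screening rule. The sketch below assumes the commonly used Landis and Koch bands, which match the 0.81–1.00 "acceptable" range and the "fair" and "slight" wording in the results, although the abstract does not name the scale it used. The function names and the second variable name are hypothetical; only "hypoglycemia" and its Kappa of 0.16 come from the text.

```python
def agreement_band(kappa):
    """Map a kappa value to a Landis and Koch style agreement label (assumed scale)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


def flag_for_review(kappas, acceptable=0.81):
    """Return the variables whose kappa falls below the acceptable threshold."""
    return sorted(name for name, k in kappas.items() if k < acceptable)


# "hypoglycemia" and 0.16 are from the abstract; "variable_b" is made up.
kappas = {"hypoglycemia": 0.16, "variable_b": 0.90}
print(agreement_band(kappas["hypoglycemia"]))  # slight
print(flag_for_review(kappas))                 # ['hypoglycemia']
```

Under these assumed bands a Kappa of 0.16 is labeled "slight", which is consistent with the abstract's description of the agreement as poor.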

Conclusions

Data extraction without validation procedures, especially categorical variables using International Classification of Diseases-9 (ICD-9) codes, often results in incorrect data identification. Electronically extracted data must incorporate built-in validating processes.

Acknowledgements

Dr. Scheid was a trainee in Neonatal-Perinatal Medicine at UT Southwestern Medical School. This study was presented in part as an oral presentation at the annual meeting of the Pediatric Academic Societies in San Francisco, CA, May 2017.

Funding:

George L. MacGregor Professorship in Pediatrics awarded to Dr. Rosenfeld.

Author information

Corresponding author

Correspondence to Charles R. Rosenfeld.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Scheid, L.M., Brown, L.S., Clark, C. et al. Data electronically extracted from the electronic health record require validation. J Perinatol 39, 468–474 (2019). https://doi.org/10.1038/s41372-018-0311-8

  • DOI: https://doi.org/10.1038/s41372-018-0311-8

  • Springer Nature America, Inc.
