Evaluation of Healthcare Interventions and Big Data: Review of Associated Data Issues

  • Leading Article
  • PharmacoEconomics


Although the analysis of ‘big data’ holds tremendous potential to improve patient care, significant challenges remain before that potential can be realized. Accuracy and completeness of data, linkage of disparate data sources, and access to data are areas that require particular focus. This article discusses these areas and shares strategies to promote progress. Improvement in clinical coding, innovative matching methodologies, and investment in data standardization are potential solutions to data validation and linkage problems. Challenges to data access still require considerable attention, with data ownership, security needs, and costs representing major barriers.





We would like to express thanks to Marie McWhirter from the University of Illinois, College of Medicine at Peoria for her administrative assistance. All authors contributed significantly to the drafting and revision of the manuscript and approved the final version.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Carl V. Asche.

Ethics declarations

Conflict of interest

Carl Asche, Brian Seal, Kristijan Kahler, Elisabeth Oehrlein, and Meredith Baumgartner have no conflicts of interest to declare.


There was no funding provided for this manuscript.

About this article

Cite this article

Asche, C.V., Seal, B., Kahler, K.H. et al. Evaluation of Healthcare Interventions and Big Data: Review of Associated Data Issues. PharmacoEconomics 35, 759–765 (2017).
