Drug Safety, Volume 36, Supplement 1, pp 49–58

Managing Data Quality for a Drug Safety Surveillance System

  • Abraham G. Hartzema
  • Christian G. Reich
  • Patrick B. Ryan
  • Paul E. Stang
  • David Madigan
  • Emily Welebob
  • J. Marc Overhage
Original Research Article

Abstract

Objective

The objective of this study is to present a data quality assurance program for disparate data sources loaded into a Common Data Model, and to highlight the data quality issues identified and the resolutions implemented.

Background

The Observational Medical Outcomes Partnership is conducting methodological research to develop a system to monitor drug safety. Standard processes and tools are needed to ensure continuous data quality across a network of disparate databases, and to ensure that the extract-transform-load (ETL) procedures maintain data integrity. Currently, there is no consensus or standard approach for evaluating the quality of the source data or of the ETL procedures.

Methods

We propose a framework for a comprehensive process to ensure data quality throughout the steps used to process and analyze the data. The approach used to manage data anomalies includes: (1) characterization of data sources; (2) detection of data anomalies; (3) determination of the cause of data anomalies; and (4) remediation.

Findings

Data anomalies included incomplete raw data (no race or year of birth recorded) and implausible data (year of birth exceeding the current year, observation period end date preceding the start date, and suspicious data frequencies and proportions outside the normal range). Examples of errors found in the ETL process were zip codes incorrectly loaded, drug quantities rounded, drug exposure length incorrectly calculated, and condition length incorrectly programmed.
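The completeness and plausibility anomalies listed above lend themselves to simple rule-based checks. The following is a minimal sketch of such checks, not the program used in the study: the function name `find_anomalies` and the record fields (`race`, `year_of_birth`, `observation_period_start`, `observation_period_end`) are hypothetical names chosen for illustration and are not taken from the Common Data Model specification.

```python
from datetime import date


def find_anomalies(person, current_year=None):
    """Return a list of anomaly labels for one person record (a dict).

    Field names are illustrative assumptions, not the actual CDM columns.
    """
    if current_year is None:
        current_year = date.today().year

    anomalies = []

    # Completeness: race or year of birth not recorded.
    if person.get("race") is None:
        anomalies.append("missing race")

    year_of_birth = person.get("year_of_birth")
    if year_of_birth is None:
        anomalies.append("missing year of birth")
    # Plausibility: year of birth must not exceed the current year.
    elif year_of_birth > current_year:
        anomalies.append("year of birth exceeds current year")

    # Plausibility: an observation period must not end before it starts.
    start = person.get("observation_period_start")
    end = person.get("observation_period_end")
    if start is not None and end is not None and end < start:
        anomalies.append("observation period ends before it starts")

    return anomalies


record = {
    "race": None,
    "year_of_birth": 2050,
    "observation_period_start": date(2010, 1, 1),
    "observation_period_end": date(2009, 1, 1),
}
for label in find_anomalies(record, current_year=2013):
    print(label)
```

Passing `current_year` explicitly keeps the check reproducible when the same data are re-examined later; checks on frequencies and proportions would need source-specific reference ranges and are not sketched here.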

Conclusions

Complete and reliable observational data are difficult to obtain. Because the data are regularly updated, data quality assurance cannot be a one-time exercise; consequently, processes to assess data quality should be ongoing and transparent.

Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Abraham G. Hartzema (1, 2)
  • Christian G. Reich (2, 3)
  • Patrick B. Ryan (2, 4)
  • Paul E. Stang (2, 4)
  • David Madigan (2, 5)
  • Emily Welebob (2)
  • J. Marc Overhage (2, 6)

  1. College of Pharmacy, University of Florida, Gainesville, USA
  2. Observational Medical Outcomes Partnership, Foundation for the National Institutes of Health, Bethesda, USA
  3. AstraZeneca, Waltham, USA
  4. Janssen Research and Development LLC, Titusville, USA
  5. Department of Statistics, Columbia University, New York, USA
  6. Siemens Health Services, Malvern, USA
