Drug Safety, Volume 36, Supplement 1, pp 49–58

Managing Data Quality for a Drug Safety Surveillance System

  • Abraham G. Hartzema
  • Christian G. Reich
  • Patrick B. Ryan
  • Paul E. Stang
  • David Madigan
  • Emily Welebob
  • J. Marc Overhage
Original Research Article

DOI: 10.1007/s40264-013-0098-7

Cite this article as:
Hartzema, A.G., Reich, C.G., Ryan, P.B. et al. Drug Saf (2013) 36(Suppl 1): 49. doi:10.1007/s40264-013-0098-7

Abstract

Objective

The objective of this study is to present a data quality assurance program for disparate data sources loaded into a Common Data Model, and to highlight the data quality issues identified and the resolutions implemented.

Background

The Observational Medical Outcomes Partnership is conducting methodological research to develop a system to monitor drug safety. Standard processes and tools are needed to ensure continuous data quality across a network of disparate databases, and to ensure that extract-transform-load (ETL) procedures maintain data integrity. Currently, there is no consensus or standard approach for evaluating the quality of the source data or of the ETL procedures.

Methods

We propose a framework for a comprehensive process to ensure data quality throughout the steps used to process and analyze the data. The approach used to manage data anomalies includes: (1) characterization of data sources; (2) detection of data anomalies; (3) determination of the causes of data anomalies; and (4) remediation.
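Steps (1) and (2) can be illustrated with a minimal Python sketch. It assumes records arrive as plain dictionaries and that "expected" proportions are supplied by the analyst; the function names, the `<missing>` key, the field names, and the ranges in the example are hypothetical and are not taken from the study. Characterization profiles each field (share of missing values and of each observed value), and detection flags any proportion that falls outside its expected band.

```python
from collections import Counter
from typing import Any

MISSING = "<missing>"  # hypothetical key used to report the share of absent values


def characterize(records: list[dict[str, Any]], fields: list[str]) -> dict[str, dict[str, float]]:
    """Characterization: proportion of missing values and of each observed value, per field."""
    n = len(records)
    profile: dict[str, dict[str, float]] = {}
    for field in fields:
        values = [record.get(field) for record in records]
        counts = Counter(str(v) for v in values if v is not None)
        missing = sum(v is None for v in values)
        profile[field] = {MISSING: missing / n if n else 0.0}
        for value, count in counts.items():
            profile[field][value] = count / n
    return profile


def out_of_range(
    profile: dict[str, dict[str, float]],
    expected: dict[tuple[str, str], tuple[float, float]],
) -> list[tuple[str, str, float]]:
    """Detection: flag each proportion outside its expected [low, high] band."""
    flags = []
    for (field, key), (low, high) in expected.items():
        observed = profile.get(field, {}).get(key, 0.0)
        if not low <= observed <= high:
            flags.append((field, key, observed))
    return flags


records = [
    {"gender": "F", "race": None},
    {"gender": "M", "race": "white"},
    {"gender": "F", "race": None},
    {"gender": "F", "race": "black"},
]
profile = characterize(records, ["gender", "race"])
expected = {("gender", "F"): (0.4, 0.6), ("race", MISSING): (0.0, 0.2)}
print(out_of_range(profile, expected))
# [('gender', 'F', 0.75), ('race', '<missing>', 0.5)]
```

A flag only marks a candidate anomaly; steps (3) and (4) still require determining whether the value reflects the source data or the ETL before any remediation.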

Findings

Data anomalies in the raw data included incomplete records (no race or year of birth recorded) and implausible data (year of birth later than the current year, observation period end date preceding the start date, and suspicious data frequencies and proportions outside the normal range). Examples of errors found in the ETL process were incorrectly loaded zip codes, rounded drug quantities, incorrectly calculated drug exposure length, and incorrectly programmed condition length.
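Several of the anomalies above translate directly into rule-based checks. The Python sketch below is illustrative only: the field names (`year_of_birth`, `race`), the function names, and the tolerance are assumptions, not the study's implementation. It covers missing race or year of birth, a year of birth after the current year, an observation period ending before it starts, and a source-to-target reconciliation that would catch quantities rounded (or dropped) during the ETL.

```python
from datetime import date
from typing import Any, Optional


def person_issues(person: dict[str, Any], current_year: int) -> list[str]:
    """Completeness and plausibility checks for one person record (hypothetical field names)."""
    issues = []
    yob: Optional[int] = person.get("year_of_birth")
    if yob is None:
        issues.append("missing year_of_birth")
    elif yob > current_year:
        issues.append("year_of_birth after current year")
    if person.get("race") is None:
        issues.append("missing race")
    return issues


def observation_period_issues(start: date, end: date) -> list[str]:
    """An observation period whose end date precedes its start date is implausible."""
    return ["end date precedes start date"] if end < start else []


def quantity_mismatches(
    source: dict[str, float], loaded: dict[str, float], tol: float = 1e-9
) -> list[str]:
    """Source-to-target reconciliation: IDs whose loaded quantity is missing or
    differs from the source value (e.g. because the ETL rounded it)."""
    bad = []
    for record_id, value in source.items():
        target = loaded.get(record_id)
        if target is None or abs(target - value) > tol:
            bad.append(record_id)
    return bad


print(person_issues({"year_of_birth": 2031, "race": None}, current_year=2013))
# ['year_of_birth after current year', 'missing race']
print(observation_period_issues(date(2010, 5, 1), date(2009, 12, 31)))
# ['end date precedes start date']
print(quantity_mismatches({"d1": 30.0, "d2": 7.5}, {"d1": 30.0, "d2": 8.0}))
# ['d2']
```

Comparing loaded values against the source, rather than only checking the loaded data in isolation, is what distinguishes ETL errors such as rounding from anomalies already present in the raw data.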

Conclusions

Complete and reliable observational data are difficult to obtain. Because data are regularly updated, data quality assurance must be continuous; processes to assess data quality should therefore be ongoing and transparent.

Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Abraham G. Hartzema (1, 2)
  • Christian G. Reich (2, 3)
  • Patrick B. Ryan (2, 4)
  • Paul E. Stang (2, 4)
  • David Madigan (2, 5)
  • Emily Welebob (2)
  • J. Marc Overhage (2, 6)

  1. College of Pharmacy, University of Florida, Gainesville, USA
  2. Observational Medical Outcomes Partnership, Foundation for the National Institutes of Health, Bethesda, USA
  3. AstraZeneca, Waltham, USA
  4. Janssen Research and Development LLC, Titusville, USA
  5. Department of Statistics, Columbia University, New York, USA
  6. Siemens Health Services, Malvern, USA
