
The Impact of Drug and Outcome Prevalence on the Feasibility and Performance of Analytical Methods for a Risk Identification and Analysis System



A systematic risk identification system has the potential to study all marketed drugs. However, the rates of drug exposure and outcome occurrence in observational databases, the database size and the desired risk detection threshold determine the power and therefore limit the feasibility of applying appropriate analytical methods. Drugs vary dramatically in these parameters because of differences in the prevalence of their indications, cost, time on the market, payer formularies, market pressures and clinical guidelines.


Evaluate (i) the feasibility of a risk identification system based on commercially available observational databases, (ii) the range of drugs that can be studied for certain outcomes, (iii) the influence of underpowered drug-outcome pairs on the performance of analytical methods estimating the strength of their association and (iv) the time required from the introduction of a new drug to accumulate sufficient data for signal detection.


As part of the Observational Medical Outcomes Partnership experiment, we used data from commercially available observational databases and calculated the minimal detectable relative risk of all pairs of marketed drugs and eight health outcomes of interest. We then studied an array of analytical methods for their ability to distinguish between pre-determined positive and negative drug-outcome test pairs. The positive controls contained active ingredients with evidence of a positive association with the outcome, and the negative controls had no such evidence. As a performance measure we used the area under the receiver operating characteristic curve (AUC). We compared the AUC of methods using all test pairs with the AUC using only pairs sufficiently powered for detection of a relative risk of 1.25. Finally, we studied all drugs introduced to the market in 2003–2008 and determined the time required to achieve the same minimal detectable relative risk threshold.
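The power calculation described above can be illustrated with a minimal sketch. It assumes a common approximation based on the square-root transformation of a Poisson count, in which the minimal detectable relative risk (MDRR) depends only on the number of outcome events expected among exposed persons under the null hypothesis. The one-sided significance level of 0.05, the power of 0.80, and the function names `mdrr`, `events_needed` and `months_to_power` are assumptions made for illustration and are not taken from the article.

```python
from math import sqrt
from statistics import NormalDist
from typing import Iterable, Optional


def mdrr(expected_events: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate minimal detectable relative risk for a one-sided test.

    expected_events is the number of outcome events expected among exposed
    persons if the drug had no effect (an assumed input, not study data).
    """
    if expected_events <= 0:
        return float("inf")  # no expected events: no relative risk is detectable
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    return (1 + (z_alpha + z_beta) / (2 * sqrt(expected_events))) ** 2


def events_needed(target_rr: float = 1.25, alpha: float = 0.05, power: float = 0.80) -> float:
    """Invert the approximation: expected events required to detect target_rr."""
    z = NormalDist().inv_cdf(1 - alpha) + NormalDist().inv_cdf(power)
    return (z / (2 * (sqrt(target_rr) - 1))) ** 2


def months_to_power(monthly_expected: Iterable[float], target_rr: float = 1.25) -> Optional[int]:
    """First month (1-based) in which cumulative expected events reach the
    requirement for target_rr, or None if that never happens."""
    needed = events_needed(target_rr)
    total = 0.0
    for month, expected in enumerate(monthly_expected, start=1):
        total += expected
        if total >= needed:
            return month
    return None
```

Under these assumptions, roughly 111 expected events are required for an MDRR of 1.25, so a hypothetical drug accumulating 30 expected events per month would reach the threshold in its fourth month, while one accumulating a single expected event per month would not reach it within a year.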


The performance of methods improved after restricting them to fully powered drug-outcome pairs. The availability of drug-outcome pairs with sufficient power to detect a relative risk of 1.25 varies enormously among outcomes. Depending on the market uptake, drugs can generate relevant signals in the first month after approval, or never reach sufficient power.
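As a rough illustration of the comparison reported above, the sketch below computes the AUC as the probability that a randomly chosen positive control receives a higher estimate than a randomly chosen negative control, first for all test pairs and then only for pairs with a minimal detectable relative risk of at most 1.25. The data are invented solely to show the mechanics, and the names `DrugOutcomePair`, `auc` and `auc_for_pairs` are hypothetical; none of this reproduces the study's data or results.

```python
from typing import NamedTuple, Optional, Sequence


class DrugOutcomePair(NamedTuple):
    estimate: float            # effect estimate produced by some analytical method
    is_positive_control: bool  # True for positive controls, False for negative controls
    mdrr: float                # minimal detectable relative risk of the pair


def auc(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC; ties between a positive and a negative count half."""
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative control")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def auc_for_pairs(pairs: Sequence[DrugOutcomePair], max_mdrr: Optional[float] = None) -> float:
    """AUC over all pairs, or only over pairs with mdrr <= max_mdrr."""
    kept = [p for p in pairs if max_mdrr is None or p.mdrr <= max_mdrr]
    positives = [p.estimate for p in kept if p.is_positive_control]
    negatives = [p.estimate for p in kept if not p.is_positive_control]
    return auc(positives, negatives)


# Invented example data (not from the study).
PAIRS = [
    DrugOutcomePair(2.1, True, 1.10),
    DrugOutcomePair(1.6, True, 1.20),
    DrugOutcomePair(0.9, True, 3.50),   # underpowered positive control
    DrugOutcomePair(1.0, False, 1.15),
    DrugOutcomePair(1.2, False, 1.22),
    DrugOutcomePair(1.8, False, 2.80),  # underpowered negative control
]

auc_all = auc_for_pairs(PAIRS)
auc_powered = auc_for_pairs(PAIRS, max_mdrr=1.25)
```

In this toy data set the two underpowered pairs carry noisy estimates, so dropping them raises the AUC; real data need not behave so cleanly.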


How often drugs are used and how often important outcomes occur determine the sample size and, with it, method performance in estimating drug-outcome associations. Careful consideration is therefore necessary to choose databases and outcome definitions, particularly for newly introduced drugs.






The Observational Medical Outcomes Partnership is funded by the Foundation for the National Institutes of Health (FNIH) through generous contributions from the following: Abbott, Amgen, AstraZeneca, Bayer Healthcare Pharmaceuticals, Biogen Idec, Bristol-Myers Squibb, Eli Lilly & Company, GlaxoSmithKline, Janssen Research and Development, Lundbeck, Inc., Merck & Co., Novartis Pharmaceuticals, Pfizer, Pharmaceutical Research and Manufacturers of America (PhRMA), Roche, Sanofi-Aventis, Schering-Plough, and Takeda. Dr. Reich is an employee of AstraZeneca. Dr. Ryan is an employee of Janssen Research and Development. Dr. Suchard previously received a grant from the FNIH.

This article was published in a supplement sponsored by the Foundation for the National Institutes of Health (FNIH). The supplement was guest edited by Stephen J.W. Evans. It was peer reviewed by Olaf H. Klungel, who received a small honorarium to cover out-of-pocket expenses. S.J.W.E. has received travel funding from the FNIH to travel to the OMOP symposium and received a fee from FNIH for the review of a protocol for OMOP. O.H.K. has received funding for the IMI-PROTECT project from the Innovative Medicines Initiative Joint Undertaking under Grant Agreement no 115004, resources of which are composed of financial contribution from the European Union’s Seventh Framework Programme (FP7/2007-2013) and EFPIA companies’ in-kind contribution.

Author information


Corresponding author

Correspondence to Christian G. Reich.

Additional information

The OMOP research used data from Truven Health Analytics (formerly the Health Business of Thomson Reuters), comprising the MarketScan® Research Databases: MarketScan Lab Supplemental (MSLR, 1.2 m persons), MarketScan Medicare Supplemental Beneficiaries (MDCR, 4.6 m persons), MarketScan Multi-State Medicaid (MDCD, 10.8 m persons) and MarketScan Commercial Claims and Encounters (CCAE, 46.5 m persons). Data were also provided by the Quintiles® Practice Research Database (formerly General Electric’s Electronic Health Record, GE, 11.2 m persons). GE is an electronic health record database, while the other four databases contain administrative claims data.

Appendix: Definitions of the Health Outcomes of Interest Studied

HOI | # | Definition

Aplastic anemia | 2 | Occurrence of at least one diagnostic code ICD-9-CM:
• 284.0* Constitutional aplastic anemia [a]
• 284.8* Other specified aplastic anemias
• 284.9 Aplastic anemia, unspecified
AND
Occurrence of at least one diagnostic procedure code for bone marrow aspiration or biopsy within 60 days prior to the diagnostic code

Acute kidney injury | 1 | Occurrence of at least one diagnostic code ICD-9-CM:
• 584* Acute renal failure [a]

Acute liver injury | 1 | Occurrence of at least one diagnostic code ICD-9-CM:
• 277.4 Disorders of bilirubin excretion
• 570* Acute and subacute necrosis of the liver [a]
• 572.2 Hepatic coma (hepatorenal syndrome)
• 572.4* Hepatorenal syndrome [a]
• 573* Other disorders of the liver, including chemical or drug induced [a]
• 576.8 Other specified disorders of biliary tract
• 782.4 Jaundice, unspecified, not of newborn
• 789.1* Hepatomegaly [a]
• 790.4* Nonspecific elevation of transaminase or lactic dehydrogenase levels [a]
• 794.8* Abnormal liver function test results [a]

Acute myocardial infarction | 1 | Occurrence of at least one broad diagnostic code ICD-9-CM:
• 410* Acute myocardial infarction [a]
• 411.1 Intermediate coronary syndrome
• 411.8 Other acute coronary occlusion
• 413.9 Other and unspecified angina pectoris on or during hospitalization

Bleeding | 3 | Occurrence of at least one diagnostic code [b]

Mortality after myocardial infarction | 3 | Occurrence of at least one narrow diagnostic code ICD-9-CM:
• 410* Acute myocardial infarction [a]
AND
Occurrence of at least one diagnostic procedure code within 30 days prior to diagnostic code [c]
OR
Occurrence of at least one therapeutic procedure code within 60 days after the diagnostic code [c]
AND
Occurrence of death after the diagnostic code as one of the following:
• Occurrence of one condition code indicating death ICD-9-CM:
    • 798.0 Sudden death, cause unknown
    • 798.1 Instantaneous death
    • 798.2 Death occurring in less than 24 h from onset of symptoms, not otherwise explained
    • 798.9 Unattended death
• Occurrence of a diagnostic code ICD-9-CM:
    • 427.5 Cardiac arrest AND OBSERVATION_PERIOD_END_DATE at the date of the diagnostic code

Progressive multifocal leukoencephalopathy | 1 | Occurrence of at least one diagnostic code ICD-9-CM:
• 046.3 Progressive multifocal leukoencephalopathy

Upper GI Ulcer | 1 | Occurrence of at least one diagnostic code [d] AND hospitalization at date of diagnostic code

[a] An asterisk indicates a wildcard, i.e. any code with or without additional digits is included in the definition
[b] A detailed list of all codes is available at
[c] A detailed list of all codes is available at
[d] A detailed list of all codes is available at
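The asterisk wildcard described in the footnotes above can be illustrated with a short sketch. It assumes that diagnoses are stored as dotted ICD-9-CM strings such as "584.9", and the function names `matches` and `has_any_code` are hypothetical. It covers only the diagnostic-code part of a definition, not the procedure, hospitalization or time-window criteria.

```python
from typing import Iterable, Sequence

# Code patterns copied from two of the definitions in the appendix.
ACUTE_KIDNEY_INJURY = ["584*"]
APLASTIC_ANEMIA_DX = ["284.0*", "284.8*", "284.9"]


def matches(code: str, pattern: str) -> bool:
    """Return True if an ICD-9-CM code satisfies a definition pattern.

    A trailing asterisk matches the stem with or without additional digits;
    a pattern without an asterisk must match the code exactly.
    """
    code = code.strip()
    if pattern.endswith("*"):
        return code.startswith(pattern[:-1])
    return code == pattern


def has_any_code(codes: Iterable[str], patterns: Sequence[str]) -> bool:
    """True if at least one recorded code matches at least one pattern."""
    return any(matches(code, pattern) for code in codes for pattern in patterns)
```

For example, a person with recorded codes "401.9" and "584.9" would satisfy the diagnostic-code criterion for acute kidney injury, because "584.9" matches the pattern "584*".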


About this article

Cite this article

Reich, C.G., Ryan, P.B. & Suchard, M.A. The Impact of Drug and Outcome Prevalence on the Feasibility and Performance of Analytical Methods for a Risk Identification and Analysis System. Drug Saf 36, 195–204 (2013).



  • Bortezomib
  • Area Under This Curve
  • Natalizumab
  • Tipranavir
  • Gemifloxacin