
Desideratum for Evidence Based Epidemiology



Epidemiological studies vary greatly in their choice of method and in specific analytical details, and as a result they produce widely varying results even when they study the same drug and outcome in the same database. This variation not only undermines the credibility of the research but also limits our ability to improve the methods.


To evaluate the performance of methods and analysis choices, we used standard references and a literature review to identify 164 positive controls (drug–outcome pairs believed to represent true adverse drug reactions) and 234 negative controls (drug–outcome pairs for which we have confidence there is no direct causal relationship). On these controls, we tested 3,748 unique analyses (methods in combination with specific analysis choices), representing the full range of approaches to adjusting for confounding, in five large observational datasets. We also evaluated the impact of increasingly specific outcome definitions and performed a replication study in six additional datasets. We characterized the performance of each method using the area under the receiver operating characteristic curve (AUC), bias, and coverage probability. In addition, we developed simulated datasets that closely matched the characteristics of the observational datasets and inserted into them data consistent with known drug–outcome relationships, in order to measure the accuracy of the estimates generated by the analyses.
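As a rough illustration of the three performance measures named above, the following minimal Python sketch computes AUC, bias, and coverage probability from a handful of made-up estimates. It is not the authors' implementation: the `Estimate` record, the function names, and the toy numbers are all hypothetical, and the sketch assumes that each analysis yields a log relative risk with a standard error and that negative controls have a true relative risk of 1 (log relative risk of 0). The abstract does not state how the study itself computed these measures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Estimate:
    """One analysis result for one drug-outcome pair (hypothetical record)."""
    log_rr: float       # estimated log relative risk
    se: float           # standard error of the log estimate
    is_positive: bool   # True = positive control, False = negative control


def auc(estimates: List[Estimate]) -> float:
    """Fraction of (positive, negative) control pairs in which the positive
    control has the larger estimate; ties count as half a win."""
    pos = [e.log_rr for e in estimates if e.is_positive]
    neg = [e.log_rr for e in estimates if not e.is_positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _negative_controls(estimates: List[Estimate]) -> List[Estimate]:
    neg = [e for e in estimates if not e.is_positive]
    if not neg:
        raise ValueError("need at least one negative control")
    return neg


def negative_control_bias(estimates: List[Estimate]) -> float:
    """Mean estimated log RR over negative controls.
    Assumption: the true log RR of every negative control is 0."""
    neg = _negative_controls(estimates)
    return sum(e.log_rr for e in neg) / len(neg)


def negative_control_coverage(estimates: List[Estimate], z: float = 1.96) -> float:
    """Share of negative controls whose approximate 95% confidence interval
    (estimate +/- z * se) contains the assumed true log RR of 0."""
    neg = _negative_controls(estimates)
    covered = sum(1 for e in neg
                  if e.log_rr - z * e.se <= 0.0 <= e.log_rr + z * e.se)
    return covered / len(neg)


# Toy data, invented purely for illustration.
toy = [
    Estimate(0.7, 0.20, True),
    Estimate(0.4, 0.30, True),
    Estimate(-0.1, 0.25, True),
    Estimate(0.1, 0.10, False),
    Estimate(0.0, 0.20, False),
    Estimate(0.5, 0.15, False),
    Estimate(-0.2, 0.30, False),
]

if __name__ == "__main__":
    print("AUC:", round(auc(toy), 3))
    print("Bias (log scale):", round(negative_control_bias(toy), 3))
    print("Coverage:", negative_control_coverage(toy))
```

In this sketch, an AUC near 0.5 would mean an analysis cannot tell positive from negative controls, and coverage well below 0.95 would mean its nominal 95% confidence intervals are too narrow or shifted away from the null.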


We expect the results of this systematic, empirical evaluation of the performance of these analyses across a moderate range of outcomes and databases to provide important insights into the methods used in epidemiological studies and to increase the consistency with which methods are applied, thereby increasing the confidence in results and our ability to systematically improve our approaches.





The Observational Medical Outcomes Partnership is funded by the Foundation for the National Institutes of Health (FNIH) through generous contributions from the following: Abbott, Amgen Inc., AstraZeneca, Bayer Healthcare Pharmaceuticals, Inc., Biogen Idec, Bristol-Myers Squibb, Eli Lilly & Company, GlaxoSmithKline, Janssen Research and Development, Lundbeck, Inc., Merck & Co., Inc., Novartis Pharmaceuticals Corporation, Pfizer Inc, Pharmaceutical Research and Manufacturers of America (PhRMA), Roche, Sanofi-aventis, Schering-Plough Corporation, and Takeda. Dr. Overhage is an employee of Siemens. Drs. Ryan and Stang are employees of Janssen Research and Development. Dr. Schuemie received a fellowship from the Office of Medical Policy, Center for Drug Evaluation and Research, Food and Drug Administration, and has become an employee of Janssen Research and Development since completing the work described here. Dr. Schuemie has previously received a grant from FNIH.

This article was published in a supplement sponsored by the Foundation for the National Institutes of Health (FNIH). The supplement was guest edited by Stephen J.W. Evans. It was peer reviewed by Olaf H. Klungel, who received a small honorarium to cover out-of-pocket expenses. S.J.W.E. has received travel funding from the FNIH to travel to the OMOP symposium and received a fee from FNIH for the review of a protocol for OMOP. O.H.K. has received funding for the IMI-PROTECT project from the Innovative Medicines Initiative Joint Undertaking under Grant Agreement no. 115004, resources of which are composed of financial contribution from the European Union’s Seventh Framework Programme (FP7/2007-2013) and EFPIA companies’ in-kind contribution.

Author information

Corresponding author

Correspondence to J. Marc Overhage.

Additional information

The OMOP research used data from Truven Health Analytics (formerly the Health Business of Thomson Reuters), including the MarketScan® Research Databases: MarketScan Lab Supplemental (MSLR, 1.2 m persons), MarketScan Medicare Supplemental Beneficiaries (MDCR, 4.6 m persons), MarketScan Multi-State Medicaid (MDCD, 10.8 m persons), and MarketScan Commercial Claims and Encounters (CCAE, 46.5 m persons). Data were also provided by the Quintiles® Practice Research Database (formerly General Electric’s Electronic Health Record, GE, 11.2 m persons). GE is an electronic health record database, while the other four databases contain administrative claims data.

About this article

Cite this article

Overhage, J.M., Ryan, P.B., Schuemie, M.J. et al. Desideratum for Evidence Based Epidemiology. Drug Saf 36, 5–14 (2013).


  • Coverage Probability
  • Observational Dataset
  • Common Data Model
  • Observational Database
  • Observational Medical Outcome Partnership