
An intuitive review of methods for observational studies of comparative effectiveness


I use diagrams to illustrate the sources of potential selection bias in observational studies of comparative effectiveness. I adapt these diagrams for three hypothetical scenarios that clarify the strengths and weaknesses of two prominent methods used to account for potential selection bias: propensity scores and instrumental variables. After reviewing the fundamentals of how to apply each method, including new developments that make implementation easier, I refer to some recent studies that illustrate how choice of method can affect estimates. I conclude by emphasizing that many studies with apparently rich sources of data are nevertheless unlikely to produce unbiased estimates and that conceptual modeling can help identify these problems in advance.



This research was supported by Grant Number IAD 06-112 from the Health Services Research and Development Service of the U.S. Department of Veterans Affairs. All opinions expressed in this paper are those of the author and do not necessarily reflect the official position of the U.S. Department of Veterans Affairs or of Boston University. The author wishes to thank Matt Maciejewski, Paul Hebert, Ann Hendricks, Austin Frakt, and an anonymous reviewer for helpful comments.

Corresponding author

Correspondence to Steven D. Pizer.

About this article

Cite this article

Pizer, S.D. An intuitive review of methods for observational studies of comparative effectiveness. Health Serv Outcomes Res Method 9, 54–68 (2009).

Keywords

  • Comparative effectiveness
  • Observational studies
  • Selection bias
  • Propensity scores
  • Instrumental variables