Propensity score methods for causal inference: an overview

  • Review Paper
  • Behaviormetrika

Abstract

Propensity score methods are popular and effective statistical techniques for reducing selection bias in observational data, thereby increasing the validity of causal inference from observational studies in behavioral and social science research. Some methodologists and statisticians have raised concerns about the rationale and applicability of these methods. In this review, we address these concerns by tracing the historical development of propensity score methods and reviewing their assumptions, followed by their fundamental techniques and the software packages available for them. We give particular attention to the issues in, and debates about, the use of propensity score methods. This review offers a historical perspective on propensity score methods and helps researchers select appropriate propensity score methods for their observational studies.

Author information

Corresponding author

Correspondence to Wei Pan.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Communicated by Takahiro Hoshino.

About this article

Cite this article

Pan, W., Bai, H. Propensity score methods for causal inference: an overview. Behaviormetrika 45, 317–334 (2018). https://doi.org/10.1007/s41237-018-0058-8

  • DOI: https://doi.org/10.1007/s41237-018-0058-8