Volume 45, Issue 2, pp 317–334

Propensity score methods for causal inference: an overview

  • Wei Pan
  • Haiyan Bai
Review Paper


Propensity score methods are popular and effective statistical techniques for reducing selection bias in observational data, thereby increasing the validity of causal inference from observational studies in behavioral and social science research. Some methodologists and statisticians have nevertheless raised concerns about the rationale and applicability of these methods. In this review, we address these concerns by tracing the historical development and the assumptions of propensity score methods, and then describing their fundamental techniques and the software packages available for them. We give particular attention to the issues in, and debates about, the use of propensity score methods. The review offers a historical perspective on propensity score methods and helps researchers select appropriate methods for their observational studies.


Propensity scores; Propensity score methods; Propensity score analysis; Propensity score matching; Subclassification; IPTW


Compliance with ethical standards

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.



Copyright information

© The Behaviormetric Society 2018

Authors and Affiliations

  1. Duke University, DUMC 3322, Durham, USA
  2. University of Central Florida, Orlando, USA
