Volume 77, Issue 3, pp 581–609

A Two-Step Bayesian Approach for Propensity Score Analysis: Simulations and Case Study

  • David Kaplan
  • Jianshen Chen


A two-step Bayesian propensity score approach is introduced that incorporates prior information in the propensity score equation and outcome equation without the problems associated with simultaneous Bayesian propensity score approaches. The corresponding variance estimators are also provided. The two-step Bayesian propensity score is provided for three methods of implementation: propensity score stratification, weighting, and optimal full matching. Three simulation studies and one case study are presented to elaborate the proposed two-step Bayesian propensity score approach. Results of the simulation studies reveal that greater precision in the propensity score equation yields better recovery of the frequentist-based treatment effect. A slight advantage is shown for the Bayesian approach in small samples. Results also reveal that greater precision around the wrong treatment effect can lead to seriously distorted results. However, greater precision around the correct treatment effect parameter yields quite good results, with slight improvement seen with greater precision in the propensity score equation. A comparison of coverage rates for the conventional frequentist approach and proposed Bayesian approach is also provided. The case study reveals that credible intervals are wider than frequentist confidence intervals when priors are non-informative.
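
The two-step idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not spell out their estimators. The sketch assumes a random-walk Metropolis sampler for a logistic propensity model with normal priors (step 1), a flat-prior normal approximation to the stratified treatment effect for each posterior draw of the propensity scores (step 2), and pooling across draws by the law of total variance. All function names, the simulated data, and the tuning values (`prior_sd`, `step`, `n_strata`) are hypothetical.

```python
import numpy as np

def log_posterior(beta, X, t, prior_sd):
    """Log posterior of logistic coefficients under N(0, prior_sd^2) priors."""
    eta = X @ beta
    log_lik = np.sum(t * eta - np.logaddexp(0.0, eta))
    log_prior = -0.5 * np.sum((beta / prior_sd) ** 2)
    return log_lik + log_prior

def sample_propensity_coefs(X, t, prior_sd=10.0, n_iter=3000, burn=1000,
                            thin=40, step=0.05, seed=0):
    """Step 1 (assumed sampler): random-walk Metropolis draws of the coefficients."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    current = log_posterior(beta, X, t, prior_sd)
    draws = []
    for i in range(n_iter):
        proposal = beta + step * rng.normal(size=beta.size)
        candidate = log_posterior(proposal, X, t, prior_sd)
        if np.log(rng.uniform()) < candidate - current:
            beta, current = proposal, candidate
        if i >= burn and (i - burn) % thin == 0:
            draws.append(beta.copy())
    return np.array(draws)

def stratified_effect(y, t, score, n_strata=5):
    """Step 2 (assumed form): approximate posterior mean and variance of the
    treatment effect from propensity score stratification, for one score vector."""
    edges = np.quantile(score, np.linspace(0, 1, n_strata + 1))
    stratum = np.clip(np.searchsorted(edges, score, side="right") - 1,
                      0, n_strata - 1)
    mean, var = 0.0, 0.0
    for s in range(n_strata):
        treated = y[(stratum == s) & (t == 1)]
        control = y[(stratum == s) & (t == 0)]
        if len(treated) < 2 or len(control) < 2:
            raise ValueError("stratum lacks enough treated or control units")
        w = np.mean(stratum == s)  # share of the sample in this stratum
        mean += w * (treated.mean() - control.mean())
        var += w ** 2 * (treated.var(ddof=1) / len(treated)
                         + control.var(ddof=1) / len(control))
    return mean, var

def two_step_estimate(y, t, X, **sampler_kwargs):
    """Pool step-2 results over step-1 draws using the law of total variance."""
    draws = sample_propensity_coefs(X, t, **sampler_kwargs)
    scores = 1.0 / (1.0 + np.exp(-(draws @ X.T)))  # one row of scores per draw
    results = np.array([stratified_effect(y, t, s) for s in scores])
    means, variances = results[:, 0], results[:, 1]
    return means.mean(), variances.mean() + means.var(ddof=1)

# Simulated data with a true treatment effect of 2.0 (hypothetical example).
rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ np.array([0.0, 0.6, 0.3])))))
y = 2.0 * t + X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n)

estimate, total_var = two_step_estimate(y, t, X)
print(estimate, total_var)
```

In this sketch, a smaller `prior_sd` plays the role of greater prior precision in the propensity score equation. The pooled variance has two parts: the average within-draw variance of the effect, and the between-draw variance that reflects uncertainty in the propensity scores themselves.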

Key words

propensity score analysis, Bayesian inference



The research reported in this paper was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305D110001 to The University of Wisconsin–Madison. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education.



Copyright information

© The Psychometric Society 2012

Authors and Affiliations

  1. Department of Educational Psychology, University of Wisconsin–Madison, Madison, USA
