Health Services and Outcomes Research Methodology, Volume 14, Issue 4, pp 166–182

Using propensity scores in difference-in-differences models to estimate the effects of a policy change

  • Elizabeth A. Stuart
  • Haiden A. Huskamp
  • Kenneth Duckworth
  • Jeffrey Simmons
  • Zirui Song
  • Michael E. Chernew
  • Colleen L. Barry


Difference-in-differences (DD) methods are a common strategy for evaluating the effects of policies or programs that are instituted at a particular point in time, such as the implementation of a new law. The DD method compares changes over time in a group unaffected by the policy intervention to the changes over time in a group affected by the policy intervention, and attributes the “difference-in-differences” to the effect of the policy. DD methods provide unbiased effect estimates if the trend over time would have been the same between the intervention and comparison groups in the absence of the intervention. However, a concern with DD models is that the intervention and comparison groups may differ in ways that would affect their trends over time, or their compositions may change over time. Propensity score methods are commonly used to handle this type of confounding in other non-experimental studies, but the particular considerations when using them in the context of a DD model have not been well investigated. In this paper, we describe the use of propensity scores in conjunction with DD models, in particular investigating a propensity score weighting strategy that weights the four groups (defined by time and intervention status) to be balanced on a set of characteristics. We discuss the conceptual issues associated with this approach, including the need for caution when selecting variables to include in the propensity score model, particularly given that the analysis spans multiple time points. We illustrate the ideas and method with an application estimating the effects of a new payment and delivery system innovation (an accountable care organization model called the “Alternative Quality Contract” (AQC) implemented by Blue Cross Blue Shield of Massachusetts) on health plan enrollee out-of-pocket mental health service expenditures. We find no evidence that the AQC affected out-of-pocket mental health service expenditures of enrollees.
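
As a rough illustration of the weighted DD estimate described above, the following minimal sketch may help. It is not the authors' implementation. It assumes that each enrollee's four group-membership probabilities have already been estimated elsewhere (for example, with a multinomial model), and that each group is weighted toward a single reference group, taken here to be the intervention group in the pre-period. The function names `four_group_weight` and `weighted_dd`, the `(treated, post)` group labels, and the choice of reference group are all hypothetical.

```python
from collections import defaultdict

# Hypothetical group labels: (treated, post) pairs.
# Assumed reference group: intervention group, pre-period.
REFERENCE_GROUP = (True, False)


def four_group_weight(probs, actual_group, reference_group=REFERENCE_GROUP):
    """Return P(reference group | X) / P(actual group | X) for one person.

    `probs` maps each of the four (treated, post) groups to that person's
    estimated probability of belonging to it (assumed to be given).
    """
    return probs[reference_group] / probs[actual_group]


def weighted_dd(records):
    """Weighted difference-in-differences of group mean outcomes.

    `records` is an iterable of (treated, post, outcome, weight) tuples.
    """
    sums = defaultdict(float)
    totals = defaultdict(float)
    for treated, post, outcome, weight in records:
        sums[(treated, post)] += weight * outcome
        totals[(treated, post)] += weight
    mean = {group: sums[group] / totals[group] for group in totals}
    # Change over time in the intervention group minus change in the comparison group.
    return (mean[(True, True)] - mean[(True, False)]) - (
        mean[(False, True)] - mean[(False, False)]
    )
```

With all weights equal to 1 this reduces to the unweighted DD of the four group means. In practice the weighted estimate would more likely be obtained from a weighted regression with an intervention-by-period interaction, which also yields standard errors.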


Keywords: Mental health spending; Policy evaluation; Natural experiment; Non-experimental study; Causal inference



We gratefully acknowledge funding support from the Commonwealth Fund [Grant # 20130499]. Dr. Stuart’s time was partially supported by the National Institute of Mental Health (1R01MH099010, PI: Stuart). We also thank Dana Gelb Safran at Blue Cross Blue Shield of Massachusetts for assistance generating the original research question and accessing data, and Christina Fu and Hocine Azeni of Harvard Medical School for expert programming support.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Elizabeth A. Stuart (1, corresponding author)
  • Haiden A. Huskamp (2)
  • Kenneth Duckworth (3)
  • Jeffrey Simmons (3)
  • Zirui Song (2)
  • Michael E. Chernew (2)
  • Colleen L. Barry (1)

  1. Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
  2. Harvard Medical School, Boston, USA
  3. Blue Cross Blue Shield of Massachusetts, Boston, USA
