Regression-adjusted matching and double-robust methods for estimating average treatment effects in health economic evaluation

  • Noémi Kreif
  • Richard Grieve
  • Rosalba Radice
  • Jasjeet S. Sekhon


Regression, propensity score (PS) and double-robust (DR) methods can reduce selection bias when estimating average treatment effects (ATEs). Economic evaluations of health care interventions exemplify complex data structures: the covariate–endpoint relationships tend to be highly non-linear, and the cost and health outcome endpoints are highly skewed. When either the regression or the PS model is correctly specified, DR methods can provide unbiased, efficient estimates of ATEs, but the correct specification of both models is generally unknown. Regression-adjusted matching can also protect against bias from model misspecification, but has not been compared to DR methods. This paper compares regression-adjusted matching to selected DR methods (weighted regression and augmented inverse probability of treatment weighting) as well as to regression and PS methods for addressing selection bias in cost-effectiveness analyses (CEA). We contrast the methods in a CEA of a pharmaceutical intervention, where there are extreme estimated PSs and hence unstable inverse probability of treatment (IPT) weights. The case study motivates a simulation which considers settings with functional form misspecification in the PS and endpoint regression models (e.g. a cost model with a log instead of an identity link), and with stable and unstable PS weights. We find that in the realistic setting of unstable IPT weights and misspecification of both the PS and regression models, regression-adjusted matching reports less bias than DR methods. We conclude that regression-adjusted matching is a relatively robust method for estimating ATEs in applications with complex data structures, as exemplified by CEA.
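As a rough illustration of the augmented IPT weighting idea named in the abstract, the sketch below computes a textbook form of the augmented estimator from an estimated PS and from regression predictions of the endpoint under each treatment. It is not the authors' implementation. The function names `aiptw_ate` and `ipt_weights` and all argument names are hypothetical, and fitting the PS and regression models is assumed to happen elsewhere.

```python
import numpy as np


def ipt_weights(a, ps):
    """Inverse probability of treatment weights: 1/ps for treated, 1/(1-ps) for controls.

    Estimated PSs close to 0 or 1 produce very large (unstable) weights.
    """
    a = np.asarray(a, dtype=float)
    ps = np.asarray(ps, dtype=float)
    return a / ps + (1.0 - a) / (1.0 - ps)


def aiptw_ate(y, a, ps, mu1, mu0):
    """Augmented IPT-weighted estimate of the average treatment effect.

    y   : observed endpoint (e.g. cost or health outcome)
    a   : treatment indicator coded 0/1
    ps  : estimated propensity score, P(A = 1 | X)
    mu1 : regression prediction of the endpoint under treatment
    mu0 : regression prediction of the endpoint under control

    Names and interface are illustrative assumptions, not taken from the paper.
    """
    y, a, ps, mu1, mu0 = (
        np.asarray(v, dtype=float) for v in (y, a, ps, mu1, mu0)
    )
    # Regression prediction plus an IPT-weighted correction based on residuals.
    treated_term = mu1 + a * (y - mu1) / ps
    control_term = mu0 + (1.0 - a) * (y - mu0) / (1.0 - ps)
    return float(np.mean(treated_term - control_term))
```

In this form, the residual corrections vanish when the regression predictions fit the observed endpoints exactly, so the estimate then equals the mean regression contrast whatever PS is supplied. The division by `ps` and `1 - ps` also shows why extreme estimated PSs can make the estimate unstable.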


Average treatment effect; Inverse probability of treatment weighting; Double-robustness; Regression-adjusted matching; Cost-effectiveness analyses



We thank Zia Sadique (LSHTM) for help in the motivating case study, Roland Ramsahai (University of Cambridge) for valuable comments on the Monte Carlo simulations, Manuel Gomes, Karla Diaz-Ordaz, Adam Steventon, Rhian Daniel (all LSHTM) and Susan Gruber (Harvard School of Public Health) for comments on the manuscript. We also thank David Harrison and Kathy Rowan (ICNARC) for access to the data used in the case study. Funding from the Economic and Social Research Council (Grant no. RES-061-25-0343) is greatly appreciated.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Noémi Kreif (1, corresponding author)
  • Richard Grieve (1)
  • Rosalba Radice (1)
  • Jasjeet S. Sekhon (2)

  1. Department of Health Services Research and Policy, London School of Hygiene and Tropical Medicine, London, UK
  2. Department of Political Science, and Statistics, University of California Berkeley, Berkeley, USA
