Estimating treatment effects on healthcare costs under exogeneity: is there a ‘magic bullet’?



Methods for estimating average treatment effects (ATEs), under the assumption of no unmeasured confounders, include regression models; propensity score (PS) adjustments using stratification, weighting, or matching; and doubly robust estimators (a combination of both). Researchers continue to debate the best estimator for outcomes such as health care costs, which are usually characterized by an asymmetric distribution and heterogeneous treatment effects. Challenges in finding the right specifications for regression models are well documented in the literature. Propensity score estimators have been proposed as alternatives for overcoming these challenges. Using simulations, we find that in moderate-size samples (n = 5,000), balancing on PSs that are estimated from saturated specifications can balance the covariate means across treatment arms but fails to balance higher-order moments and covariances amongst covariates. Therefore, although PS estimators, unlike regression models, do not require a formal model for outcomes, they can be inefficient at best and biased at worst for health care cost data. Our simulation study, designed to take a ‘proof by contradiction’ approach, proves that no one estimator can be considered the best under all data generating processes for outcomes such as costs. The inverse-propensity weighted estimator is most likely to be unbiased under alternate data generating processes but is prone to bias under misspecification of the PS model and is inefficient compared to an unbiased regression estimator. Our results show that there are no ‘magic bullets’ when it comes to estimating treatment effects in health care costs. Care should be taken before naively applying any one estimator to estimate ATEs in these data. We illustrate the performance of alternative methods in a cost dataset on breast cancer treatment.
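As a rough illustration of the inverse-propensity weighted estimator and the moment-balance check described above, the following minimal sketch simulates skewed cost data with a single confounder. It is not the authors' simulation design: the data generating process (gamma costs with a log link, one normal covariate), the coefficient values, and the helper names `fit_logit`, `ipw_ate` and `weighted_moment_gap` are all assumptions made for this example.

```python
import numpy as np


def fit_logit(X, d, iters=25):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (d - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta


def ipw_ate(y, d, ps):
    """Normalized inverse-propensity weighted estimate of the ATE."""
    w1 = d / ps                # weights for treated observations
    w0 = (1 - d) / (1 - ps)    # weights for control observations
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)


def weighted_moment_gap(x, d, ps, power):
    """Difference across arms in the weighted mean of x**power (0 means balanced)."""
    return ipw_ate(x ** power, d, ps)


# Assumed data generating process (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                                   # single confounder
true_ps = 1.0 / (1.0 + np.exp(-(-0.2 + 0.8 * x)))        # treatment depends on x
d = rng.binomial(1, true_ps)
mu0 = np.exp(7.0 + 0.5 * x)                              # mean cost if untreated
mu1 = np.exp(7.3 + 0.5 * x)                              # mean cost if treated
y0 = rng.gamma(shape=2.0, scale=mu0 / 2.0)               # right-skewed costs
y1 = rng.gamma(shape=2.0, scale=mu1 / 2.0)
y = np.where(d == 1, y1, y0)
true_ate = np.mean(mu1 - mu0)

# Estimate the PS from a logit in x, then compare estimators.
X = np.column_stack([np.ones(n), x])
ps_hat = 1.0 / (1.0 + np.exp(-X @ fit_logit(X, d)))
ate_naive = y[d == 1].mean() - y[d == 0].mean()
ate_ipw = ipw_ate(y, d, ps_hat)

print("true ATE:", round(true_ate, 1))
print("naive difference in means:", round(ate_naive, 1))
print("IPW estimate:", round(ate_ipw, 1))
print("weighted gap in mean of x:", weighted_moment_gap(x, d, ps_hat, 1))
print("weighted gap in mean of x^2:", weighted_moment_gap(x, d, ps_hat, 2))
```

In this sketch the PS model is correctly specified, so the weighted estimate should sit much closer to the simulated truth than the naive difference in means. Changing `power`, or omitting a term from the PS model, gives a quick way to inspect how balance in higher-order moments can break down.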


Propensity score; Non-linear regression; Average treatment effect; Health care costs

JEL classification

C01 C21 I10 



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Anirban Basu (1, 2)
  • Daniel Polsky (3)
  • Willard G. Manning (4)

  1. Department of Health Services and PORPP, University of Washington, Seattle, USA
  2. The National Bureau of Economic Research, Cambridge, USA
  3. Division of General Internal Medicine, University of Pennsylvania, Philadelphia, USA
  4. Harris School of Public Policy Studies, University of Chicago, Chicago, USA
