Abstract
In most nonrandomized observational studies, differences in outcomes between treatment groups may arise not only from the treatment itself but also from confounders. Causal inference about the treatment effect is therefore less straightforward than in a randomized trial. To adjust for confounding due to measured covariates, the average treatment effect is often estimated using propensity scores, which are typically estimated by logistic regression. More recently, it has been suggested that nonparametric classification algorithms from machine learning be used instead. In this article, we propose a weighted estimator that combines parametric and nonparametric models. Theoretical results on the consistency of the procedure are given. Simulation studies are used to assess the performance of the proposed methods relative to existing methods, and a data analysis example from the Surveillance, Epidemiology, and End Results (SEER) database is presented.
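The abstract does not spell out the estimator, so what follows is only a minimal sketch under stated assumptions, not the authors' procedure. It fits a parametric propensity score (logistic regression on one covariate, fit by Newton-Raphson) and a simple nonparametric one (the share of treated units within fixed-width covariate bins, a hypothetical stand-in for the machine-learning classifiers mentioned above). It then blends them as (1 - lam) * parametric + lam * nonparametric, where lam is a hypothetical fixed weight. Finally, it plugs the blend e(x) into the inverse probability weighted estimate of the average treatment effect, (1/n) * sum_i [T_i * Y_i / e(X_i) - (1 - T_i) * Y_i / (1 - e(X_i))]. The function names fit_logistic, fit_binned, combine and ipw_ate, as well as the toy data, are illustrative inventions.

```python
import math
from collections import defaultdict


def fit_logistic(x, t, iters=25):
    """Parametric propensity model P(T=1 | x) = expit(b0 + b1*x), fit by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ti in zip(x, t):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += ti - p              # score for the intercept
            g1 += (ti - p) * xi       # score for the slope
            h00 += w                  # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return lambda xi: 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))


def fit_binned(x, t, width=1.0):
    """Nonparametric propensity estimate: share of treated units in each covariate bin."""
    counts = defaultdict(lambda: [0, 0])  # bin index -> [treated, total]
    for xi, ti in zip(x, t):
        cell = counts[math.floor(xi / width)]
        cell[0] += ti
        cell[1] += 1
    overall = sum(t) / len(t)  # fallback for bins with no training data

    def predict(xi):
        cell = counts.get(math.floor(xi / width))
        return cell[0] / cell[1] if cell else overall

    return predict


def combine(p_par, p_np, lam):
    """Convex blend (1 - lam) * parametric + lam * nonparametric; lam is a hypothetical fixed weight."""
    return lambda xi: (1.0 - lam) * p_par(xi) + lam * p_np(xi)


def ipw_ate(x, t, y, e, eps=1e-6):
    """Horvitz-Thompson style inverse probability weighted estimate of the ATE."""
    total = 0.0
    for xi, ti, yi in zip(x, t, y):
        p = min(max(e(xi), eps), 1.0 - eps)  # guard against division by zero
        total += ti * yi / p - (1 - ti) * yi / (1.0 - p)
    return total / len(y)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    x = [0, 1, 2, 3, 4, 5, 6, 7]
    t = [0, 0, 1, 0, 1, 0, 1, 1]
    y = [1.0, 1.2, 2.9, 1.5, 3.4, 1.9, 3.8, 4.1]
    e = combine(fit_logistic(x, t), fit_binned(x, t, width=2.0), lam=0.5)
    print(f"IPW estimate of the ATE: {ipw_ate(x, t, y, e):.3f}")
```

With lam in [0, 1), the blended score stays strictly between 0 and 1 because the logistic component does. This matters because the weights divide by e(x) and 1 - e(x). The article's title describes a data-adaptive strategy, so holding lam fixed here is a simplification.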
Acknowledgments
The authors thank Brian Lee for making his code available. The work of Zhu and Ghosh was supported by the National Institute on Drug Abuse Grant P50 DA010075-16 and NCI Grant CA 129102. The work of Mukherjee was supported by NSF Grant DMS-1007494 and NIH/NCI Grant CA156608. The content of this manuscript is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute on Drug Abuse or the National Institutes of Health. Mitra thanks Eric Shinohara, MD, for making the cholangiocarcinoma data available.
Electronic supplementary material
Electronic supplementary material is available in the online version of this article.
About this article
Cite this article
Zhu, Y., Ghosh, D., Mitra, N. et al. A data-adaptive strategy for inverse weighted estimation of causal effects. Health Serv Outcomes Res Method 14, 69–91 (2014). https://doi.org/10.1007/s10742-014-0124-y