
A data-adaptive strategy for inverse weighted estimation of causal effects

Health Services and Outcomes Research Methodology

Abstract

In most nonrandomized observational studies, differences between treatment groups may arise not only due to the treatment but also because of the effect of confounders. Therefore, causal inference regarding the treatment effect is not as straightforward as in a randomized trial. To adjust for confounding due to measured covariates, the average treatment effect is often estimated by using propensity scores. Typically, propensity scores are estimated by logistic regression. More recent suggestions have been to employ nonparametric classification algorithms from machine learning. In this article, we propose a weighted estimator combining parametric and nonparametric models. Some theoretical results regarding consistency of the procedure are given. Simulation studies are used to assess the performance of the newly proposed methods relative to existing methods, and a data analysis example from the Surveillance, Epidemiology and End Results database is presented.
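As a rough illustration of the baseline the abstract refers to (propensity scores estimated by logistic regression, then used as inverse weights for the average treatment effect), the following is a minimal sketch. It is not the combined parametric/nonparametric estimator proposed in the article. The simulated data, the true effect of 2.0, and the helper names `fit_logistic`, `ipw_ate`, and `simulate` are all assumptions made for illustration only.

```python
import numpy as np


def fit_logistic(X, t, n_iter=25):
    """Fit a logistic regression by Newton-Raphson.

    X must already contain an intercept column; t is a 0/1 treatment vector.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (t - p)
        hess = X.T @ (X * w[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def ipw_ate(y, t, ps):
    """Normalized inverse-probability-weighted estimate of the ATE."""
    w1 = t / ps                # weights for treated units
    w0 = (1 - t) / (1 - ps)    # weights for control units
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)


def simulate(n=20000, seed=0):
    """Hypothetical data: one confounder x drives both treatment and outcome."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    true_ps = 1.0 / (1.0 + np.exp(-0.8 * x))
    t = rng.binomial(1, true_ps)
    y = 2.0 * t + 1.5 * x + rng.normal(size=n)  # assumed true effect: 2.0
    return x, t, y


x, t, y = simulate()
X = np.column_stack([np.ones_like(x), x])

# Estimated propensity scores from the fitted logistic model.
ps = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, t)))

# Unadjusted difference in means is confounded by x; IPW reweights to correct it.
naive = y[t == 1].mean() - y[t == 0].mean()
ate = ipw_ate(y, t, ps)
print(f"naive difference: {naive:.3f}, IPW estimate: {ate:.3f}")
```

In this sketch the propensity model is correctly specified by construction, so the weighted estimate should sit close to the assumed effect while the unadjusted difference does not. When the logistic model is misspecified, that guarantee is lost, which is the setting that motivates the machine-learning alternatives mentioned in the abstract.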



Acknowledgments

The authors thank Brian Lee for making his code available. The work of Zhu and Ghosh was supported by the National Institute on Drug Abuse Grant P50 DA010075-16 and NCI Grant CA 129102. The work of Mukherjee was supported by NSF Grant DMS-1007494 and NIH/NCI Grant CA156608. The content of this manuscript is solely the responsibility of the author(s) and does not necessarily represent the official views of the National Institute on Drug Abuse or the National Institutes of Health. Mitra would like to acknowledge Eric Shinohara, MD for making the cholangiocarcinoma data available to us.

Author information

Corresponding author

Correspondence to Yeying Zhu.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (tex 9 KB)


About this article


Cite this article

Zhu, Y., Ghosh, D., Mitra, N. et al. A data-adaptive strategy for inverse weighted estimation of causal effects. Health Serv Outcomes Res Method 14, 69–91 (2014). https://doi.org/10.1007/s10742-014-0124-y


  • DOI: https://doi.org/10.1007/s10742-014-0124-y
