Covariate-Balancing-Aware Interpretable Deep Learning Models for Treatment Effect Estimation

Published in Statistics in Biosciences

Abstract

Estimating treatment effects from observational data is of great importance in many biomedical applications, and many biomedical researchers prefer treatment effect estimates that are interpretable. In this paper, we first provide a theoretical analysis and derive an upper bound on the bias of average treatment effect (ATE) estimation under the strong ignorability assumption. Obtained by leveraging appealing properties of the weighted energy distance, our upper bound is tighter than those previously reported in the literature. Motivated by this analysis, we propose a novel objective function for estimating the ATE that uses the energy distance balancing score and hence does not require correct specification of a propensity score model. We also leverage recently developed neural additive models to improve the interpretability of the deep learning models used for potential outcome prediction, and we further enhance the proposed model with a regularization term weighted by the energy distance balancing score. The superiority of our proposed model over current state-of-the-art methods is demonstrated in semi-synthetic experiments on two benchmark datasets, IHDP and ACIC, and is further examined in a study of the effect of smoking on blood cadmium levels using NHANES data.
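The balancing score described above builds on the energy distance between the treated and control covariate distributions. The sketch below is a minimal NumPy illustration of the plain (unweighted) sample energy distance, 2E‖X−Y‖ − E‖X−X′‖ − E‖Y−Y′‖, not the weighted variant or objective used in the paper; all names are illustrative:

```python
import numpy as np

def energy_distance(x, y):
    """Sample (V-statistic) energy distance between two covariate samples:
    2*E||X - Y|| - E||X - X'|| - E||Y - Y'||.
    It is zero when the two samples coincide and grows as the
    covariate distributions diverge, so it can serve as a balance score."""
    def mean_pdist(a, b):
        diff = a[:, None, :] - b[None, :, :]   # all pairwise row differences
        return np.sqrt((diff ** 2).sum(axis=-1)).mean()
    return 2 * mean_pdist(x, y) - mean_pdist(x, x) - mean_pdist(y, y)

rng = np.random.default_rng(0)
treated = rng.normal(size=(200, 3))                    # covariates of treated units
control_matched = rng.normal(size=(200, 3))            # same covariate distribution
control_shifted = rng.normal(2.0, 1.0, size=(200, 3))  # mean-shifted covariates
d_matched = energy_distance(treated, control_matched)
d_shifted = energy_distance(treated, control_shifted)
# the shifted control sample yields a much larger energy distance
# than the distributionally matched one
```

In a weighting method, unit weights would be chosen to shrink this quantity between the reweighted treated and control samples, which is the role the energy distance balancing score plays in the proposed objective.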



Acknowledgements

This research was partially supported by NIH grant RF1AG063481.

Author information

Corresponding author

Correspondence to Qi Long.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF 3127 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, K., Yin, Q. & Long, Q. Covariate-Balancing-Aware Interpretable Deep Learning Models for Treatment Effect Estimation. Stat Biosci (2023). https://doi.org/10.1007/s12561-023-09394-6
