Abstract
Estimating treatment effects from observational data is of great importance in many biomedical applications, and interpretability of the estimated effects is particularly valuable to biomedical researchers. In this paper, we first provide a theoretical analysis and derive an upper bound on the bias of average treatment effect (ATE) estimation under the strong ignorability assumption. By leveraging appealing properties of the weighted energy distance, we obtain an upper bound that is tighter than those previously reported in the literature. Motivated by this theoretical analysis, we propose a novel objective function for ATE estimation that uses the energy distance balancing score and hence does not require correct specification of a propensity score model. We also leverage recently developed neural additive models to improve the interpretability of deep learning models used for potential outcome prediction, and we further enhance the proposed model with an energy distance balancing score weighted regularization. The superiority of our proposed model over current state-of-the-art methods is demonstrated in semi-synthetic experiments using two benchmark datasets, IHDP and ACIC, and is further examined in a study of the effect of smoking on blood cadmium levels using NHANES data.
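The energy distance central to the abstract measures the discrepancy between the covariate distributions of treated and control groups; it is zero exactly when the two distributions coincide, which makes it a natural balancing criterion. As a minimal illustrative sketch (not the authors' implementation, which additionally uses a weighted variant inside a neural objective), the sample energy distance between two covariate samples can be computed as follows:

```python
import numpy as np

def energy_distance(X, Y):
    """Sample energy distance between covariate samples X (n, d) and Y (m, d).

    E(X, Y) = 2 * E||x - y|| - E||x - x'|| - E||y - y'||,
    where the expectations are replaced by sample averages over all pairs.
    The statistic is nonnegative and equals zero iff the empirical
    distributions of X and Y coincide.
    """
    # Pairwise Euclidean distances via broadcasting.
    d_xy = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    d_xx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d_yy = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    return 2.0 * d_xy.mean() - d_xx.mean() - d_yy.mean()
```

In a balancing context, one would minimize this quantity (or a unit-weighted version of it) between the treated and control covariate samples, so that outcome models are fit on comparable groups without estimating a propensity score.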
Acknowledgements
This research was partially supported by NIH grant RF1AG063481.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, K., Yin, Q. & Long, Q. Covariate-Balancing-Aware Interpretable Deep Learning Models for Treatment Effect Estimation. Stat Biosci (2023). https://doi.org/10.1007/s12561-023-09394-6