Abstract
The individual treatment effect (ITE) represents the expected improvement in outcome from applying a particular action to a particular target, and it plays an important role in decision making across various domains. However, estimating the ITE is difficult because intervention studies that collect information on the applied treatments (i.e., actions) and their outcomes are often expensive in both time and money. In this study, we consider a semi-supervised ITE estimation problem that exploits more easily available unlabeled instances to improve estimation performance when labeled data are scarce. We combine two ideas, matching from causal inference and label propagation from semi-supervised learning, to propose counterfactual propagation, the first semi-supervised ITE estimation method. Experiments on semi-real datasets demonstrate that the proposed method successfully mitigates the data scarcity problem in ITE estimation.
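The combination of matching-style similarity and label propagation described above can be illustrated with a minimal graph-based sketch: factual outcomes are spread over a covariate-similarity graph to impute each unit's unobserved counterfactual outcome, and the ITE is the difference of the two imputed potential-outcome surfaces. This is only a toy illustration in the spirit of classic label propagation (Zhu et al.), not the paper's actual counterfactual propagation algorithm; the RBF bandwidth, the synthetic data, and the `propagate` helper are all assumptions made for the example.

```python
import numpy as np

def propagate(W, y_obs, mask, alpha=0.9, iters=200):
    """Iterative label propagation over a row-normalized similarity graph.
    Observed entries (mask=True) are clamped to their known values."""
    P = W / W.sum(axis=1, keepdims=True)          # transition matrix
    f = np.where(mask, y_obs, 0.0)                # initialize with known labels
    for _ in range(iters):
        f = alpha * (P @ f) + (1 - alpha) * np.where(mask, y_obs, 0.0)
        f[mask] = y_obs[mask]                     # clamp observed outcomes
    return f

# Toy data: 6 units with one covariate, a binary treatment, and factual outcomes.
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
t = np.array([1, 0, 1, 0, 1, 0])                  # treatment assignment
y = np.array([1.0, 0.2, 1.1, 0.4, 1.4, 0.5])      # observed (factual) outcomes

# RBF similarity graph over covariates (a simple stand-in for matching).
d2 = (X - X.T) ** 2
W = np.exp(-d2 / 0.1)
np.fill_diagonal(W, 0.0)

# Impute each potential-outcome surface separately, then take the difference.
y1 = propagate(W, y, t == 1)   # imputed outcome under treatment for all units
y0 = propagate(W, y, t == 0)   # imputed outcome under control for all units
ite = y1 - y0                  # estimated individual treatment effects
```

In this toy setting treated outcomes are uniformly higher than control outcomes within each covariate cluster, so the propagated estimates yield a positive ITE for every unit; the paper's method goes further by learning outcomes and similarities jointly rather than fixing a similarity graph in advance.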
Acknowledgments
This work was partially supported by JSPS KAKENHI Grant Number 20H04244.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Harada, S., Kashima, H. (2021). Counterfactual Propagation for Semi-supervised Individual Treatment Effect Estimation. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science(), vol 12457. Springer, Cham. https://doi.org/10.1007/978-3-030-67658-2_31
Print ISBN: 978-3-030-67657-5
Online ISBN: 978-3-030-67658-2