
Counterfactual Propagation for Semi-supervised Individual Treatment Effect Estimation

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2020)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 12457))

Abstract

Individual treatment effect (ITE) represents the expected improvement in the outcome from applying a particular action to a particular target, and plays an important role in decision making across various domains. However, ITE estimation is difficult because intervention studies that collect information on the applied treatments (i.e., actions) and their outcomes are often quite expensive in terms of time and money. In this study, we consider a semi-supervised ITE estimation problem that exploits more easily available unlabeled instances to improve the performance of ITE estimation with small labeled data. We combine two ideas from causal inference and semi-supervised learning, namely matching and label propagation, respectively, to propose counterfactual propagation, the first semi-supervised ITE estimation method. Experiments on semi-real datasets demonstrate that the proposed method successfully mitigates the data-scarcity problem in ITE estimation.
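The abstract combines matching from causal inference with label propagation from semi-supervised learning. As a rough illustration of the label-propagation side only, the sketch below propagates observed outcomes to unlabeled instances over an RBF similarity graph, in the spirit of Zhu et al.'s Gaussian-fields method that the paper builds on. All function names and parameters here are illustrative assumptions, not the authors' implementation (which is available at the repository linked in the Notes).

```python
import numpy as np

def rbf_similarity(X, gamma=1.0):
    # Pairwise RBF similarities W_ij = exp(-gamma * ||x_i - x_j||^2),
    # with zeroed diagonal so instances are not their own neighbours.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    W = np.exp(-gamma * np.clip(d2, 0.0, None))
    np.fill_diagonal(W, 0.0)
    return W

def propagate_outcomes(X, y, labeled_mask, gamma=1.0, n_iter=100):
    """Iteratively average outcomes over graph neighbours while
    clamping the instances whose outcomes were actually observed."""
    W = rbf_similarity(X, gamma)
    d_inv = 1.0 / np.maximum(W.sum(axis=1), 1e-12)
    f = np.where(labeled_mask, y, 0.0).astype(float)
    for _ in range(n_iter):
        f = d_inv * (W @ f)                 # neighbourhood average
        f[labeled_mask] = y[labeled_mask]   # keep observed outcomes fixed
    return f
```

On two well-separated clusters with one observed outcome each, unlabeled instances inherit the outcome of their own cluster; the full method additionally uses matched treated/control pairs to propagate counterfactual outcomes, which this sketch omits.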



Notes

  1. https://github.com/SH1108/CounterfactualPropagation.
  2. https://archive.ics.uci.edu/ml/datasets/Bag+of+Words.



Acknowledgments

This work was partially supported by JSPS KAKENHI Grant Number 20H04244.

Author information

Corresponding author

Correspondence to Shonosuke Harada.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Harada, S., Kashima, H. (2021). Counterfactual Propagation for Semi-supervised Individual Treatment Effect Estimation. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12457. Springer, Cham. https://doi.org/10.1007/978-3-030-67658-2_31


  • DOI: https://doi.org/10.1007/978-3-030-67658-2_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67657-5

  • Online ISBN: 978-3-030-67658-2

  • eBook Packages: Computer Science, Computer Science (R0)
