Causal Inference and Natural Language Processing

A chapter in Machine Learning for Causal Inference

Abstract

This chapter explores the intersection of two research fields: causal inference and natural language processing (NLP). We aim to answer two fundamental questions: (1) how can NLP aid causal inference when working with textual data, and (2) how can causal inference theory enhance the robustness and interpretability of NLP models? We present the latest developments and challenges in each area. First, we discuss the difficulties of performing causal inference with textual data, which stem from the unstructured, high-dimensional nature of text. We show how NLP models can extract high-level semantic variables and how, under Pearl's causal theory, textual data can take on various roles in the causal graph. Second, although NLP models have achieved remarkable success across many tasks, we highlight concerns about their reliability and robustness: NLP models are prone to learning spurious correlations, i.e., relationships that are predictive but not causal. Third, we provide an extensive overview of causality-driven models for NLP, examining various ways of integrating causality, including intervention-level and counterfactual-level debiasing techniques. Finally, we explore how causal interpretations can improve the interpretability of deep neural models in NLP, enabling a deeper understanding of their behavior.
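The counterfactual-level debiasing mentioned above can be made concrete with a small sketch. The Python below is a toy illustration, not code from the chapter: the corpus, the word-swap table, and the spurious token "horror" are all invented. It applies counterfactual data augmentation, minimally editing each training text so that the causal feature (a sentiment word) flips while the spuriously correlated feature (a genre word) stays fixed, then adding the edited copy with the flipped label.

    # Hypothetical toy example (not from the chapter): counterfactual data
    # augmentation against a spurious token in text classification.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    # Toy corpus in which the genre word "horror" is spuriously correlated
    # with the negative class purely through how the corpus was sampled.
    texts = [
        "a truly dull horror film",
        "this horror movie is boring",
        "a delightful comedy overall",
        "such a funny comedy",
    ]
    labels = [0, 0, 1, 1]  # 0 = negative, 1 = positive

    # Minimal counterfactual edits: flip the causal (sentiment) words while
    # keeping the spurious (genre) word fixed, and flip the label to match.
    swap = {"dull": "thrilling", "boring": "gripping",
            "delightful": "tedious", "funny": "flat"}
    aug_texts, aug_labels = [], []
    for text, y in zip(texts, labels):
        edited = " ".join(swap.get(w, w) for w in text.split())
        if edited != text:              # a sentiment word was actually flipped
            aug_texts.append(edited)
            aug_labels.append(1 - y)

    vec = CountVectorizer()
    X = vec.fit_transform(texts + aug_texts)
    clf = LogisticRegression().fit(X, labels + aug_labels)

    # "horror" now co-occurs equally with both labels, so its learned weight
    # shrinks toward zero and the sentiment words carry the decision.
    weights = dict(zip(vec.get_feature_names_out(), clf.coef_[0]))
    print("weight('horror'):", round(weights["horror"], 3))
    print("weight('dull')  :", round(weights["dull"], 3))

After augmentation the spurious token appears with both classes equally often, so a linear classifier can no longer use it as a shortcut. Published counterfactual-augmentation work obtains such minimal edits from human annotators or generation models rather than a fixed swap table; producing edits that change only the causal feature is the hard part.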


Notes

  1. https://gluebenchmark.com/leaderboard, https://super.gluebenchmark.com/leaderboard


Author information

Corresponding author: Wenqing Chen.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Chen, W., Chu, Z. (2023). Causal Inference and Natural Language Processing. In: Li, S., Chu, Z. (eds) Machine Learning for Causal Inference. Springer, Cham. https://doi.org/10.1007/978-3-031-35051-1_9
