Explaining Model Behavior with Global Causal Analysis

  • Conference paper
Explainable Artificial Intelligence (xAI 2023)


We present Global Causal Analysis (GCA) for text classification. GCA is a technique for global model-agnostic explainability drawing from well-established observational causal structure learning algorithms. GCA generates an explanatory graph from high-level human-interpretable features, revealing how these features affect each other and the black-box output. We show that these high-level features need not always be human-annotated, but can also be computationally inferred. Moreover, we discuss how the explanatory graph can be used for global model analysis in natural language processing (NLP): the graph shows the effect of different types of features on model behavior, whether these effects are causal effects or mere (spurious) correlations, and if and how different features interact. We then propose a three-step method for (semi-)automatically evaluating the quality, fidelity and stability of the GCA explanatory graph without requiring a ground truth. Finally, we provide a detailed GCA of a state-of-the-art NLP model, showing how setting a global one-versus-rest contrast can improve explanatory relevance, and demonstrating the utility of our three-step evaluation method.
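
The one-versus-rest contrast mentioned in the abstract can be illustrated with a minimal sketch. It assumes the black-box returns one predicted label per text and that the contrast simply encodes "the class of interest" against "any other class". The function name `one_vs_rest` and the example labels are hypothetical and not taken from the paper.

```python
def one_vs_rest(predictions, fact):
    """Binary-encode predicted labels: 1 if the label equals `fact`, else 0.

    `fact` is the class of interest; every other class is treated as "rest".
    """
    return [int(label == fact) for label in predictions]


# Hypothetical predicted labels from a multi-class text classifier.
predicted = ["joy", "anger", "joy", "sadness", "fear"]

# Binary output variable that could serve as the target node of a graph.
y_hat = one_vs_rest(predicted, fact="joy")
print(y_hat)
```

Under these assumptions, each class of interest yields its own binary output column, so a separate explanatory graph can be learned per contrast.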



  1.

  2. Note that the dataset \(D'\) used for explanation does not have to be the same as the dataset \(D\) used for training, but can be, e.g., the test set [5].

  3. \(X_i \perp\!\!\!\perp X_j\) is shorthand for \(X_i \perp\!\!\!\perp X_j \mid \emptyset\) (i.e. \(\mathbf{X} = \emptyset\)). \(X_i \perp\!\!\!\perp \{X_m, X_n\}\) implies \(X_i \perp\!\!\!\perp X_m\) and \(X_i \perp\!\!\!\perp X_n\).

  4. For an in-depth overview, we refer the interested reader to [19] and [59].

  5. We use \(\mathcal{P}\) as shorthand for the PAG \(\mathcal{P}_{[\mathcal{\textbf{M}}]}\) and \(\mathcal{C}\) as shorthand for the CPDAG \(\mathcal{C}_{[\mathcal{\textbf{D}}]}\).

  6. Note that the SHD was originally defined on CPDAGs, but a similar approach can be applied to other types of graphical Markov models as well.

  7. The only restriction given to FCI is that \(\hat{Y}\) is a non-ancestor of any variable in Z, i.e. all elements in Z can cause each other and \(\hat{Y}\), but they cannot be caused by \(\hat{Y}\).

  8. The model is finetuned for 3 epochs, with a (linear) learning rate of \(5\times 10^{-5}\), AdamW optimizer (\(\beta_1=0.9\), \(\beta_2=0.999\), \(\epsilon=1\times 10^{-8}\)), a GPU batch size of 16, and seed 42. The Python finetuning uses Transformers 4.27.4 with PyTorch 2.0.0, Datasets 2.11.0 and Tokenizers 0.13.2, and is conducted on a Tesla T4 GPU (CUDA 12.0).

  9. The goal is not to get a well-performing model, but to explain model behavior.

  10. We stress that the inferred fairness features here merely serve as an illustration (e.g. of indicators of protected attributes that one can study), as the actual relevant features depend heavily on the intended application (area) of the ML model.

  11. Note that this same class-contrastive approach to binary-encoding outputs [60] can also be used to apply GCA to other types of black-boxes, such as ones providing probabilistic class scores, regression analysis and clustering.

  12. The Random Forest uses default hyperparameters for scikit-learn 1.2.2 (100 trees, Gini impurity) with seed 42.

  13. We use \(F_1\)-score to account for non-equal distributions of predicted labels (Sect. 3.1).

  14. That is, the outdegree of \(\hat{Y}\) for the GCA explanatory graph should always be zero.

  15. The mean wall-time to generate the GCA explanatory graphs is 0.12 s for the fairness aspect (5 features), 0.72 s for the robustness aspect (6 features), 2.37 s for the task aspect (13 features), and 220.11 s for all aspects combined. Wall-time was measured with causal-learn 1.3.3 (no depth limit) on Python 3.9.16, on a MacBook Pro with macOS Monterey 12.6.3 (16 GB RAM, 2.3 GHz 8-core Intel Core i9).

  16. Source code available at
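
As a rough illustration of the structural Hamming distance (SHD) mentioned in note 6, the sketch below counts the node pairs whose connection differs between two graphs. It is a simplified variant under stated assumptions: graphs contain only directed and undirected edges (as in a CPDAG), an undirected edge is stored as two opposite directed pairs, and each differing node pair costs 1. The names `edge_type` and `shd` and the example graphs are hypothetical; this is not the paper's implementation and does not handle the extra edge marks of a PAG.

```python
from itertools import combinations


def edge_type(edges, a, b):
    """Return how a and b are connected: 'none', '->', '<-' or '--'."""
    forward, backward = (a, b) in edges, (b, a) in edges
    if forward and backward:
        return "--"  # undirected edge, stored as both directions
    if forward:
        return "->"
    if backward:
        return "<-"
    return "none"


def shd(nodes, edges_g, edges_h):
    """Count unordered node pairs whose connection differs between two graphs."""
    return sum(
        edge_type(edges_g, a, b) != edge_type(edges_h, a, b)
        for a, b in combinations(sorted(nodes), 2)
    )


# Two hypothetical graphs over the same nodes; Y_hat is the black-box output.
nodes = {"X1", "X2", "X3", "Y_hat"}
graph_g = {("X1", "Y_hat"), ("X2", "Y_hat"), ("X1", "X2"), ("X2", "X1")}
graph_h = {("X1", "Y_hat"), ("X2", "Y_hat"), ("X2", "X1"), ("X3", "Y_hat")}

print(shd(nodes, graph_g, graph_h))
```

In this example the graphs differ in the orientation of the X1/X2 edge and in the presence of an edge from X3 to Y_hat. Both graphs keep Y_hat without outgoing edges, in line with note 14.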


  1. Apley, D.W., Zhu, J.: Visualizing the effects of predictor variables in black box supervised learning models. J. Royal Stat. Soc. Ser. B: Stat. Methodol. 82(4), 1059–1086 (2020).

  2. Balkir, E., Kiritchenko, S., Nejadgholi, I., Fraser, K.: Challenges in applying explainability methods to improve the fairness of NLP models. In: Proceedings 2nd Workshop on Trustworthy Natural Language Processing (TrustNLP 2022), pp. 80–92. ACL, Seattle, U.S.A. (2022).

  3. Bastani, O., Kim, C., Bastani, H.: Interpreting Blackbox models via model extraction. CoRR abs/1705.08504 (2017)

  4. Belinkov, Y., Glass, J.: Analysis methods in neural language processing: a survey. Trans. Assoc. Comput. Linguist. 7, 49–72 (2019).

  5. Burkart, N., Huber, M.F.: A survey on the explainability of supervised machine learning. J. Artif. Intell. Res. 70, 245–317 (2021).

  6. Carvalho, D.V., Pereira, E.M., Cardoso, J.S.: Machine learning interpretability: a survey on methods and metrics. Electronics 8(8), 832 (2019).

  7. Chattopadhyay, A., Manupriya, P., Sarkar, A., Balasubramanian, V.N.: Neural network attributions: a causal perspective. In: International Conference on Machine Learning, pp. 981–990. PMLR (2019)

  8. Chickering, D.M.: Optimal structure identification with greedy search. J. Mach. Learn. Res. 3, 507–554 (2002)

  9. Chou, Y.L., Moreira, C., Bruza, P., Ouyang, C., Jorge, J.: Counterfactuals and causability in explainable artificial intelligence: theory, algorithms, and applications. Inf. Fusion 81, 59–83 (2022).

  10. Colombo, D., Maathuis, M.H., Kalisch, M., Richardson, T.S.: Learning high-dimensional directed acyclic graphs with latent and selection variables. Ann. Stat. 40(1), 294–321 (2012).

  11. Cooper, G.F., Yoo, C.: Causal discovery from a mixture of experimental and observational data. In: Proceedings of the 15th Conf. on Uncertainty in Artificial Intelligence, pp. 116–125. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1999)

  12. Craven, M.W., Shavlik, J.W.: Using sampling and queries to extract rules from trained neural networks. In: Eleventh International Conference on Machine Learning (ICML), Proceedings, pp. 37–45 (1994).

  13. Craven, M.W., Shavlik, J.W.: Extracting tree-structured representations of trained neural networks. In: Advances in Neural Information Processing Systems (NIPS), vol. 8, pp. 24–30 (1996)

  14. Demszky, D., Movshovitz-Attias, D., Ko, J., Cowen, A., Nemade, G., Ravi, S.: GoEmotions: a dataset of fine-grained emotions. In: 58th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 4040–4054. Online (2020).

  15. Eaton, D., Murphy, K.: Exact Bayesian structure learning from uncertain interventions. In: Artificial Intelligence and Statistics, pp. 107–114. PMLR (2007)

  16. Fisher, A., Rudin, C., Dominici, F.: All models are wrong, but many are useful: learning a variable’s importance by studying an entire class of prediction models simultaneously. J. Mach. Learn. Res. 20, 1–81 (2019)

  17. Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29(5), 1189–1232 (2001).

  18. Galhotra, S., Pradhan, R., Salimi, B.: Explaining black-box algorithms using probabilistic contrastive counterfactuals. In: Proceedings of the 2021 International Conference on Management of Data, pp. 577–590. SIGMOD 2021, Association for Computing Machinery, New York, NY, USA (2021).

  19. Glymour, C., Zhang, K., Spirtes, P.: Review of causal discovery methods based on graphical models. Front. Genet. 10, 524 (2019)

  20. Goel, K., Rajani, N.F., Vig, J., Taschdjian, Z., Bansal, M., Ré, C.: Robustness gym: unifying the NLP evaluation landscape. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL): Human Language Technologies: Demonstrations, pp. 42–55. ACL, Online (2021).

  21. Goldstein, A., Kapelner, A., Bleich, J., Pitkin, E.: Peeking inside the black box: visualizing statistical learning with plots of individual conditional expectation. J. Comput. Graph. Stat. 24(1), 44–65 (2015).

  22. Guidotti, R.: Evaluating local explanation methods on ground truth. Artif. Intell. 291, 103428 (2021).

  23. Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.: A survey of methods for explaining black box models. ACM Comput. Surv. 51(5), 1–42 (2018).

  24. Halpern, J.Y.: A modification of the Halpern-Pearl definition of causality. In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 3022–3033 (2015)

  25. Halpern, J.Y., Pearl, J.: Causes and explanations: a structural-model approach - Part I: Causes. In: 17th Conference on Uncertainty in Artificial Intelligence, Proceedings, pp. 194–202. Morgan Kaufmann, San Francisco, CA (2001).

  26. Handhayani, T., Cussens, J.: Kernel-based approach for learning causal graphs from mixed data. In: Jaeger, M., Nielsen, T.D. (eds.) Proceedings of the 10th International Conference on Probabilistic Graphical Models. Proceedings of the Machine Learning Research, vol. 138, pp. 221–232. PMLR (2020).

  27. Honnibal, M., Montani, I., Van Landeghem, S., Boyd, A.: spaCy: industrial-strength natural language processing in Python (2020).

  28. Hooker, G.: Discovering additive structure in black box functions. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2004)

  29. Jacovi, A., Goldberg, Y.: Towards faithfully interpretable NLP systems: how should we define and evaluate faithfulness? In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 4198–4205. ACL, Online (2020).

  30. Lakkaraju, H., Kamar, E., Caruana, R., Leskovec, J.: Interpretable & explorable approximations of black box models. In: KDD 2017 Workshop on Fairness, Accountability, and Transparency in Machine Learning (2017)

  31. Lakkaraju, H., Arsov, N., Bastani, O.: Robust and stable black box explanations. In: Proceedings of the 37th International Conference on Machine Learning (ICML) (2020).

  32. Li, L., Goh, T.T., Jin, D.: How textual quality of online reviews affect classification performance: a case of deep learning sentiment analysis. Neural Comput. Appl. 32(9), 4387–4415 (2018).

  33. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. CoRR abs/1907.11692 (2019)

  34. Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems, pp. 4765–4774 (2017)

  35. Madsen, A., Reddy, S., Chandar, S.: Post-hoc interpretability for neural NLP: a survey. ACM Comput. Surv. 55(8), 1–42 (2022).

  36. Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38 (2019).

  37. Miller, T.: Contrastive explanation: a structural-model approach. Knowl. Eng. Rev. 36, e14 (2021).

  38. Mohammad, S.: Obtaining reliable human ratings of valence, arousal, and dominance for 20,000 English words. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 174–184. ACL, Melbourne, Australia (2018).

  39. Mohammad, S.M., Turney, P.D.: Crowdsourcing a Word-Emotion Association Lexicon. Comput. Intell. 29(3), 436–465 (2013)

  40. Plutchik, R.: A general psychoevolutionary theory of emotion. In: Theories of Emotion, pp. 3–33. Elsevier (1980).

  41. Raghu, V.K., Poon, A., Benos, P.V.: Evaluation of causal structure learning methods on mixed data types. In: Le, T.D., Zhang, K., Kıcıman, E., Hyvärinen, A., Liu, L. (eds.) Proceedings of the 2018 ACM SIGKDD Workshop on Causal Discovery. Proceedings of the Machine Learning Research, vol. 92, pp. 48–65. PMLR (2018).

  42. Ramsey, J., Glymour, M., Sanchez-Romero, R., Glymour, C.: A million variables and more: the fast greedy equivalence search algorithm for learning high-dimensional graphical causal models, with an application to functional magnetic resonance images. Int. J. Data Sci. Anal. 3(2), 121–129 (2017).

  43. Ribeiro, M.T., Singh, S., Guestrin, C.: Model-agnostic interpretability of machine learning. In: 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), pp. 91–95 (2016)

  44. Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?”: explaining the predictions of any classifier. In: 22nd ACM SIGKDD Intl. Conf. on Knowledge Discovery in Data Mining (KDD 2016), Proceedings, pp. 1135–1144 (2016).

  45. Ribeiro, M.T., Wu, T., Guestrin, C., Singh, S.: Beyond accuracy: behavioral testing of NLP models with CheckList. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 4902–4912. ACL, Online (2020).

  46. Richardson, T., Spirtes, P.: Ancestral graph Markov models. Ann. Stat. 30(4), 962–1030 (2002).

  47. Sanh, V., Debut, L., Chaumond, J., Wolf, T.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. In: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019) (2019)

  48. Sani, N., Malinsky, D., Shpitser, I.: Explaining the behavior of black-box prediction algorithms with causal learning. CoRR abs/2006.02482 (2020)

  49. Sengupta, K., Maher, R., Groves, D., Olieman, C.: GenBiT: measure and mitigate gender bias in language datasets. Microsoft J. Appl. Res. 16, 63–71 (2021)

  50. Sepehri, A., Markowitz, D.M., Mir, M.: PassivePy: a tool to automatically identify passive voice in big text data (2022).

  51. Shimizu, S., Hoyer, P.O., Hyvärinen, A., Kerminen, A.: A linear non-gaussian acyclic model for causal discovery. J. Mach. Learn. Res. 7(72), 2003–2030 (2006).

  52. Spirtes, P., Glymour, C.N., Scheines, R., Heckerman, D.: Causation, Prediction, and Search. MIT Press, Cambridge (2000)

  53. Tan, S., Joty, S., Baxter, K., Taeihagh, A., Bennett, G.A., Kan, M.Y.: Reliability testing for natural language processing systems. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL) and the 11th International Joint Conference on Natural Language Processing, pp. 4153–4169. ACL, Online (2021).

  54. Tan, S., Caruana, R., Hooker, G., Lou, Y.: Distill-and-compare: auditing black-box models using transparent model distillation. In: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pp. 303–310. AIES 2018, Association for Computing Machinery, New York, NY, USA (2018).

  55. Tian, J., Pearl, J.: Causal discovery from changes. In: Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pp. 512–521. UAI 2001, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2001)

  56. Tsamardinos, I., Brown, L.E., Aliferis, C.F.: The max-min hill-climbing Bayesian network structure learning algorithm. Mach. Learn. 65(1), 31–78 (2006).

  57. Viinikka, J., Eggeling, R., Koivisto, M.: Intersection-Validation: a method for evaluating structure learning without ground truth. In: Storkey, A., Perez-Cruz, F. (eds.) Proceedings of the 21st International Conference on Artificial Intelligence and Statistics. Proceedings of the Machine Learning Research, vol. 84, pp. 1570–1578. PMLR (2018).

  58. Vilone, G., Longo, L.: A quantitative evaluation of global, rule-based explanations of post-hoc, model agnostic methods. Front. Artif. Intell. 4, 717899 (2021).

  59. Vowels, M.J., Camgoz, N.C., Bowden, R.: D’Ya like DAGs? a survey on structure learning and causal discovery. ACM Comput. Surv. 55(4), 1–36 (2022).

  60. van der Waa, J., Robeer, M., van Diggelen, J., Neerincx, M., Brinkhuis, M.: Contrastive explanations with local Foil Trees. In: 2018 Workshop on Human Interpretability in Machine Learning (WHI) (2018)

  61. Woodward, J.: Making Things Happen. Oxford University Press, Oxford (2004)

  62. Zhang, J.: Causal reasoning with ancestral graphs. J. Mach. Learn. Res. 9(47), 1437–1474 (2008).

  63. Zhang, J.: On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artif. Intell. 172(16), 1873–1896 (2008).

  64. Zhao, Q., Hastie, T.: Causal interpretations of black-box models. J. Bus. Econ. Stat. 39(1), 272–281 (2019).



This study has been partially supported by the Dutch National Police. The authors would like to thank Elize Herrewijnen and Gizem Sogancioglu for their valuable feedback on earlier versions of this work.

Author information


Corresponding author

Correspondence to Marcel Robeer.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Robeer, M., Bex, F., Feelders, A., Prakken, H. (2023). Explaining Model Behavior with Global Causal Analysis. In: Longo, L. (eds) Explainable Artificial Intelligence. xAI 2023. Communications in Computer and Information Science, vol 1901. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44063-2

  • Online ISBN: 978-3-031-44064-9

  • eBook Packages: Computer Science, Computer Science (R0)
