From Black Boxes to Conversations: Incorporating XAI in a Conversational Agent

  • Conference paper
  • Explainable Artificial Intelligence (xAI 2023)

Abstract

The goal of Explainable AI (XAI) is to design methods that provide insights into the reasoning process of black-box models, such as deep neural networks, in order to explain them to humans. Social science research states that such explanations should be conversational, similar to human-to-human explanations. In this work, we show how to incorporate XAI into a conversational agent, using a standard design for the agent comprising natural language understanding and generation components. We build upon an XAI question bank, which we extend with quality-controlled paraphrases, to understand the user’s information needs. We further systematically survey the literature for suitable explanation methods that provide the information to answer those questions, and present a comprehensive list of suggestions. Our work is the first step towards truly natural conversations about machine learning models with an explanation agent. The comprehensive list of XAI questions and the corresponding explanation methods may support other researchers in providing the necessary information to address users’ demands. To facilitate future work, we release our source code and data at https://github.com/bach1292/XAGENT/.


Notes

  1. https://archive.ics.uci.edu/ml/datasets/adult/.

  2. https://github.com/bach1292/XAGENT/.

  3. We use the OpenAI API: https://openai.com/api/.

  4. By defining XAI methods, our goal is to distinguish between approaches that rely on models’ internal reasoning and those that only involve simple actions such as retrieving information or making predictions using the model.

  5. Despite limitations of DICE in generating actionable counterfactual explanations [17], we include this method in our study due to its alignment with our predefined criteria and its high overall quality [17, 33].

  6. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html.

References

  1. Adadi, A., Berrada, M.: Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018). https://doi.org/10.1109/ACCESS.2018.2870052

  2. Ali, S., et al.: Explainable Artificial Intelligence (XAI): what we know and what is left to attain Trustworthy Artificial Intelligence. Inf. Fusion 99, 101805 (2023)

  3. Amidei, J., Piwek, P., Willis, A.: The use of rating and Likert scales in Natural Language Generation human evaluation tasks: a review and some recommendations. In: INLG 2019. ACL (2019). https://doi.org/10.18653/v1/W19-8648. https://aclanthology.org/W19-8648

  4. Barredo Arrieta, A., et al.: Explainable Artificial Intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 58, 82–115 (2020). https://doi.org/10.1016/j.inffus.2019.12.012. https://www.sciencedirect.com/science/article/pii/S1566253519308103

  5. Bastani, O., Kim, C., Bastani, H.: Interpretability via model extraction. In: FAT/ML (2017)

  6. Bobrow, D.G., Kaplan, R.M., Kay, M., Norman, D.A., Thompson, H., Winograd, T.: GUS, a frame-driven dialog system. Artif. Intell. 8(2), 155–173 (1977). https://doi.org/10.1016/0004-3702(77)90018-2. https://www.sciencedirect.com/science/article/pii/0004370277900182

  7. Brown, T., et al.: Language models are few-shot learners. In: NeurIPS, vol. 33, pp. 1877–1901 (2020)

  8. Chen, C., Li, O., Tao, C., Barnett, A.J., Su, J., Rudin, C.: This looks like that: deep learning for interpretable image recognition. Curran Associates Inc. (2019)

  9. Dash, S., Günlük, O., Wei, D.: Boolean decision rules via column generation. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS 2018, Red Hook, NY, USA, pp. 4660–4670. Curran Associates Inc. (2018)

  10. Dhurandhar, A., et al.: Explanations based on the missing: towards contrastive explanations with pertinent negatives. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS 2018, Red Hook, NY, USA, pp. 590–601. Curran Associates Inc. (2018)

  11. Gao, J., Galley, M., Li, L., et al.: Neural approaches to conversational AI. Found. Trends Inf. Retrieval 13(2–3), 127–298 (2019)

  12. Gao, T., Yao, X., Chen, D.: SimCSE: simple contrastive learning of sentence embeddings. In: EMNLP, pp. 6894–6910. ACL (2021). https://doi.org/10.18653/v1/2021.emnlp-main.552. https://aclanthology.org/2021.emnlp-main.552

  13. Gatt, A., Krahmer, E.: Survey of the state of the art in natural language generation: core tasks, applications and evaluation. J. Artif. Intell. Res. 61, 65–170 (2018)

  14. Gebru, T., et al.: Datasheets for datasets. Commun. ACM 64(12), 86–92 (2021). https://doi.org/10.1145/3458723

  15. Gilpin, L.H., Bau, D., Yuan, B.Z., Bajwa, A., Specter, M., Kagal, L.: Explaining explanations: an overview of interpretability of machine learning. In: DSAA, pp. 80–89. IEEE (2018). https://doi.org/10.1109/DSAA.2018.00018

  16. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). https://www.deeplearningbook.org

  17. Guidotti, R.: Counterfactual explanations and how to find them: literature review and benchmarking. Data Min. Knowl. Disc. 1–55 (2022). https://doi.org/10.1007/s10618-022-00831-6

  18. Guidotti, R., Monreale, A., Giannotti, F., Pedreschi, D., Ruggieri, S., Turini, F.: Factual and counterfactual explanations for black box decision making. IEEE Intell. Syst. 34(6), 14–23 (2019). https://doi.org/10.1109/MIS.2019.2957223

  19. Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.: A survey of methods for explaining black box models. ACM Comput. Surv. 51(5), 93:1–93:42 (2018). https://doi.org/10.1145/3236009

  20. Hastie, T., Tibshirani, R.: Generalized Additive Models. Chapman and Hall/CRC (1990)

  21. Henelius, A., Puolamäki, K., Boström, H., Asker, L., Papapetrou, P.: A peek into the black box: exploring classifiers by randomization. Data Min. Knowl. Disc. 28(5), 1503–1529 (2014). https://doi.org/10.1007/s10618-014-0368-8

  22. Jurafsky, D., Martin, J.H.: Speech and Language Processing, 3rd edn. draft (2022)

  23. Kuźba, M., Biecek, P.: What would you ask the machine learning model? Identification of user needs for model explanations based on human-model conversations. In: Koprinska, I., et al. (eds.) ECML PKDD 2020. CCIS, vol. 1323, pp. 447–459. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-65965-3_30

  24. Lakkaraju, H., Slack, D., Chen, Y., Tan, C., Singh, S.: Rethinking explainability as a dialogue: a practitioner’s perspective (2022). arXiv:2202.01875

  25. Liao, Q.V., Gruen, D., Miller, S.: Questioning the AI: informing design practices for explainable AI user experiences. In: Proceedings of the CHI Conference on Human Factors in Computing Systems, pp. 1–15. ACM, New York (2020). https://doi.org/10.1145/3313831.3376590

  26. Liao, Q.V., Varshney, K.R.: Human-centered explainable AI (XAI): from algorithms to user experiences (2022). arXiv:2110.10790

  27. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach (2019). arXiv:1907.11692

  28. Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: NeurIPS (2017)

  29. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press (2008). https://nlp.stanford.edu/IR-book/

  30. McKinney, S.M., et al.: International evaluation of an AI system for breast cancer screening. Nature 577(7788), 89–94 (2020)

  31. Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38 (2019). https://doi.org/10.1016/j.artint.2018.07.007. https://www.sciencedirect.com/science/article/pii/S0004370218305988

  32. Mitchell, M., et al.: Model cards for model reporting. In: FAT* 2019, pp. 220–229. ACM (2019). https://doi.org/10.1145/3287560.3287596

  33. Moreira, C., Chou, Y.L., Hsieh, C., Ouyang, C., Jorge, J., Pereira, J.M.: Benchmarking counterfactual algorithms for XAI: from white box to black box (2022). arXiv:2203.02399. https://doi.org/10.48550/arXiv.2203.02399

  34. Mothilal, R.K., Sharma, A., Tan, C.: Explaining machine learning classifiers through diverse counterfactual explanations. In: FAT* 2020. ACM (2020). https://doi.org/10.1145/3351095.3372850

  35. Nauta, M., van Bree, R., Seifert, C.: Neural prototype trees for interpretable fine-grained image recognition. In: CVPR, pp. 14933–14943 (2021)

  36. Nauta, M., et al.: From anecdotal evidence to quantitative evaluation methods: a systematic review on evaluating explainable AI. ACM Comput. Surv. 55(13s), 1–42 (2023). https://doi.org/10.1145/3583558

  37. Nguyen, A., Yosinski, J., Clune, J.: Multifaceted feature visualization: uncovering the different types of features learned by each neuron in deep neural networks (2016)

  38. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  39. Rastogi, A., Zang, X., Sunkara, S., Gupta, R., Khaitan, P.: Towards scalable multi-domain conversational agents: the schema-guided dialogue dataset. In: AAAI, vol. 34, no. 05, pp. 8689–8696 (2020). https://doi.org/10.1609/aaai.v34i05.6394. https://ojs.aaai.org/index.php/AAAI/article/view/6394

  40. Reiter, E., Dale, R.: Building applied natural language generation systems. Nat. Lang. Eng. 3(1), 57–87 (1997)

  41. Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should i trust you?”: explaining the predictions of any classifier. In: KDD 2016. ACM (2016). https://doi.org/10.1145/2939672.2939778

  42. Ribeiro, M.T., Singh, S., Guestrin, C.: Anchors: high-precision model-agnostic explanations. In: AAAI, vol. 32, no. 1, pp. 1527–1535 (2018). https://ojs.aaai.org/index.php/AAAI/article/view/11491

  43. Slack, D., Krishna, S., Lakkaraju, H., Singh, S.: TalkToModel: explaining machine learning models with interactive natural language conversations (2022). arXiv:2207.04154

  44. Tolomei, G., Silvestri, F., Haines, A., Lalmas, M.: Interpretable predictions of tree-based ensembles via actionable feature tweaking. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017, pp. 465–474. Association for Computing Machinery, New York (2017). https://doi.org/10.1145/3097983.3098039

  45. Tomsett, R., Braines, D., Harborne, D., Preece, A., Chakraborty, S.: Interpretable to whom? A role-based model for analyzing interpretable machine learning systems. In: WHI 2018 (2018)

  46. Van Looveren, A., Klaise, J.: Interpretable counterfactual explanations guided by prototypes. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds.) ECML PKDD 2021. LNCS (LNAI), vol. 12976, pp. 650–665. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86520-7_40

  47. Werner, C.: Explainable AI through rule-based interactive conversation. In: EDBT/ICDT Workshops (2020)

Author information

Correspondence to Van Bach Nguyen.

Appendix

A GPT-3 Paraphrase Prompting

We fine-tune the GPT-3 model with two instances for each reference question in the initial XAI question bank (2-shot). Each instance consists of the reference question and two paraphrases of it. Subsequently, we prompt the model with a new question to generate paraphrases (see Fig. 3 for an example). We repeat the prompt multiple times for each reference question.

Fig. 3. Example GPT-3 fine-tuning, prompt, and output to generate XAI paraphrase candidates.
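
For concreteness, the following is a minimal sketch of how such a 2-shot paraphrase prompt could be issued via the OpenAI API (see footnote 3). The prompt wording, model name, and sampling parameters are illustrative assumptions, not the exact setup used for Fig. 3.

```python
# Minimal sketch of 2-shot paraphrase prompting against the (legacy) OpenAI
# completion endpoint. Prompt wording, model name, and sampling parameters are
# illustrative assumptions.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder


def build_prompt(examples, new_question):
    """examples: list of (reference_question, [paraphrase_1, paraphrase_2])."""
    parts = []
    for ref, paras in examples:
        parts.append(f"Question: {ref}\nParaphrase 1: {paras[0]}\nParaphrase 2: {paras[1]}")
    parts.append(f"Question: {new_question}\nParaphrase 1:")
    return "\n\n".join(parts)


examples = [
    ("Why is this instance given this prediction?",
     ["What is the reason for this prediction?",
      "Why did the model decide this way for this instance?"]),
    ("What would happen if I change this feature?",
     ["How does the prediction change when this feature changes?",
      "What is the output if this feature had a different value?"]),
]

response = openai.Completion.create(
    model="text-davinci-003",  # assumed model; the paper only states GPT-3
    prompt=build_prompt(examples, "How should this feature change to get a different prediction?"),
    max_tokens=64,
    temperature=0.8,           # some diversity, since the prompt is repeated multiple times
)
print(response.choices[0].text.strip())
```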

B Phrase Annotation Details

The distribution of annotation scores varies across question categories (see Fig. 4). Most score medians are above 4, indicating that GPT-3 generally produces good-quality paraphrases. However, the varying interquartile ranges suggest that GPT-3 generates better paraphrases in some categories, such as How to be that or Why not, and more mixed paraphrases in others, such as What if or Other.

Figure 5 depicts the average annotator score per phrase pair. Phrase pairs are ranked by their score, separately for the 310 paraphrase pairs and the 59 negative pairs. Most of the paraphrase pairs generated by GPT-3 have a score \(\ge 4\) and are thus perceived as similar, indicating that GPT-3 generates high-quality paraphrases in general. Conversely, most negative pairs, which were sampled from different questions, have an average score \(<4\), supporting the quality of the human annotations. However, a few negative pairs are outliers annotated with a high similarity score. This is likely caused by our choice of negative phrases, which were sampled at random from a different question: such pairs may not be truly negative, as one question may be more general than the other, or the two may be interpreted in different ways (see Table 3 for examples). Furthermore, annotators disagree on ambiguous pairs and agree on unambiguous pairs (Table 4), further supporting the good quality of the dataset.

Fig. 4. Annotation score distribution for each question category.

Fig. 5. Average human annotation score for all phrase pairs, ranked by score. Negative pairs are phrases sampled from different questions.

C Representation Methods

We test two different feature representation methods: classical TF-IDF weighting and sentence embeddings. For TF-IDF weighting, we follow a standard preprocessing pipeline: we select tokens of two or more alphanumeric characters (punctuation is ignored and always treated as a token separator) and stem the text using the Porter stemmer [29] to obtain our token dictionary. Maximum and minimum document frequency (DF) thresholds are subject to hyperparameter optimization (see the full list of hyperparameters in Table 5). As an alternative feature representation to TF-IDF, we embed sentences (i.e., question instances) using SimCSE [12], employing the pretrained RoBERTa-large model [27] as the base model.
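
A minimal sketch of both representations is shown below, assuming NLTK for Porter stemming and the public princeton-nlp SimCSE RoBERTa-large checkpoint on Hugging Face; the DF thresholds are placeholders for the grid-searched values in Table 5.

```python
# Sketch of the two feature representations (assumptions: NLTK Porter stemmer,
# public princeton-nlp SimCSE checkpoint; DF thresholds are placeholders).
import re

import torch
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from transformers import AutoModel, AutoTokenizer

questions = ["Why is this instance given this prediction?",
             "What is the reason for this prediction?"]

# --- TF-IDF: tokens of two or more alphanumeric characters, Porter-stemmed ---
stemmer = PorterStemmer()

def stem_tokenizer(text):
    tokens = re.findall(r"\b\w\w+\b", text.lower())  # punctuation acts as a separator
    return [stemmer.stem(t) for t in tokens]

tfidf = TfidfVectorizer(tokenizer=stem_tokenizer, token_pattern=None,
                        max_df=0.9, min_df=1)        # thresholds tuned by grid search
X_tfidf = tfidf.fit_transform(questions)

# --- SimCSE sentence embeddings with a RoBERTa-large base model ---
name = "princeton-nlp/sup-simcse-roberta-large"      # assumed public checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)
with torch.no_grad():
    batch = tokenizer(questions, padding=True, truncation=True, return_tensors="pt")
    X_simcse = encoder(**batch).pooler_output        # one embedding per question
```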

Table 3. Example negative pairs with average score > 4
Table 4. Phrase pairs with highest agreement/disagreement between annotators (bold indicates the reference questions in the question bank)
Table 5. Hyperparameters for grid search; bold indicates the chosen hyperparameters. For all other hyperparameters, we use the default values in scikit-learn [38].
Fig. 6. Confusion matrix for SimCSE + NN (Color figure online)

D Details on NLU Evaluation

Figure 6 shows the confusion matrix for SimCSE + NN. The blue lines separate the questions in each category (see Table 1), and the diagonal contains the number of true positives for each question. This prominent diagonal reflects the high accuracy of the approach. The squares around the diagonal are sub-confusion matrices between questions in the same group. The many gray cells within these squares indicate that questions in the same category are harder to distinguish than questions in different categories (note that the numbers on the x and y axes indicate the merged labels, not IDs).
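
For reference, such a confusion matrix can be computed directly from the gold and predicted reference-question labels with scikit-learn; the toy labels below are purely illustrative.

```python
# Sketch: confusion matrix over matched reference questions (toy labels).
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

y_true = ["q13", "q13", "q47", "q47", "q53"]   # gold reference questions
y_pred = ["q13", "q47", "q47", "q53", "q53"]   # predictions of SimCSE + NN

labels = sorted(set(y_true))
cm = confusion_matrix(y_true, y_pred, labels=labels)
ConfusionMatrixDisplay(cm, display_labels=labels).plot(cmap="Greys")
plt.show()
```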

Table 6. XAI methods and selection criteria (Abbreviation: Cls = Classification, Reg = Regression, RL = Reinforcement Learning)

E XAI Method Overview

Table 6 lists the criteria, introduced in Sect. 5.2 of the main paper, for choosing a suitable XAI method for each XAI question.

F Conversation Scenarios

F.1 Random Forest Classifier on Adult Data

In this section, we show an example conversation between a prototype implementation of our proposed framework and a user on tabular data (the Adult data set, see footnote 1) with a Random Forest (RF) classifier.

The task on this data set is to predict whether a person's income exceeds $50,000/year (abbreviated 50K) based on census data. We train the classifier using the sklearn library and its standard parameter settings (see footnote 6). The mean accuracy of the classifier using 3-fold cross-validation is 0.85. For explanations, we retrain the RF classifier with the same parameter settings on the full data set. The data set and the classifier are loaded at the beginning of the conversation.
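
A minimal sketch of this setup is shown below; the file name and the one-hot encoding of the categorical features are assumptions, since the exact preprocessing is not spelled out here.

```python
# Sketch of the training setup: default scikit-learn Random Forest on a
# one-hot-encoded Adult data frame, 3-fold cross-validation, then retraining
# on the full data set. "adult.csv" is an assumed local copy of the UCI data.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

adult = pd.read_csv("adult.csv")
X = pd.get_dummies(adult.drop(columns=["income"]))
y = adult["income"]

clf = RandomForestClassifier()                    # standard parameter settings
print(cross_val_score(clf, X, y, cv=3).mean())    # ~0.85, as reported above
clf.fit(X, y)                                     # retrain on the full data set for explanations
```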

Figure 1 in the main body of the paper shows a conversation with the prototype agent (X-Agent). At the beginning of the conversation, the user provides information about her features by answering retrieval questions from the agent. These questions can be generated based on DataSheets [14] of the data set. We omit this part of the conversation in Fig. 1 and show how the X-Agent reacts to several questions about the model.

The first question is the request: Give me the reason for this prediction! The natural language understanding (NLU) component matches this question to the reference question Why is this instance given this prediction? in the question bank (question 47 in Table 1). The Question-XAI method mapping (QX) selects SHAP [28] as the XAI method to provide the information for the answer. The natural language generation (NLG) component combines SHAP’s feature importance information with the predefined text “The above graph ...” to respond to the user question.
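
A hedged sketch of the underlying SHAP call for this step, reusing clf and the encoded frame X from the training sketch above (the exact plot and the canned response text differ in the agent):

```python
# Sketch: per-instance SHAP feature importance for the Random Forest.
import shap

explainer = shap.TreeExplainer(clf)                # clf from the training sketch
shap_values = explainer.shap_values(X.iloc[[0]])   # per-class values (list or array, depending on the SHAP version)
vals = shap_values[1] if isinstance(shap_values, list) else shap_values[..., 1]
# Bar-style summary of the most important features for this single profile:
shap.summary_plot(vals, X.iloc[[0]], plot_type="bar")
```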

For the next question, Why is this profile predicted \(\le \)50K instead of >50K, the labels \(\le \)50K and >50K are replaced by the token <class> before matching to reference question 53 in Table 2 (main body of the paper) Why is this instance predicted P instead of Q?. The QX component identifies DICE [34] as the explanation method for this reference question, and the information is translated into natural language. In detail, DICE returns a counterfactual instance with the desired target label (>50K), yielding two features (Age and Workclass) that need to change in order to obtain the desired prediction. The NLG component extracts the relations between feature values of the original instance (Age: 39, Workclass: State-gov) and counterfactual instance (Age: 66.3, Workclass: Self-emp-inc). In comparison to the counterfactual, Age of the original instance is lower and Workclass differs. These relations are converted and rendered as text in the final answer by the NLG component.
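
A hedged sketch of the corresponding dice-ml call; the feature names, the method argument, and the pipeline model clf_pipeline (a classifier that accepts the raw Adult data frame) are illustrative assumptions.

```python
# Sketch: counterfactual generation with dice-ml (illustrative names).
# Assumptions: `adult` is the raw Adult data frame with an "income" column and
# `clf_pipeline` is a fitted model that accepts this raw frame (e.g. an sklearn
# pipeline with one-hot encoding inside).
import dice_ml

d = dice_ml.Data(dataframe=adult,
                 continuous_features=["Age", "Hours per week"],  # illustrative subset
                 outcome_name="income")
m = dice_ml.Model(model=clf_pipeline, backend="sklearn")
exp = dice_ml.Dice(d, m, method="random")                        # method choice is an assumption

query = adult.drop(columns=["income"]).iloc[[0]]                 # Age 39, Workclass State-gov, ...
cfs = exp.generate_counterfactuals(query, total_CFs=1, desired_class="opposite")
cfs.visualize_as_dataframe(show_only_changes=True)               # e.g. Age and Workclass change
```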

For the final question, That’s hard, how could I change only Occupation to get >50K prediction?, the words “Occupation” and “>50K” are substituted by the tokens <feature> and <class>, respectively. Then, the question is matched to reference question 13 (see Table 1), How should this feature change to get a different prediction?. DICE is again determined as the XAI method for providing the required information to answer this question. However, this question asks about a specific feature, i.e., it constrains DICE's search space for counterfactuals. Finally, the provided information is again translated into natural language.
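
Restricting the search to a single feature corresponds to dice-ml's features_to_vary argument; continuing the sketch above:

```python
# Sketch: only allow the feature the user asked about to change.
cfs_occupation = exp.generate_counterfactuals(
    query, total_CFs=1, desired_class="opposite",
    features_to_vary=["Occupation"],   # constrain the counterfactual search space
)
cfs_occupation.visualize_as_dataframe(show_only_changes=True)
```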

Fig. 7. Conversation example to explain a Convolutional Neural Network on MNIST (Color figure online)

F.2 Convolutional Neural Network on MNIST

We use the MNIST data set and a pre-trained convolutional neural network [46] to showcase a conversation on an image data set (see Fig. 7). First, the NLU component matches the first question, Why did you predict that?, to reference question 47, Why is this instance given this prediction? (see Table 1). Then, QX maps this question to SHAP [28] as the explanation technique. SHAP highlights the important parts of the image that lead to the prediction 7. The NLG component adds an explanation in the form of natural language text to the information provided by SHAP (the image). For the second question, How should this image change to get number 9 predicted?, the number 9 is replaced by the token <class>. NLU maps this processed question to reference question 12 (see Table 1). QX identifies CFProto [46] as the method to answer this question. CFProto outputs a modified image that is closer to the digit 9. Finally, NLG generates the explanation text along with the output of CFProto.
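
A hedged sketch of the CFProto step using the alibi library (which implements [46]); the model cnn, the training images x_train, the instance x, and all hyperparameters are illustrative assumptions.

```python
# Sketch: counterfactual guided by prototypes (CFProto) via the alibi library.
# Assumptions: `cnn` is a Keras MNIST classifier, `x_train` are training images
# scaled to [0, 1] with shape (N, 28, 28, 1), and `x` is the image of the digit 7.
import tensorflow as tf

tf.compat.v1.disable_eager_execution()   # CFProto is implemented in TF1 graph mode

from alibi.explainers import CounterfactualProto

cf = CounterfactualProto(cnn, shape=(1, 28, 28, 1), use_kdtree=True,
                         max_iterations=500, feature_range=(0.0, 1.0))
cf.fit(x_train)                                 # build class prototypes from the training data
explanation = cf.explain(x)
counterfactual_image = explanation.cf["X"]      # modified image, e.g. now classified as 9
predicted_class = explanation.cf["class"]
```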

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Nguyen, V.B., Schlötterer, J., Seifert, C. (2023). From Black Boxes to Conversations: Incorporating XAI in a Conversational Agent. In: Longo, L. (eds) Explainable Artificial Intelligence. xAI 2023. Communications in Computer and Information Science, vol 1903. Springer, Cham. https://doi.org/10.1007/978-3-031-44070-0_4

  • DOI: https://doi.org/10.1007/978-3-031-44070-0_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44069-4

  • Online ISBN: 978-3-031-44070-0
