From Black Boxes to Conversations: Incorporating XAI in a Conversational Agent

  • Conference paper
  • Explainable Artificial Intelligence (xAI 2023)

Abstract

The goal of Explainable AI (XAI) is to design methods that provide insights into the reasoning process of black-box models, such as deep neural networks, in order to explain them to humans. Social science research states that such explanations should be conversational, similar to human-to-human explanations. In this work, we show how to incorporate XAI into a conversational agent, using a standard design for the agent comprising natural language understanding and generation components. We build upon an XAI question bank, which we extend with quality-controlled paraphrases, to understand the user’s information needs. We further systematically survey the literature for suitable explanation methods that provide the information to answer those questions, and present a comprehensive list of suggestions. Our work is the first step towards truly natural conversations about machine learning models with an explanation agent. The comprehensive list of XAI questions and the corresponding explanation methods may support other researchers in providing the necessary information to address users’ demands. To facilitate future work, we release our source code and data at https://github.com/bach1292/XAGENT/.


Notes

  1. https://archive.ics.uci.edu/ml/datasets/adult/.

  2. https://github.com/bach1292/XAGENT/.

  3. We use the OpenAI API: https://openai.com/api/.

  4. By defining XAI methods, our goal is to distinguish between approaches that rely on models’ internal reasoning and those that only involve simple actions such as retrieving information or making predictions using the model.

  5. Despite limitations of DICE in generating actionable counterfactual explanations [17], we include this method in our study due to its alignment with our predefined criteria and its high overall quality [17, 33].

  6. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html.

References

  1. Adadi, A., Berrada, M.: Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018). https://doi.org/10.1109/ACCESS.2018.2870052

  2. Ali, S., et al.: Explainable Artificial Intelligence (XAI): what we know and what is left to attain Trustworthy Artificial Intelligence. Inf. Fusion 99, 101805 (2023)

  3. Amidei, J., Piwek, P., Willis, A.: The use of rating and Likert scales in Natural Language Generation human evaluation tasks: a review and some recommendations. In: INLG 2019. ACL (2019). https://doi.org/10.18653/v1/W19-8648. https://aclanthology.org/W19-8648

  4. Barredo Arrieta, A., et al.: Explainable Artificial Intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 58, 82–115 (2020). https://doi.org/10.1016/j.inffus.2019.12.012. https://www.sciencedirect.com/science/article/pii/S1566253519308103

  5. Bastani, O., Kim, C., Bastani, H.: Interpretability via model extraction. In: FAT/ML (2017)

  6. Bobrow, D.G., Kaplan, R.M., Kay, M., Norman, D.A., Thompson, H., Winograd, T.: GUS, a frame-driven dialog system. Artif. Intell. 8(2), 155–173 (1977). https://doi.org/10.1016/0004-3702(77)90018-2. https://www.sciencedirect.com/science/article/pii/0004370277900182

  7. Brown, T., et al.: Language models are few-shot learners. In: NeurIPS, vol. 33, pp. 1877–1901 (2020)

  8. Chen, C., Li, O., Tao, C., Barnett, A.J., Su, J., Rudin, C.: This looks like that: deep learning for interpretable image recognition. Curran Associates Inc. (2019)

  9. Dash, S., Günlük, O., Wei, D.: Boolean decision rules via column generation. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS 2018, Red Hook, NY, USA, pp. 4660–4670. Curran Associates Inc. (2018)

  10. Dhurandhar, A., et al.: Explanations based on the missing: towards contrastive explanations with pertinent negatives. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS 2018, Red Hook, NY, USA, pp. 590–601. Curran Associates Inc. (2018)

  11. Gao, J., Galley, M., Li, L., et al.: Neural approaches to conversational AI. Found. Trends Inf. Retrieval 13(2–3), 127–298 (2019)

  12. Gao, T., Yao, X., Chen, D.: SimCSE: simple contrastive learning of sentence embeddings. In: EMNLP, pp. 6894–6910. ACL (2021). https://doi.org/10.18653/v1/2021.emnlp-main.552. https://aclanthology.org/2021.emnlp-main.552

  13. Gatt, A., Krahmer, E.: Survey of the state of the art in natural language generation: core tasks, applications and evaluation. J. Artif. Intell. Res. 61, 65–170 (2018)

  14. Gebru, T., et al.: Datasheets for datasets. Commun. ACM 64(12), 86–92 (2021). https://doi.org/10.1145/3458723

  15. Gilpin, L.H., Bau, D., Yuan, B.Z., Bajwa, A., Specter, M., Kagal, L.: Explaining explanations: an overview of interpretability of machine learning. In: DSAA, pp. 80–89. IEEE (2018). https://doi.org/10.1109/DSAA.2018.00018

  16. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). https://www.deeplearningbook.org

  17. Guidotti, R.: Counterfactual explanations and how to find them: literature review and benchmarking. Data Min. Knowl. Disc. 1–55 (2022). https://doi.org/10.1007/s10618-022-00831-6

  18. Guidotti, R., Monreale, A., Giannotti, F., Pedreschi, D., Ruggieri, S., Turini, F.: Factual and counterfactual explanations for black box decision making. IEEE Intell. Syst. 34(6), 14–23 (2019). https://doi.org/10.1109/MIS.2019.2957223

  19. Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.: A survey of methods for explaining black box models. ACM Comput. Surv. 51(5), 93:1–93:42 (2018). https://doi.org/10.1145/3236009

  20. Hastie, T., Tibshirani, R.: Generalized Additive Models. Chapman and Hall/CRC (1990)

  21. Henelius, A., Puolamäki, K., Boström, H., Asker, L., Papapetrou, P.: A peek into the black box: exploring classifiers by randomization. Data Min. Knowl. Disc. 28(5), 1503–1529 (2014). https://doi.org/10.1007/s10618-014-0368-8

  22. Jurafsky, D., Martin, J.H.: Speech and Language Processing, 3rd edn. draft (2022)

  23. Kuźba, M., Biecek, P.: What would you ask the machine learning model? Identification of user needs for model explanations based on human-model conversations. In: Koprinska, I., et al. (eds.) ECML PKDD 2020. CCIS, vol. 1323, pp. 447–459. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-65965-3_30

  24. Lakkaraju, H., Slack, D., Chen, Y., Tan, C., Singh, S.: Rethinking explainability as a dialogue: a practitioner’s perspective (2022). arXiv:2202.01875

  25. Liao, Q.V., Gruen, D., Miller, S.: Questioning the AI: informing design practices for explainable AI user experiences. In: Proceedings of the CHI Conference on Human Factors in Computing Systems, pp. 1–15. ACM, New York (2020). https://doi.org/10.1145/3313831.3376590

  26. Liao, Q.V., Varshney, K.R.: Human-centered explainable AI (XAI): from algorithms to user experiences (2022). arXiv:2110.10790

  27. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach (2019). arXiv:1907.11692

  28. Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: NeurIPS (2017)

  29. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press (2008). https://nlp.stanford.edu/IR-book/

  30. McKinney, S.M., et al.: International evaluation of an AI system for breast cancer screening. Nature 577(7788), 89–94 (2020)

  31. Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38 (2019). https://doi.org/10.1016/j.artint.2018.07.007. https://www.sciencedirect.com/science/article/pii/S0004370218305988

  32. Mitchell, M., et al.: Model cards for model reporting. In: FAT* 2019, pp. 220–229. ACM (2019). https://doi.org/10.1145/3287560.3287596

  33. Moreira, C., Chou, Y.L., Hsieh, C., Ouyang, C., Jorge, J., Pereira, J.M.: Benchmarking counterfactual algorithms for XAI: from white box to black box (2022). arXiv:2203.02399. https://doi.org/10.48550/arXiv.2203.02399

  34. Mothilal, R.K., Sharma, A., Tan, C.: Explaining machine learning classifiers through diverse counterfactual explanations. In: FAT* 2020. ACM (2020). https://doi.org/10.1145/3351095.3372850

  35. Nauta, M., van Bree, R., Seifert, C.: Neural prototype trees for interpretable fine-grained image recognition. In: CVPR, pp. 14933–14943 (2021)

  36. Nauta, M., et al.: From anecdotal evidence to quantitative evaluation methods: a systematic review on evaluating explainable AI. ACM Comput. Surv. 55(13s), 1–42 (2023). https://doi.org/10.1145/3583558

  37. Nguyen, A., Yosinski, J., Clune, J.: Multifaceted feature visualization: uncovering the different types of features learned by each neuron in deep neural networks (2016)

  38. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  39. Rastogi, A., Zang, X., Sunkara, S., Gupta, R., Khaitan, P.: Towards scalable multi-domain conversational agents: the schema-guided dialogue dataset. In: AAAI, vol. 34, no. 05, pp. 8689–8696 (2020). https://doi.org/10.1609/aaai.v34i05.6394. https://ojs.aaai.org/index.php/AAAI/article/view/6394

  40. Reiter, E., Dale, R.: Building applied natural language generation systems. Nat. Lang. Eng. 3(1), 57–87 (1997)

  41. Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should i trust you?”: explaining the predictions of any classifier. In: KDD 2016. ACM (2016). https://doi.org/10.1145/2939672.2939778

  42. Ribeiro, M.T., Singh, S., Guestrin, C.: Anchors: high-precision model-agnostic explanations. In: AAAI, vol. 32, no. 1, pp. 1527–1535 (2018). https://ojs.aaai.org/index.php/AAAI/article/view/11491

  43. Slack, D., Krishna, S., Lakkaraju, H., Singh, S.: TalkToModel: explaining machine learning models with interactive natural language conversations (2022). arXiv:2207.04154

  44. Tolomei, G., Silvestri, F., Haines, A., Lalmas, M.: Interpretable predictions of tree-based ensembles via actionable feature tweaking. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017, pp. 465–474. Association for Computing Machinery, New York (2017). https://doi.org/10.1145/3097983.3098039

  45. Tomsett, R., Braines, D., Harborne, D., Preece, A., Chakraborty, S.: Interpretable to whom? A role-based model for analyzing interpretable machine learning systems. In: WHI 2018 (2018)

  46. Van Looveren, A., Klaise, J.: Interpretable counterfactual explanations guided by prototypes. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds.) ECML PKDD 2021. LNCS (LNAI), vol. 12976, pp. 650–665. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86520-7_40

  47. Werner, C.: Explainable AI through rule-based interactive conversation. In: EDBT/ICDT Workshops (2020)

Author information

Correspondence to Van Bach Nguyen.

Appendix

A GPT-3 Paraphrase Prompting

We fine-tune the GPT-3 model with two instances for each reference question in the initial XAI question bank (2-shot). Each instance consists of the reference question and two paraphrases of it. Subsequently, we prompt the model with a new question to generate paraphrases (see Fig. 3 for an example). We repeat the prompt multiple times for each reference question.

Fig. 3. Example GPT-3 fine-tuning, prompt, and output to generate XAI paraphrase candidates.
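
For concreteness, the following is a minimal sketch of how such a 2-shot paraphrase prompt could be issued via the OpenAI API (see footnote 3). The prompt wording, model name, and sampling parameters are illustrative assumptions, not the exact setup used for Fig. 3.

```python
# Minimal sketch of 2-shot paraphrase prompting against the (legacy) OpenAI
# completion endpoint. Prompt wording, model name, and sampling parameters are
# illustrative assumptions.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder


def build_prompt(examples, new_question):
    """examples: list of (reference_question, [paraphrase_1, paraphrase_2])."""
    parts = []
    for ref, paras in examples:
        parts.append(f"Question: {ref}\nParaphrase 1: {paras[0]}\nParaphrase 2: {paras[1]}")
    parts.append(f"Question: {new_question}\nParaphrase 1:")
    return "\n\n".join(parts)


examples = [
    ("Why is this instance given this prediction?",
     ["What is the reason for this prediction?",
      "Why did the model decide this way for this instance?"]),
    ("What would happen if I change this feature?",
     ["How does the prediction change when this feature changes?",
      "What is the output if this feature had a different value?"]),
]

response = openai.Completion.create(
    model="text-davinci-003",  # assumed model; the paper only states GPT-3
    prompt=build_prompt(examples, "How should this feature change to get a different prediction?"),
    max_tokens=64,
    temperature=0.8,           # some diversity, since the prompt is repeated multiple times
)
print(response.choices[0].text.strip())
```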

B Phrase Annotation Details

The distribution of annotation scores varies across question categories (see Fig. 4). Most score medians are above 4, indicating that GPT-3 generally produces good-quality paraphrases. However, the varying interquartile ranges suggest that GPT-3 generates better paraphrases in some categories, such as How to be that or Why not, and more mixed paraphrases in others, such as What if or Other.

Figure 5 depicts the average annotator score per phrase pair. Phrase pairs are ranked by their score, separately for the 310 paraphrase pairs and the 59 negative pairs. Most of the paraphrase pairs generated by GPT-3 have a score \(\ge 4\) and are thus perceived as similar, indicating that GPT-3 generates high-quality paraphrases in general. Conversely, most negative pairs, which were sampled from different questions, have an average score \(<4\), supporting the quality of the human annotations. However, a few negative pairs are outliers annotated with a high similarity score. This is likely caused by our choice of negative phrases, which were sampled at random from a different question: such pairs may not be truly negative, as one question may be more general than the other, or the two may be interpreted in different ways (see Table 3 for examples). Furthermore, annotators disagree on ambiguous pairs and agree on unambiguous pairs (Table 4), further supporting the good quality of the dataset.

Fig. 4. Annotation score distribution for each question category.

Fig. 5. Average human annotation score for all phrase pairs, ranked by score. Negative pairs are phrases sampled from different questions.

C Representation Methods

We test two different feature representation methods: classical TF-IDF weighting and sentence embeddings. For TF-IDF weighting, we follow a standard preprocessing pipeline: we select tokens of two or more alphanumeric characters (punctuation is ignored and always treated as a token separator) and stem the text using the Porter stemmer [29] to obtain our token dictionary. Maximum and minimum document frequency (DF) thresholds are subject to hyperparameter optimization (see the full list of hyperparameters in Table 5). As an alternative feature representation to TF-IDF, we embed sentences (i.e., question instances) using SimCSE [12], employing the pretrained RoBERTa-large model [27] as the base model.
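
A minimal sketch of both representations is shown below, assuming NLTK for Porter stemming and the public princeton-nlp SimCSE RoBERTa-large checkpoint on Hugging Face; the DF thresholds are placeholders for the grid-searched values in Table 5.

```python
# Sketch of the two feature representations (assumptions: NLTK Porter stemmer,
# public princeton-nlp SimCSE checkpoint; DF thresholds are placeholders).
import re

import torch
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from transformers import AutoModel, AutoTokenizer

questions = ["Why is this instance given this prediction?",
             "What is the reason for this prediction?"]

# --- TF-IDF: tokens of two or more alphanumeric characters, Porter-stemmed ---
stemmer = PorterStemmer()

def stem_tokenizer(text):
    tokens = re.findall(r"\b\w\w+\b", text.lower())  # punctuation acts as a separator
    return [stemmer.stem(t) for t in tokens]

tfidf = TfidfVectorizer(tokenizer=stem_tokenizer, token_pattern=None,
                        max_df=0.9, min_df=1)        # thresholds tuned by grid search
X_tfidf = tfidf.fit_transform(questions)

# --- SimCSE sentence embeddings with a RoBERTa-large base model ---
name = "princeton-nlp/sup-simcse-roberta-large"      # assumed public checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)
with torch.no_grad():
    batch = tokenizer(questions, padding=True, truncation=True, return_tensors="pt")
    X_simcse = encoder(**batch).pooler_output        # one embedding per question
```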

Table 3. Example negative pairs with average score > 4
Table 4. Phrase pairs with highest agreement/disagreement between annotators (bold indicates the reference questions in the question bank)
Table 5. Hyperparameters for grid search; bold indicates the chosen hyperparameters. For all other hyperparameters, we use the default values in scikit-learn [38].
Fig. 6. Confusion matrix for SimCSE + NN (Color figure online)

D Details on NLU Evaluation

Figure 6 shows the confusion matrix for SimCSE + NN. The blue lines separate the questions in each category (see Table 1), and the diagonal contains the number of true positives for each question. This prominent diagonal reflects the high accuracy of the approach. The squares around the diagonal are sub-confusion matrices between questions in the same group. The many gray cells within these squares indicate that questions in the same category are harder to distinguish than questions in different categories (note that the numbers on the x and y axes indicate the merged labels, not IDs).
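
For reference, such a confusion matrix can be computed directly from the gold and predicted reference-question labels with scikit-learn; the toy labels below are purely illustrative.

```python
# Sketch: confusion matrix over matched reference questions (toy labels).
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

y_true = ["q13", "q13", "q47", "q47", "q53"]   # gold reference questions
y_pred = ["q13", "q47", "q47", "q53", "q53"]   # predictions of SimCSE + NN

labels = sorted(set(y_true))
cm = confusion_matrix(y_true, y_pred, labels=labels)
ConfusionMatrixDisplay(cm, display_labels=labels).plot(cmap="Greys")
plt.show()
```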

Table 6. XAI methods and selection criteria (Abbreviation: Cls = Classification, Reg = Regression, RL = Reinforcement Learning)

E XAI Method Overview

Table 6 lists the criteria, introduced in Sect. 5.2 of the main paper, for choosing a suitable XAI method for each XAI question.

F Conversation Scenarios

F.1 Random Forest Classifier on Adult Data

In this section, we show an example conversation between a prototype implementation of our proposed framework and a user on tabular data (the Adult data set, see footnote 1) with a Random Forest (RF) classifier.

The task on this data set is to predict whether a person's income exceeds $50,000/year (abbreviated 50K) based on census data. We train the classifier using the sklearn library and its standard parameter settings (see footnote 6). The mean accuracy of the classifier using 3-fold cross-validation is 0.85. For explanations, we retrain the RF classifier with the same parameter settings on the full data set. The data set and the classifier are loaded at the beginning of the conversation.
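
A minimal sketch of this setup is shown below; the file name and the one-hot encoding of the categorical features are assumptions, since the exact preprocessing is not spelled out here.

```python
# Sketch of the training setup: default scikit-learn Random Forest on a
# one-hot-encoded Adult data frame, 3-fold cross-validation, then retraining
# on the full data set. "adult.csv" is an assumed local copy of the UCI data.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

adult = pd.read_csv("adult.csv")
X = pd.get_dummies(adult.drop(columns=["income"]))
y = adult["income"]

clf = RandomForestClassifier()                    # standard parameter settings
print(cross_val_score(clf, X, y, cv=3).mean())    # ~0.85, as reported above
clf.fit(X, y)                                     # retrain on the full data set for explanations
```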

Figure 1 in the main body of the paper shows a conversation with the prototype agent (X-Agent). At the beginning of the conversation, the user provides information about her features by answering retrieval questions from the agent. These questions can be generated based on DataSheets [14] of the data set. We omit this part of the conversation in Fig. 1 and show how the X-Agent reacts to several questions about the model.

The first question is the request: Give me the reason for this prediction! The natural language understanding (NLU) component matches this question to the reference question Why is this instance given this prediction? in the question bank (question 47 in Table 1). The Question-XAI method mapping (QX) selects SHAP [28] as the XAI method to provide the information for the answer. The natural language generation (NLG) component combines SHAP’s feature importance information with the predefined text “The above graph ...” to respond to the user question.
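
A hedged sketch of the underlying SHAP call for this step, reusing clf and the encoded frame X from the training sketch above (the exact plot and the canned response text differ in the agent):

```python
# Sketch: per-instance SHAP feature importance for the Random Forest.
import shap

explainer = shap.TreeExplainer(clf)                # clf from the training sketch
shap_values = explainer.shap_values(X.iloc[[0]])   # per-class values (list or array, depending on the SHAP version)
vals = shap_values[1] if isinstance(shap_values, list) else shap_values[..., 1]
# Bar-style summary of the most important features for this single profile:
shap.summary_plot(vals, X.iloc[[0]], plot_type="bar")
```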

For the next question, Why is this profile predicted \(\le \)50K instead of >50K, the labels \(\le \)50K and >50K are replaced by the token <class> before matching to reference question 53 in Table 2 (main body of the paper) Why is this instance predicted P instead of Q?. The QX component identifies DICE [34] as the explanation method for this reference question, and the information is translated into natural language. In detail, DICE returns a counterfactual instance with the desired target label (>50K), yielding two features (Age and Workclass) that need to change in order to obtain the desired prediction. The NLG component extracts the relations between feature values of the original instance (Age: 39, Workclass: State-gov) and counterfactual instance (Age: 66.3, Workclass: Self-emp-inc). In comparison to the counterfactual, Age of the original instance is lower and Workclass differs. These relations are converted and rendered as text in the final answer by the NLG component.
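
A hedged sketch of the corresponding dice-ml call; the feature names, the method argument, and the pipeline model clf_pipeline (a classifier that accepts the raw Adult data frame) are illustrative assumptions.

```python
# Sketch: counterfactual generation with dice-ml (illustrative names).
# Assumptions: `adult` is the raw Adult data frame with an "income" column and
# `clf_pipeline` is a fitted model that accepts this raw frame (e.g. an sklearn
# pipeline with one-hot encoding inside).
import dice_ml

d = dice_ml.Data(dataframe=adult,
                 continuous_features=["Age", "Hours per week"],  # illustrative subset
                 outcome_name="income")
m = dice_ml.Model(model=clf_pipeline, backend="sklearn")
exp = dice_ml.Dice(d, m, method="random")                        # method choice is an assumption

query = adult.drop(columns=["income"]).iloc[[0]]                 # Age 39, Workclass State-gov, ...
cfs = exp.generate_counterfactuals(query, total_CFs=1, desired_class="opposite")
cfs.visualize_as_dataframe(show_only_changes=True)               # e.g. Age and Workclass change
```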

For the final question, That’s hard, how could I change only Occupation to get >50K prediction?, the words “Occupation” and “>50K” are substituted by the tokens <feature> and <class>, respectively. Then, the question is matched to reference question 13 (see Table 1), How should this feature change to get a different prediction?. DICE is again determined as the XAI method for providing the required information to answer this question. However, this question asks about a specific feature, i.e., it constrains DICE's search space for counterfactuals. Finally, the provided information is again translated into natural language.
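
Restricting the search to a single feature corresponds to dice-ml's features_to_vary argument; continuing the sketch above:

```python
# Sketch: only allow the feature the user asked about to change.
cfs_occupation = exp.generate_counterfactuals(
    query, total_CFs=1, desired_class="opposite",
    features_to_vary=["Occupation"],   # constrain the counterfactual search space
)
cfs_occupation.visualize_as_dataframe(show_only_changes=True)
```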

Fig. 7. Conversation example to explain a Convolutional Neural Network on MNIST (Color figure online)

F.2 Convolutional Neural Network on MNIST

We use the MNIST data set and a pre-trained convolutional neural network [46] to showcase a conversation on an image data set (see Fig. 7). First, the NLU component matches the first question, Why did you predict that?, to reference question 47, Why is this instance given this prediction? (see Table 1). Then, QX maps this question to SHAP [28] as the explanation technique. SHAP highlights the important parts of the image that lead to the prediction 7. The NLG component adds an explanation in the form of natural language text to the information provided by SHAP (the image). For the second question, How should this image change to get number 9 predicted?, the number 9 is replaced by the token <class>. NLU maps this processed question to reference question 12 (see Table 1). QX identifies CFProto [46] as the method to answer this question. CFProto outputs a modified image that is closer to the digit 9. Finally, NLG generates the explanation text along with the output of CFProto.
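
A hedged sketch of the CFProto step using the alibi library (which implements [46]); the model cnn, the training images x_train, the instance x, and all hyperparameters are illustrative assumptions.

```python
# Sketch: counterfactual guided by prototypes (CFProto) via the alibi library.
# Assumptions: `cnn` is a Keras MNIST classifier, `x_train` are training images
# scaled to [0, 1] with shape (N, 28, 28, 1), and `x` is the image of the digit 7.
import tensorflow as tf

tf.compat.v1.disable_eager_execution()   # CFProto is implemented in TF1 graph mode

from alibi.explainers import CounterfactualProto

cf = CounterfactualProto(cnn, shape=(1, 28, 28, 1), use_kdtree=True,
                         max_iterations=500, feature_range=(0.0, 1.0))
cf.fit(x_train)                                 # build class prototypes from the training data
explanation = cf.explain(x)
counterfactual_image = explanation.cf["X"]      # modified image, e.g. now classified as 9
predicted_class = explanation.cf["class"]
```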

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Nguyen, V.B., Schlötterer, J., Seifert, C. (2023). From Black Boxes to Conversations: Incorporating XAI in a Conversational Agent. In: Longo, L. (eds) Explainable Artificial Intelligence. xAI 2023. Communications in Computer and Information Science, vol 1903. Springer, Cham. https://doi.org/10.1007/978-3-031-44070-0_4

  • DOI: https://doi.org/10.1007/978-3-031-44070-0_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44069-4

  • Online ISBN: 978-3-031-44070-0
