Introduction

Suppose you have had a few glasses of wine over a long dinner at a family gathering and you are wondering whether you are legally over or under the limit to drive home safely. You have downloaded an Artificial Intelligence (AI) decision support system to your smartphone and you decide to use it for the first time. You enter the information it requires: the number of units you have drunk, the duration over which you drank, and whether you have eaten, as well as details such as your weight and gender, and it provides you with a decision: You are under the limit. It also provides you with an explanation for its decision: You would have been over the limit if you had drunk on an empty stomach. Others at the gathering want to know what it says about them, and soon you are entering details about various cousins, aunts, uncles, and grandparents into the software application (app) to tell them its decision, and its explanation. Later in the evening another cousin asks what it would say about them; you have left your ‘phone in the other room but after using the app about 15 times you believe you know what it will say. You ask for your cousin’s details and tell them your prediction of the app’s decision: it will say they are under the limit. Is your prediction about the app’s decision accurate? And how helpful was the explanation of its decision for understanding it?

We examine in four experiments how people reason about counterfactual explanations for an AI’s decisions. Despite extensive psychological research on how people create counterfactual alternatives about how things could have turned out differently, few studies have examined the use of counterfactuals as explanations for others’ decisions (for a review, see Byrne, 2016). People tend to create counterfactual explanations in daily life after something bad happens (e.g., Kahneman & Tversky, 1982; Ritov & Baron, 1995; Roese, 1997). They often use counterfactuals to justify or defend their own past actions that led to something bad, in situations ranging from political discourse to accident safety reports (e.g., Catellani & Covelli, 2013; Markman et al., 2008a, 2008b; Morris & Moore, 2000). Accordingly, counterfactuals amplify or deflect judgements of blame, moral responsibility, or legal culpability (e.g., Branscombe et al., 1996; Malle et al., 2014; Parkinson & Byrne, 2017; Tepe & Byrne, 2022). People often rely on them to excuse their own poor performance (e.g., Ferrante et al., 2013; Markman & Tetlock, 2000; McCrea, 2008). When such explanations focus on how things could have turned out better, for example, “I would have got high marks if I had studied more”, they help people to prepare for the future, providing a ‘roadmap’ for intentions and plans, for example, “I will study more”, enabling them to learn from mistakes and prevent the recurrence of bad outcomes (e.g., De Brigard et al., 2013; Dixon & Byrne, 2011; Markman et al., 2008a, 2008b; Roese & Epstude, 2017; Smallman & Roese, 2009).

Counterfactual explanations tend to focus on one’s own decisions, and little is known about how people understand such explanations for others’ decisions. Yet currently hundreds of AI decision support systems provide human users with counterfactual explanations for the AI’s decisions. People are increasingly provided with AI decisions in many areas of daily life, from health to finances, job recruitment to holiday choices. Although early machine learning systems were readily interpretable, the increased reliance on deep neural networks trained on vast arrays of data has resulted in successful AI systems that are hard-to-understand “black boxes” (e.g., Barredo Arrieta et al., 2019). The goal of eXplainable AI (XAI) is to develop algorithmic techniques to provide automated explanations, to improve the interpretability of an AI system and its decisions (for reviews, see Karimi et al., 2020; Keane et al., 2021). People are legally entitled to such explanations, and explanations may also improve their trust and willingness to accept AI decisions (e.g., Hoffman et al., 2018; Wachter et al., 2017). Some XAI techniques attempt to explain the whole AI system, whereas others attempt to justify the decision (e.g., Karimi et al., 2020; Verma et al., 2020). Recently, there has been an explosion of interest within XAI in counterfactual explanations, i.e., explanations that describe how the AI’s decision would have been different if it had received different input information about some key feature. A frequently used example is an automated decision by a banking AI system to refuse a customer’s loan application, explained by the counterfactual, “if you had asked for a lower amount, your loan application would have been approved” (e.g., Dai et al., 2022; Warren et al., 2023). Over 100 distinct computational methods for generating counterfactuals using different automated approaches have been proposed (see Karimi et al., 2020; Keane et al., 2021). Each of these proposals claims that its particular algorithmic method generates “good” counterfactuals for humans, according to various criteria, for example, counterfactuals said to be plausible, actionable, proximal, sparse or diverse (e.g., Karimi et al., 2020; Keane & Smyth, 2020; Wachter et al., 2017; Warren et al., 2022). But crucially, such XAI claims about the usefulness of counterfactual explanations of AI decisions for human users are based on intuition rather than psychological evidence; a recent review found that fewer than 20% of almost 120 XAI counterfactual papers conducted any test of how human users understood the explanations (Keane et al., 2021).

The XAI interest in counterfactual explanations has been driven partly by their relation to causal explanations (Byrne, 2019; Miller, 2019). Suppose the app you had downloaded about alcohol and driving gave you a causal explanation instead: You are under the limit because you drank on a full stomach. Would the accuracy of your prediction of its decisions be better given the counterfactual or the causal explanation? Counterfactual explanations are similar to causal ones in a number of ways. Counterfactual explanations depend on identifying the relations between events, especially causal, intentional, or deontic relations, and the link between counterfactual and causal reasoning has received particular attention (e.g., Gerstenberg et al., 2021; Lewis, 1973; Lucas & Kemp, 2015; Meder et al., 2010; Spellman & Mandel, 1999). When participants are given or can generate a counterfactual, for example, “I would have got high marks in the exam if I had had extra time”, their judgements that the events are causally related increase, for example, “I did not get high marks in the exam because I did not have extra time” (e.g., McCloy & Byrne, 2002; see also Lagnado et al., 2013). But counterfactual explanations also differ from causal ones in a number of ways. Their content sometimes diverges; when people construct causal explanations, they tend to focus on strong causes that co-vary with an outcome, for example, a drunk driver caused the crash, whereas when they construct counterfactuals they tend to focus on background conditions that could have prevented it, for example, the crash wouldn’t have happened if the protagonist had driven home a different way (Mandel & Lehman, 1996).

Even when counterfactual and causal explanations have the same content, their mental representations differ in important ways. People create a counterfactual by “undoing” some aspects of their simulation of what happened, often to add something new (e.g., Kahneman & Tversky, 1982a; Roese & Epstude, 2017). They envisage at least two possibilities to compare the imagined alternative to what actually happened (e.g., Byrne, 2005). When they understand a factual conditional in the indicative mood “if I had extra time, I got high marks”, they initially envisage a single possibility, “I had extra time and I got high marks”, and although they know there may be alternatives consistent with the conditional, they do not think about them at the outset (Johnson-Laird & Byrne, 2002). Similarly, for a causal assertion, “Because I had extra time I got high marks”, they initially envisage a single possibility, and they can think about alternative possibilities subsequently (e.g. Frosch & Byrne, 2012; Johnson-Laird & Khemlani, 2017). But for a counterfactual conditional in the subjunctive mood “if I had had extra time I would have got high marks”, from the very outset people envisage not only the conjecture, “I had extra time and I got high marks”, they also recover the known or presupposed facts, “I did not have extra time and I did not get high marks” (Byrne, 2017). Hence, when people hear a counterfactual they look at images corresponding to both the alternative to reality and the facts, whereas when they hear a causal assertion they look at images corresponding just to the facts (Orenes et al., 2022). They also make more inferences from counterfactuals than factual assertions (e.g., Byrne & Tasso, 1999). On this dual possibility theory, counterfactuals provide richer information than causal assertions, because they are mentally represented at the outset by more possibilities; but they also require more cognitive resources, again because they are represented by more possibilities. Hence in diary studies, people spontaneously create fewer counterfactual explanations than causal ones (e.g., McEleney & Byrne, 2006). Generally, people prefer simple rather than complex causal explanations (e.g., Keil, 2006; Lombrozo, 2007; Quinn et al., 2021).

In artificial intelligence and machine learning, “causal explanation” is often reserved for scenarios in which a causal model is available. Hence some XAI studies refer instead to “rule-based explanations” or “if…then” rule explanations (e.g., Lage et al., 2019; van der Waa et al., 2021). However, in line with the typical terminology used in the psychology of explanation, we use “causal explanation” to refer to assertions such as “A because of B”, or “A happened because B happened” (see Keil, 2006; Kirfel et al., 2022; Johnson-Laird & Khemlani, 2017; Lombrozo, 2007). The few XAI human user studies of counterfactual explanations suggest counterfactuals can help users predict what an AI system will do (e.g., Lage et al., 2019; Lucic et al., 2020; van der Waa et al., 2021), and also improve their trust and satisfaction with the AI system (e.g., Förster et al., 2021; Kenny et al., 2021; Lucic et al., 2020; see also Hoffman et al., 2018). But although people rate their satisfaction with, and trust in, an AI system higher when it provides counterfactual explanations rather than causal ones, their accuracy in predicting the AI’s decisions is helped equally by both sorts of explanation (Warren et al., 2023). The dissociation has raised concerns about ethical explanation strategies in AI: if an explanation type is preferred by human users but has little added impact on their knowledge of the system, it could lead them subjectively to trust an app’s decision without objectively understanding it (Warren et al., 2023). We aim to examine further whether people find counterfactual explanations more helpful than causal ones. We will examine participants’ subjective judgements using the more sensitive measure of their evaluation of the helpfulness of each explanation of an AI’s decision as each one is presented, rather than by a general set of questions about overall explanation satisfaction administered at the end of the experiment (as in Warren et al., 2023). We will also examine their accuracy in predicting the AI’s decisions when their attention has been drawn to each explanation in this manner.

We test not only a familiar domain, alcohol and driving, but also an unfamiliar one, chemical safety. Suppose, instead of the blood-alcohol app, you have a college job for the summer in a chemical lab, and your employer has provided an AI decision support system to guide you in what chemicals are safe for you to handle. You enter the information it requires: occupational exposure limit, pH, exposure duration, air pollution rating, and PNEC rating, and it provides you with a decision: Chemical 83220 is safe. It also provides you with an explanation for its decision: Chemical 83220 would have been deemed unsafe if it had had a longer exposure duration. You use the app about 15 times on your first day and then you are faced with another chemical in a different room from where the device with the AI app is located. You think about this chemical’s details and come to a prediction of the app’s decision: it will say the chemical is safe. Is your prediction accurate about the app’s decision? And does the task of predicting the chemical safety app’s decision seem harder than predicting the alcohol and driving app’s decision? For a familiar domain, it is likely people already have beliefs about what prediction the AI system will make, for example, about whether someone who has drunk 10 units of alcohol is over the legal limit to drive, whereas in an unfamiliar domain, they may have few beliefs about what prediction the AI system will make, for example, about whether a chemical with a pH of 10 units is unsafe to handle.

People make very different inferences with familiar content compared to unfamiliar content. Many hundreds of experiments have shown that people make more accurate inferences about how to test a rule when it is about familiar content, for example, “if an envelope is sealed then it has a 90 cent stamp” compared to unfamiliar content, for example, “if a card has a vowel on one side then it has an even number on the other side” (Wason, 1968; Johnson-Laird et al., 1972; for a review, see Nickerson, 2015). Familiar content elicits people’s prior beliefs (e.g., Evans & Over, 2004; Oaksford & Chater, 2007) and it makes counterexamples to putative conclusions readily available (Sperber et al., 1995). Knowledge of a domain modulates people’s models of a situation, enabling them to envisage more possibilities, and to eliminate some possibilities from further consideration (e.g., Johnson-Laird & Byrne, 2002; Ragni et al., 2018). Hence we expect participants will make more accurate predictions for a familiar domain compared to an unfamiliar one. We examine whether they consider counterfactual explanations more helpful than causal ones not only for a familiar domain, but also for an unfamiliar one. In a familiar domain people will be readily able to envisage the dual possibilities for a counterfactual explanation, which correspond to their prior knowledge, so they will have access to more information than for a causal explanation; we test whether in an unfamiliar domain they are as readily able to envisage such dual possibilities. Familiarity of domain is relevant to XAI since in some situations, for example, health or holiday choices, people may have some prior beliefs, whereas in other domains, for example, finance or job recruitment, they may not; yet familiarity is relatively unexplored in the study of counterfactual explanations, whether in psychology or in XAI.

Our goal is to examine how people reason about counterfactual explanations of others’ decisions, using AI decisions as a test bed given their topicality. The approach in our four experiments was to provide participants with a set of diverse cases consisting of a variety of inputs provided to an AI system, the different decisions it made, and an explanation for why it made each decision; and to ask participants to judge how helpful the explanation was. An example of the tabular information given to participants, including the AI’s decision and a counterfactual explanation is provided in Fig. 1A. After participants gained experience with a set of cases, the decisions the AI system made, and the explanations for its decisions, we presented them with an entirely new set of different cases, this time with no information about the AI’s decision, and asked them to predict the AI’s decision, for example, whether it would decide the person was over the limit or under the limit. Figure 1B provides an example of the information in this prediction task.

Fig. 1

An example of a case presented in the first part of Experiment 1 in A (for the familiar domain with a counterfactual explanation), and a case presented in the second part of the experiment in B. (The cases are presented side by side for illustration; participants saw each case in isolation)

The paradigm, or use-case, in which participants are first provided with trials to become familiar with an app’s decisions, before then being asked to make judgements such as predictions about the app’s decisions, was chosen to ensure that participants commence the prediction task from a similar baseline, and is consistent with proposals that the psychological impact of an explanation is knowledge change (e.g., Keane, 2023; Keil, 2006). Participants’ responses are likely to be based in part on their beliefs about blood alcohol level and driving limits, and in part on their recollection of the somewhat similar cases they considered in the first part of the experiment and the decisions the AI system made. Our interest is in whether participants’ predictions are affected differently by counterfactual explanations, for example, “Elliot’s blood alcohol level would have been over the limit, if he had drunk on an empty stomach”, or causal explanations, for example, “Elliot’s blood alcohol level was under the limit, because he drank on a full stomach”. The content of the explanations is the same, and the informational, memory, and response demands of the prediction task are the same; all that differs is whether the explanation is phrased as a conditional using the connective “if” and the subjunctive mood, or as a causal assertion using the connective “because” and the indicative mood.

We compared counterfactual to causal explanations for AI decisions in a familiar domain and an unfamiliar one, on participants’ predictions of an AI’s decisions (Experiment 1) and on their own decisions (Experiment 2), and for an AI system that provides correct decisions and explanations and one that provides incorrect ones (Experiments 3a and 3b). Since previous studies found trends of increased subjective preference and objective accuracy for counterfactual and causal explanations compared to control descriptions (Warren et al., 2023), we compared counterfactual and causal explanations directly to each other in our experiments.

A summary of the four main findings we will report is as follows: (a) Participants judged counterfactual explanations more helpful than causal ones in our first experiment, but counterfactuals did not improve the accuracy of their predictions of an AI’s decisions more than causals. (b) However, in our second experiment, counterfactuals improved the accuracy of participants’ own decisions more than causals. (c) In these two experiments, the AI’s decisions and explanations were correct and participants considered explanations more helpful and made more accurate judgements in the familiar domain than the unfamiliar one. (d) In our third and fourth experiments, the AI’s decisions and explanations were incorrect, and participants considered explanations less helpful and made fewer accurate judgements in the familiar domain than the unfamiliar one, whether they predicted the AI’s decisions or made their own decisions.

Experiment 1: Predictions of an Artificial Intelligence (AI)’s decisions

Our first hypothesis was that people will judge explanations for an AI’s decisions to be more helpful in a familiar than an unfamiliar domain, and that they will be more accurate in predicting its decisions in the familiar than the unfamiliar domain. Our second hypothesis was that people will judge counterfactual explanations more helpful than causal ones even when we measure explanation helpfulness by judgements after each explanation, rather than by a final satisfaction scale (as in Warren et al., 2023). We also test whether counterfactual and causal explanations help prediction accuracy equally even when participants’ attention is drawn to each explanation in this manner.

Method

Participants

A G*Power analysis indicated 171 participants were required to achieve 90% power for an analysis of variance (ANOVA) with a medium-sized effect at p < 0.05. Participants in each experiment were recruited through Prolific, and paid £1.50 sterling; they were native English speakers from Ireland, Britain, America, Canada, Australia, and New Zealand, who had not previously participated in related studies. The 177 participants included 139 women, 31 men, five non-binary people, and two people who preferred not to say; their average age was 24.9 years with a range of 18–56 years. They were assigned to four groups: Familiar counterfactual (n = 45), Familiar causal (n = 48), Unfamiliar counterfactual (n = 44), and Unfamiliar causal (n = 40). Participants were excluded prior to any data analysis if they failed either of two attention checks, or failed to correctly identify at least three of five features in a memory test, and accordingly, a further 24 participants were excluded. The experiments received prior approval from the Trinity College Dublin School of Psychology ethics committee, reference SPREC102020-52.
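For illustration, a sample size of this order can be approximated with the noncentral F distribution, as in the minimal sketch below; it assumes a medium effect of Cohen’s f = 0.25 for a single-degree-of-freedom effect in a four-group between-participants design, which are conventional assumptions rather than parameters reported in the text.

```python
# A minimal sketch of this kind of power analysis, assuming Cohen's f = 0.25,
# alpha = .05, target power = .90, and a single-df effect in a four-group
# between-participants ANOVA; the constants are conventional assumptions.
from scipy import stats

def anova_power(n_total, f=0.25, alpha=0.05, df_num=1, n_groups=4):
    """Power for an effect with df_num numerator df, via the noncentral F."""
    df_denom = n_total - n_groups              # error degrees of freedom
    nc = (f ** 2) * n_total                    # noncentrality parameter
    f_crit = stats.f.ppf(1 - alpha, df_num, df_denom)
    return 1 - stats.ncf.cdf(f_crit, df_num, df_denom, nc)

n = 8                                          # start above n_groups so df_denom > 0
while anova_power(n) < 0.90:                   # smallest total N reaching 90% power
    n += 1
print(n, round(anova_power(n), 3))             # approximately 171
```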

Design and materials

The design was a 2 (familiarity: high, low) × 2 (explanation type: counterfactual, causal) between-participants design, and participants were assigned to one of four conditions. The dependent measures were judgements of explanation helpfulness, accuracy of predictions, and confidence in predictions.

In the first part of the experiment participants were presented with 16 cases (see Fig. 1A). Each case consisted of the input provided to an AI system, the decision it made, and an explanation for why it made the decision. The cases were presented in a different randomised order for each participant. For each case, participants were asked to rate the helpfulness of the explanation, i.e., they responded to the statement “This explanation was helpful” by ticking a 1–5 Likert-type scale (labelled: strongly disagree, disagree, neutral, agree, strongly agree). The 16 cases are presented in the Online Supplementary Materials (OSM).

In the second part, participants were presented with 16 different cases (see Fig. 1B). Each case consisted only of the input provided to the AI system, without its decision or any explanation. The cases were presented in a different randomised order for each participant. Participants were asked to predict the AI’s decision, responding to the prompt, “Based on the information provided, I believe the app’s prediction for this person/chemical will be” by selecting from the binary options of “over the limit/under the limit” in the familiar condition, or “safe/not safe” in the unfamiliar condition. They were also asked to judge how confident they were in their prediction on a 1–5 Likert-type scale (labelled: not at all confident, not very confident, neither, fairly confident, very confident). The 16 cases are presented in the OSM. Participants also received two attention check items, one in each part of the experiment, presented in tabular form visually identical to the target cases, in which they were asked to indicate the value of one of the input features.

To familiarise participants with the response options, before beginning the experimental trials, they completed one example trial for the first part of the experiment, and a second example trial for the second part. Hence, participants knew from the outset they were going to judge the helpfulness of explanations, and then predict the AI’s decisions.

SafeLimit

Participants in the familiar condition were instructed they would be testing a new app named SafeLimit (adapted from Warren et al., 2023). They were told the app was designed to predict whether or not a person would be over the legal limit to drive, based on five features the app analysed: the person’s weight, units of alcohol, duration of drinking, gender, and stomach fullness. The cases for the SafeLimit system were taken from an AI case-base used to estimate an individual’s blood alcohol content (BAC) using the Widmark formula (Posey & Mozayani, 2007). For the purpose of this study, individuals with BAC ≥ 0.08% were classified as over the legal limit to drive and the selection of the 34 cases used as experimental materials was limited to cases with BAC proximal to the decision boundary of being over or under the limit (0.06 < BAC < 0.09). The outcome for half of the selected cases was over the limit and for the other half was under the limit. Further information on the selection of cases is given in the OSM.
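As an illustration of how such cases can be scored, the following sketch applies the basic Widmark formula with conventional constants (a Widmark factor of 0.68 for men and 0.55 for women, an elimination rate of 0.015% BAC per hour, and 8 g of ethanol per UK unit) together with the 0.08% threshold; the constants, and the omission of the stomach-fullness feature included in the case-base, are assumptions for illustration rather than a reconstruction of the experimental materials.

```python
# A minimal sketch, under conventional assumptions, of a SafeLimit-style case
# scored with the basic Widmark formula; illustrative only, not the authors'
# case-base (which also used stomach fullness as a feature).

GRAMS_PER_UNIT = 8.0                      # grams of ethanol in one UK unit
WIDMARK_R = {"male": 0.68, "female": 0.55}
BETA = 0.015                              # % BAC eliminated per hour
LEGAL_LIMIT = 0.08                        # decision threshold used in the paper

def estimate_bac(weight_kg, units, duration_min, gender):
    """Estimate percent blood alcohol content with the basic Widmark formula."""
    alcohol_g = units * GRAMS_PER_UNIT
    peak_bac = 100 * alcohol_g / (weight_kg * 1000 * WIDMARK_R[gender])
    return max(peak_bac - BETA * duration_min / 60, 0.0)

def safelimit_decision(weight_kg, units, duration_min, gender):
    bac = estimate_bac(weight_kg, units, duration_min, gender)
    return "over the limit" if bac >= LEGAL_LIMIT else "under the limit"

# A case near the decision boundary (estimated BAC of about 0.07%):
print(safelimit_decision(weight_kg=70, units=6, duration_min=120, gender="male"))
```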

ChemSafe

Participants in the unfamiliar condition were instructed they would be testing a new app called ChemSafe. They were told the app was designed to predict whether or not a chemical would be safe to handle, based on five features the app analysed: occupational exposure limit, pH, exposure duration, air pollution rating, and PNEC rating. The cases for the ChemSafe app were created to be analogous to the blood alcohol cases: the same cases as in the SafeLimit app were used, with modifications to reduce familiarity, i.e., features from the BAC system were converted to chemical safety technical terms, while the values and case logic remained the same. The categorical features of gender (male/female), and stomach fullness (empty/full) were modified into categorical features of air pollution rating (e-01/e-00), and PNEC rating (EC10/EC50), and the continuous features of weight (kg), units (units), and duration (minutes) were modified into continuous features of occupational exposure limit (ppm), pH (units), and exposure duration (minutes). Figure 2 provides an example of a case from SafeLimit converted to ChemSafe (see Table S1 in the OSM for further information).
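A toy sketch of this relabelling is given below. The feature names, units, and categorical labels follow the text above, but the specific pairings of categorical values (e.g., male with e-01, empty with EC10) and of outcomes (under the limit with safe) are illustrative assumptions, not the authors’ exact materials.

```python
# Illustrative relabelling of a SafeLimit case as a ChemSafe case; the pairings
# of categorical values and outcomes are assumptions for illustration only.
FEATURE_MAP = {
    "weight (kg)":        "occupational exposure limit (ppm)",
    "units (units)":      "pH (units)",
    "duration (minutes)": "exposure duration (minutes)",
    "gender":             "air pollution rating",
    "stomach fullness":   "PNEC rating",
}
VALUE_MAP = {"male": "e-01", "female": "e-00", "empty": "EC10", "full": "EC50"}
OUTCOME_MAP = {"over the limit": "not safe", "under the limit": "safe"}

def to_chemsafe(safelimit_case, safelimit_decision):
    """Relabel a SafeLimit case as a ChemSafe case, keeping values and logic."""
    chem_case = {FEATURE_MAP[feature]: VALUE_MAP.get(value, value)
                 for feature, value in safelimit_case.items()}
    return chem_case, OUTCOME_MAP[safelimit_decision]

case = {"weight (kg)": 70, "units (units)": 6, "duration (minutes)": 120,
        "gender": "male", "stomach fullness": "full"}
print(to_chemsafe(case, "under the limit"))
```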

Fig. 2

The same case presented in the familiar blood-alcohol condition in A, and the unfamiliar chemical safety condition in B

Explanations

Participants were given either counterfactual or causal explanations in the first part of the experiment. Participants in the counterfactual condition were given an explanation in the form of a conditional with the connective ‘if’ in the subjunctive mood, for example, “Zoe's blood alcohol level would have been over the limit, if she had drunk more units of alcohol”. Participants in the causal explanation condition were given a matched explanation with the same content but in the form of a causal assertion with the connective ‘because’ in the indicative mood, for example, “Zoe's blood alcohol level was under the limit, because she drank few units of alcohol”. Each explanation was based on one feature, and to control for any effects of explanations about different features, participants were presented with four explanations related to each of the four features: weight, units of alcohol, duration, and stomach fullness (and for each of the four features, two outcomes were over the limit and two were under the limit). For the binary feature of stomach fullness, the counterfactual explanations referred to the opposite of the facts presented about it (i.e., full/empty). For the three continuous features of weight, units of alcohol, and duration, the counterfactual explanations referred to comparatives such as ‘more’ or ‘fewer’, for example, “…if she had drunk fewer units of alcohol”; the causal explanations referred to absolutes such as ‘many’ and ‘few’, for example, “… because she drank many units of alcohol.” Accordingly, counterfactual explanations used comparative descriptors whereas causal explanations used absolute ones. The difference is consistent with a counterfactual’s focus not only on the facts, for example, she drank 8 units, but also on an alternative to reality, if she had drunk fewer units; and a causal assertion’s focus on the facts, she drank 8 units, i.e., many units (see OSM).
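The contrast between the two matched formats can be illustrated with simple templates for the units-of-alcohol feature; the templates below paraphrase the examples in the text and are not the authors’ stimulus-generation procedure.

```python
# Toy templates for the two matched explanation formats described above,
# for the "units of alcohol" feature; paraphrases of the examples in the text.

def counterfactual_explanation(name, other_outcome, comparative):
    # subjunctive conditional, e.g. "Zoe's blood alcohol level would have been
    # over the limit, if she had drunk more units of alcohol"
    return (f"{name}'s blood alcohol level would have been {other_outcome}, "
            f"if she had drunk {comparative} units of alcohol")

def causal_explanation(name, outcome, absolute):
    # indicative causal assertion, e.g. "Zoe's blood alcohol level was under
    # the limit, because she drank few units of alcohol"
    return (f"{name}'s blood alcohol level was {outcome}, "
            f"because she drank {absolute} units of alcohol")

print(counterfactual_explanation("Zoe", "over the limit", "more"))
print(causal_explanation("Zoe", "under the limit", "few"))
```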

At the end of the experiment, we gave participants two scales developed in the XAI literature to measure explanation satisfaction and trust in the AI system (Hoffman et al., 2018), used in previous studies (e.g., Warren et al., 2023). The results were broadly consistent with the judgements of explanation helpfulness, and so for brevity we report them in the OSM. Participants also completed a memory check question, in which they identified the five features used by the app by selecting them from a list of ten features (see OSM).

Procedure

Prolific users who consented to participate were provided with a link to the online experiment, presented via Alchemer. The experiment took approximately 15 min to complete.

Results and discussion

The data for all the experiments are available via the Open Science Framework at https://osf.io/e7hjs/. In each experiment we carried out a 2 (familiarity: familiar vs. unfamiliar domain) × 2 (explanation type: counterfactual vs. causal) between-participants ANOVA on explanation helpfulness judgements, prediction accuracy, and prediction confidence.
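For readers who wish to reproduce this kind of analysis from the OSF data, a minimal sketch using statsmodels is given below; the file name and column names are placeholders rather than the names used in the posted data files.

```python
# A minimal sketch of the 2 x 2 between-participants ANOVA described above;
# "data.csv", "helpfulness", "familiarity", and "explanation" are placeholders.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.read_csv("data.csv")            # one row per participant, with mean rating
model = smf.ols("helpfulness ~ C(familiarity) * C(explanation)", data=df).fit()
print(anova_lm(model, typ=2))           # F tests for both main effects and interaction
# With unequal cell sizes, a Type III analysis (typ=3 with sum-to-zero contrasts)
# may be preferred; the factorial structure is the same.
```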

In the first part of the experiment, participants responded to the statement “This explanation was helpful” for each one of the 16 explanations. They judged counterfactual explanations more helpful than causal explanations, F(1, 173) = 22.04, p < .001, ηp2 = .11, and judged explanations more helpful in the familiar domain than the unfamiliar one, F(1, 173) = 14.73, p < .001, ηp2 = .08; the two variables did not interact, F(1, 173) = 0.10, p = .75, see Fig. 3A. The same direction of difference occurred for most of the 16 items (see OSM for additional analyses of item consistency).

Fig. 3

Mean explanation helpfulness judgements in A (on a 1–5 scale), percentage prediction accuracy in B (for 16 items in each condition), and mean prediction confidence in C (on a 1–5 scale), in Experiment 1. For prediction accuracy in B, chance accuracy is 50%; the average accuracy was 66% in the familiar conditions and 56% in the unfamiliar conditions. Error bars represent standard error of the mean

In the second part, participants predicted the AI’s decision for 16 new cases, by selecting from the binary options of “over/under the limit” (or “safe/not safe”). The prediction accuracy measure indicates the extent to which participants’ predictions aligned with the AI’s decisions. Participants made as many correct predictions whether they had been given counterfactual or causal explanations, F(1, 173) = .14, p = .71; they made more correct predictions for the familiar domain than the unfamiliar one, F(1, 173) = 28.68, p < .001, ηp2 = .14; the two variables did not interact, F(1, 173) = .91, p = .34, see Fig. 3B. The same direction of difference was observed for most of the items (see OSM).

Participants were asked to judge how confident they were in their prediction for each one of the 16 cases. They were equally confident in their predictions whether they had been given counterfactual or causal explanations, F(1, 173) = .91, p = .34; they were more confident about their predictions for the familiar domain than the unfamiliar one, F(1, 173) = 8.54, p = .004, ηp2 = .05; the two variables did not interact, F(1, 173) = .04, p = .84, see Fig. 3C.

Participants judged counterfactual explanations more helpful than causal ones, but their prediction accuracy was helped equally by counterfactual as by causal explanations, and their confidence in their accuracy was the same whether given counterfactual or causal explanations. They judged explanations more helpful in the familiar domain than the unfamiliar one, their predictions were more accurate, and they were more confident in their decisions.

That people judge counterfactual explanations to be more helpful than causal ones suggests that the informational benefit of the enriched mental representation of two possibilities for counterfactuals outweighs any cognitive costs of envisaging multiple models. Why then are people’s predictions of an AI’s decisions not helped more by counterfactual explanations than by causal ones? One potential explanation is that the additional information in the dual representation of counterfactuals is of use when people reason about their own decisions, rather than another person’s decisions. Our next experiment examines the effects of explanations on the accuracy of one’s own decisions.

Experiment 2: Making one’s own decisions

Suppose once again you are at a family gathering and have had a few glasses of wine. Once again you have gained experience with the alcohol and driving app, and now you no longer have access to it. But suppose now you must decide whether you are prepared to drive home. Your task in this case is not simply predicting what the app would decide, instead it is to make your own decision about what is safe for you to do. The aim of Experiment 2 was to compare the effects of counterfactual and causal explanations for an AI’s decisions on participants’ own decisions.

Our focus on people’s own decisions arises because counterfactuals are often personal, i.e., people generate episodic counterfactual thoughts about how events in their own lives could have turned out differently, often centred on their goals, i.e., to prepare for the future by forming intentions, for example, to prevent a bad outcome from occurring again (e.g., De Brigard et al., 2013; Ferrante et al., 2013; Roese & Epstude, 2017). Counterfactuals are influential in helping people make future decisions (O’Connor et al., 2014). In the experiment participants knew from the outset they were going to make decisions about whether it was safe for them to drive, or to handle chemicals (because once again they received at the outset a practice trial for each part of the experiment). The task helps ensure participants’ engagement with the AI’s decisions and explanations. In the previous experiment, participants attempted to predict the AI’s decisions about others, given knowledge of the AI’s past decisions about others; in this experiment they attempt to make a decision about themselves, given knowledge of the AI’s past decisions about others. The distinction between predicting an AI’s decisions about others and making one’s own decisions is relevant to XAI: AI systems have been shown to outperform humans in some domains (e.g., Bae et al., 2017), but in a variety of situations, for example, finance, medicine, or legislation, human users may ultimately decide whether or not to follow the AI’s decision. For example, a loan applicant may decide whether or not to apply for a lower loan, or a doctor may decide whether or not to follow a recommendation for a patient’s treatment.

Our hypothesis was that when participants make their own decisions, they will judge counterfactual explanations more helpful than causal ones, and their own decisions will be more accurate given counterfactual explanations than causal ones (i.e., the dissociation between subjective and objective measures will be eliminated). We expect once again their decisions will be more accurate when they are familiar with the domain than unfamiliar with it.

Method

Participants

The 173 participants included 122 women, 46 men, four non-binary people, and one person who preferred not to say; their average age was 36.2 years with a range of 18–77 years. They were assigned to four groups, Familiar counterfactual (n = 41), Familiar causal (n = 44), Unfamiliar counterfactual (n = 43), and Unfamiliar causal (n = 45). The participants were a new set who had not taken part in Experiment 1. Prior to any data analysis, a further 36 participants were excluded because they failed one of the two attention checks or the memory test; a further 49 participants were excluded because they indicated, in response to two additional questions, that they had strong beliefs against ever drinking and driving, or ever handling unsafe chemicals (see below).

Design, materials and procedure

The design, materials and procedure were the same as the previous experiment, with two exceptions. The first exception was that in the second part of the experiment, participants were asked to imagine the data provided was about themselves, and to make a decision, responding to the prompt, “Based on the information provided, I would be prepared to”, by selecting from the binary options of ‘drive/not drive’ in the familiar condition, or ‘handle/not handle’ the chemical in the unfamiliar condition (see OSM). The second exception was the addition of two extra questions at the end, on participants’ own beliefs about drinking and driving, or about handling chemicals (see OSM). It was decided to exclude any participants who agreed strongly (5 on the 1–5 scale) that people who had even one drink should not drive (or people should never handle unfamiliar chemicals), or disagreed strongly (1 on the 1–5 scale) that people under the legal limit can drive (or people can handle chemicals deemed safe). Given that participants were asked to make a decision about their own behaviour, the questions were included in case participants with strong beliefs responded to every case with the same decision (e.g., do not drive).

Results and discussion

The first part of Experiment 2 was identical to the first part of Experiment 1, and so we expected to replicate the findings for explanation helpfulness judgements. Participants judged counterfactual explanations somewhat more helpful than causal ones, although the difference did not reach significance, F(1, 169) = 3.379, p = .068, ηp2 = .02; they judged explanations more helpful in the familiar domain than the unfamiliar one, F(1, 169) = 6.19, p = .014, ηp2 = .035; the two variables did not interact, F(1, 169) = 0.88, p = .35, see Fig. 4A. The same direction of difference was observed for most of the items (see OSM).

Fig. 4

Mean explanation helpfulness judgements in A (on a 1–5 scale), percentage decision accuracy in B (for 16 items), and mean decision confidence in C (on a 1–5 scale), in Experiment 2. For decision accuracy in B, chance accuracy is 50%; the average accuracy was 63% in the familiar conditions and 57% in the unfamiliar conditions. Error bars represent standard error of the mean

The decision accuracy measure indicates the extent to which participants’ own decisions aligned with the AI’s decisions. Participants’ decisions were more accurate when they had been given counterfactual rather than causal explanations, F(1, 169) = 4.61, p < .03, ηp2 = .03; and more accurate in the familiar domain than the unfamiliar one, F(1, 169) = 10.41, p < .002, ηp2 = .06; the two variables did not interact, F(1, 169) = 1.22, p = .27, see Fig. 4B.

The same direction of difference between familiar and unfamiliar domains was observed for most items (see OSM). However, the same direction of difference between counterfactual and causal explanations was not observed for all items. Participants’ decisions were more accurate when they had been given counterfactual rather than causal explanations (more than 5% difference) for eight of the 16 items, but more accurate when they had been given causal rather than counterfactual explanations for six items, with no difference (greater than 5%) for two items. In testing the consistency of items further, participants’ decisions were more accurate for the eight items with bad outcomes than the eight items with good outcomes, F(1, 169) = 48.923, p < .0001, ηp2 = .224, in a 2 (explanation) × 2 (familiarity) × 2 (outcome: good – under the limit/safe to handle, vs. bad – over the limit/unsafe to handle) ANOVA with repeated measures on the third factor. Their decisions about items with good outcomes were more accurate when they had received counterfactual explanations rather than causal ones, t(171) = 3.482, p < .001, CI [-.23773, -.0657], but more accurate about items with bad outcomes when they had received causal explanations rather than counterfactual ones, t(171) = 2.196, p < .029, CI [.00822, .15413], in the decomposition of the interaction of outcome with explanation, F(1, 169) = 10.594, p < .001, ηp2 = .059 (see OSM).

Participants were equally confident in their decisions whether they had been given counterfactual or causal explanations, F(1, 169) = .75, p = .39; they were more confident in decisions they made in the familiar domain than the unfamiliar one, F(1, 169) = 27.37, p < .001, ηp2 = .14; the two variables did not interact, F(1, 169) = .66, p = .42, see Fig. 4C.

Participants judged counterfactual explanations somewhat more helpful than causal ones, and they were more accurate in their own decisions when they were given counterfactual explanations rather than causal ones overall. Although participants’ preferences for different explanations did not correspond to accuracy differences in Experiment 1 (and in Warren et al., 2023), in this experiment their preferences corresponded to their accuracy differences. The experiment shows that two reasonable explanation types can lead to differences in decision accuracy.

Moreover, participants’ decisions were more accurate about good outcomes when they had received a counterfactual explanation, for example, “Elliot's blood alcohol level would have been over the limit, if he had drunk on an empty stomach”, rather than a causal one, for example, “Elliot's blood alcohol level was under the limit, because he drank on a full stomach”. But their decisions were more accurate about bad outcomes when they had received a causal explanation, for example, “Jarred's blood alcohol level was over the limit, because he drank on an empty stomach”, rather than a counterfactual one, for example, “Jarred's blood alcohol level would have been under the limit, if he had drunk on a full stomach”. The pattern may reflect the greater potency of an explanation that explicitly refers to a bad outcome (e.g., over the limit): the counterfactual mentions the bad outcome when the real outcome was good, whereas the causal mentions the bad outcome when the real outcome was bad. The result suggests that counterfactual explanations may be particularly helpful for guiding future decisions when they have been given as explanations of good outcomes, at least when they provide a downward comparison to how the situation could have been worse; whereas causal explanations may be particularly helpful for guiding future decisions when they have been given as explanations of bad outcomes.

Experiments 1 and 2 examined explanations of an AI’s decisions that were correct. We turn now to examine explanations of an AI’s decisions that are incorrect.

Experiments 3a and 3b: Incorrect AI decisions

Counterfactual explanations sometimes attempt to justify poor decisions, for example, a decision not to vaccinate (e.g., Baron & Ritov, 2004; Ferrante et al., 2013; McCrea, 2008), and the explanation itself may also be poor, for example, “if I had received the vaccine, I would have become more ill from the virus”. The aim was to examine the effects of explanations for an AI system that provided incorrect decisions and explanations. AI systems generally have a high probability of accuracy but nonetheless can make errors, and even small error rates undermine human trust in their decisions (e.g., Kenny et al., 2021). When an AI system makes an incorrect decision in a familiar domain, for example, someone who has drunk 10 units of alcohol over a short duration on an empty stomach is judged to be under the legal limit to drive, people are likely to identify the error. When they predict the AI’s decisions, they are confronted with a conflict between whether to rely on their memory of the information they gained about the AI’s decisions, or on their own prior knowledge. We expect their predictions of the AI’s decisions will be inaccurate as a result, in that they will not align with the AI’s decisions, because its decisions conflict with their own knowledge. We also expect they will judge explanations of such incorrect decisions unhelpful in the familiar domain. An explanation such as “Elliot was under the limit because he drank on an empty stomach” presents a causal relationship that appears clearly implausible. In contrast, when an AI system makes an incorrect decision in an unfamiliar domain, for example, a chemical with a pH of 10 units with a short exposure duration and a PNEC rating of EC10 is safe to handle, participants will be less likely to spot the error since they have no prior knowledge to guide them. An explanation such as “Chemical 37286 was safe to handle because it had a PNEC rating of EC10” presents a causal relationship that may appear plausible. For an AI system that provides incorrect decisions and explanations, we hypothesise participants will judge explanations to be less helpful in the familiar domain than the unfamiliar one, and they will make fewer accurate predictions in the familiar domain than the unfamiliar one.

We presented participants with an AI system which made the opposite decision for each of the cases used in the previous experiments. Of course, an AI system that makes wholly incorrect decisions is an extreme situation which would never make it to market. We chose to present an AI system with all incorrect decisions because the cases in the previous experiments had been selected to be close to the decision boundary, i.e., the decision was close to being its opposite. Hence, it provides the best opportunity for participants to detect the decisions are incorrect. It enables us to test whether their judgements about the helpfulness of explanations and their accuracy are affected by the correctness of the AI’s decisions. In Experiment 3a participants predicted the AI’s decisions and in Experiment 3b they made their own decisions.

Method 

Participants

The 184 participants in Experiment 3a included 116 women, 63 men, four non-binary people, and one person who preferred not to say; their average age was 39.1 years with a range of 18–74 years. The 186 participants in Experiment 3b included 122 women, 63 men, and one person who preferred not to say; their average age was 39 years with a range of 19–72 years. The participants in Experiment 3a were assigned to four groups: Familiar counterfactual (n = 48), Familiar causal (n = 49), Unfamiliar counterfactual (n = 43), and Unfamiliar causal (n = 44). The different set of participants in Experiment 3b were also assigned to four groups: Familiar counterfactual (n = 49), Familiar causal (n = 43), Unfamiliar counterfactual (n = 47), and Unfamiliar causal (n = 47). The participants in Experiments 3a and 3b were a new set who had not taken part in the previous experiments. Prior to any data analysis, a further 24 participants were excluded from Experiment 3a, and a further 55 participants were excluded from Experiment 3b, for failing one of the two attention checks or the memory test. In addition, another 59 participants from Experiment 3b were excluded because they indicated they had strong beliefs against ever drinking and driving, or ever handling potentially unsafe chemicals.

Design, materials and procedure

The design and procedure of each experiment were the same as in the previous experiments. The materials were different: the experiments presented incorrect AI system outputs (i.e., the app made an incorrect prediction according to the Widmark formula). The same cases from the previous experiments were used; the features and their values remained the same, but the decision the AI system made and the explanation provided were modified. Incorrect outputs were generated by inverting the original prediction (‘under the limit’ to ‘over the limit’, ‘safe’ to ‘not safe’, and vice versa). The explanations were also modified to be incorrect, in that the relationship between the feature and the outcome was inverted; for example, an explanation about units of alcohol indicated that fewer units of alcohol would lead to the person being over the limit (see Fig. 5).
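The following sketch illustrates this inversion procedure; the data structures and field names are illustrative placeholders rather than the authors’ materials.

```python
# An illustrative sketch of the inversion described above: the decision is
# flipped to its opposite and the explanation is replaced by one in which the
# feature-outcome relationship is inverted (prepared by hand in the materials).

FLIP = {
    "under the limit": "over the limit", "over the limit": "under the limit",
    "safe": "not safe", "not safe": "safe",
}

def make_incorrect(case):
    """Return a copy of a case with an inverted decision and explanation."""
    incorrect = dict(case)
    incorrect["decision"] = FLIP[case["decision"]]
    # e.g., an explanation stating that fewer units of alcohol would put the
    # person over the limit, inverting the true feature-outcome relationship
    incorrect["explanation"] = case["inverted_explanation"]
    return incorrect
```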

Fig. 5

An example of a case with the correct artificial intelligence (AI) system decision and explanation used in Experiments 1 and 2 in A, and the same case with an incorrect AI system decision and explanation used in Experiments 3a and 3b in B

The adjustments made to the chemical app to create incorrect outputs were analogous to those made to the blood alcohol app. Since the chemical app is entirely fictional, there is no true sense in which its predictions are correct or incorrect, and the adjustments were made to ensure it could be used as a control unfamiliar domain.

Results and discussion

When presented with incorrect system outputs, participants judged explanations as less helpful for the familiar domain than the unfamiliar domain, in Experiment 3a, F(1, 180) = 92.54, p < .001, ηp2 = .34, and Experiment 3b, F(1, 182) = 119.94, p < .001, ηp2 = .40; they judged counterfactual and causal explanations equally helpful, in Experiment 3a, F(1, 180) = .02, p = .89, and Experiment 3b, F(1, 182) = 3.46, p = .06, ηp2 = .02; the two variables did not interact, in Experiment 3a, F(1, 180) = .06, p = .81, or Experiment 3b, F(1, 182) = 2.46, p = .12, see Fig. 6A and D.

Fig. 6

Mean explanation helpfulness judgements in A (on a 1–5 scale), percentage prediction accuracy in B (for 16 items), and mean prediction confidence in C (on a 1–5 scale), in Experiment 3a for an incorrect AI system. For prediction accuracy in B, chance accuracy is 50%; the average prediction accuracy was 42% in the familiar conditions and 52% in the unfamiliar conditions. Mean explanation helpfulness judgements in D (on a 1–5 scale), percentage decision accuracy in E (for 16 items), and mean decision confidence in F (on a 1–5 scale), in Experiment 3b for an incorrect AI system. For decision accuracy in E, chance accuracy is 50%; the average decision accuracy in Experiment 3b was 43% in the familiar conditions and 53% in the unfamiliar conditions. Error bars represent standard error of the mean

The accuracy measure indicates the extent to which participants’ predictions, or their own decisions, aligned with the AI’s (incorrect) decisions. When presented with incorrect system outputs, participants made fewer accurate predictions or own decisions, i.e., predictions or own decisions that were the same as the AI’s incorrect decisions, for the familiar domain than the unfamiliar one, in Experiment 3a, F(1, 180) = 34.95, p < .001, ηp2 = .16, and in Experiment 3b, F(1, 182) = 55.62, p < .001, ηp2 = .23; their predictions or own decisions were equally accurate whether they had been given counterfactual or causal explanations, in Experiment 3a, F(1, 180) = .16, p = .69, and Experiment 3b, F(1, 182) = .84, p = .36; and the two variables did not interact, in Experiment 3a, F(1, 180) = 2.48, p = .12, or Experiment 3b, F(1, 182) = .00, p = .99, see Fig. 6B and E.

Participants were more confident about their predictions or own decisions for the familiar domain than the unfamiliar domain, in Experiment 3a, F(1, 180) = 17.26, p < .001, ηp2 = .09, and Experiment 3b, F(1, 182) = 63.41, p < .001, ηp2 = .26; they were equally confident whether they had been given counterfactual or causal explanations, in Experiment 3a, F(1, 180) = .33, p = .57, and Experiment 3b, F(1, 182) = .89, p = .35; the two variables did not interact, in Experiment 3a, F(1, 180) = .86, p = .36, or in Experiment 3b, F(1, 182) = 2.51, p = .12, see Fig. 6C and F.

For an AI that provides incorrect decisions and explanations, participants judged explanations to be less helpful in the familiar domain than the unfamiliar one; their predictions or own decisions were less accurate in the familiar domain than the unfamiliar one, i.e., less aligned with the AI’s incorrect decisions, and they remained more confident in their predictions or own decisions in the familiar domain than the unfamiliar one. Participants’ confidence in their judgements was high despite the accuracy of their predictions being low. They were asked to judge how confident they were in their prediction or decision and hence their judgement likely reflects their confidence in their own knowledge of the domain, rather than their confidence in their understanding of the AI system. When an AI system makes incorrect decisions in the familiar domain, participants rely on their own knowledge when they predict the AI’s decisions or make their own decisions, rather than rely on their memory of the information they gained about the AI’s decisions in the first part of the experiment. For incorrect decisions of an AI system there was no subjective preference for counterfactual explanations over causal ones, and no increase in accuracy given counterfactual explanations rather than causal ones, for predictions or own decisions.

General discussion

People often create counterfactual explanations for their past decisions. Our experiments show they judge counterfactual explanations for an AI’s decisions, for example, “He would have been over the limit if he had drunk on an empty stomach”, more helpful than causal ones, for example, “He was under the limit because he drank on a full stomach”, in both familiar and unfamiliar domains, when the AI’s decisions are correct (see Table 1).

Table 1 A summary of the results of the four experiments (> indicates participants’ judgements or accuracy were greater, for example, familiar > unfamiliar)

The accuracy of their predictions of the AI’s decisions was the same whether they were given counterfactual or causal explanations (in Experiment 1). Strikingly, the accuracy of their own decisions, i.e., whether their decisions aligned with the AI’s decisions, was greater when they were given counterfactual rather than causal explanations (in Experiment 2). Their decisions were more accurate about good outcomes when they had been given a counterfactual, for example, “He would have been over the limit if he had drunk on an empty stomach”, rather than a causal, for example, “He was under the limit because he drank on a full stomach”; but their decisions were more accurate about bad outcomes when they had been given a causal, for example, “He was over the limit because he drank on an empty stomach”, rather than a counterfactual, for example, “He would have been under the limit if he had drunk on a full stomach”. Explanations that focus on a bad outcome, for example, over the limit, whether as a potential alternative (“he would have been over the limit if…”) or as a real occurrence (“he was over the limit because…”) improve the accuracy of their decisions.

Participants were equally accurate in their predictions of the AI’s decisions in Experiment 1 whether they were given counterfactual or causal explanations, despite judging counterfactual explanations more helpful than causal ones. The dissociation of subjective and objective measures occurred even when participants judged the helpfulness of each explanation individually (cf. Warren et al., 2023), and for both the familiar and unfamiliar domains.

However, participants were more accurate in their own decisions given counterfactual explanations than causal ones overall, in Experiment 2, in line with their tendency to consider counterfactual explanations somewhat more helpful than causal ones. The result somewhat ameliorates the ethical concerns over the dissociation between subjective and objective measures. The dissociation does not occur with personally engaging objective measures, for example, decision accuracy. The finding is consistent with counterfactuals as goal-directed towards decisions about what to do in the future (e.g., O’Connor et al., 2014; Roese & Epstude, 2017; see also Ferrante et al., 2013). It suggests the cognitive costs of representing dual possibilities for counterfactuals are outweighed by the benefits of available explicit information (e.g., Byrne, 2005, 2017; Orenes et al., 2022). If people prefer counterfactual explanations to causal ones, and counterfactuals improve their decision accuracy more, then counterfactual explanations may indeed be useful for XAI. However, any advantage of counterfactual explanations over causal ones is moderated by the decision outcome: explanations that focus on a bad outcome, for example, being over the limit, whether as a potential alternative in a counterfactual explanation, or as a real occurrence in a causal explanation, may engage participants most.

People sometimes create counterfactual explanations to justify poor decisions, and their counterfactual explanations are also sometimes inaccurate. Little psychological research has been directed at the accuracy of counterfactual explanations. How people distinguish true counterfactuals from false ones (e.g., Byrne & Johnson-Laird, 2020), or probable from improbable ones (e.g., Over et al., 2007; Sloman & Lagnado, 2005), depends on their knowledge and ability to search for counterexamples. For the incorrect AI decisions and explanations in Experiments 3a and 3b, there was no subjective preference for counterfactual explanations over causal ones, and no objective differences in accuracy of predictions or own decisions. Counterfactual explanations for AI decisions may need to be calibrated to the error potential of the AI system. Future research on their impact for AI systems with different frequencies of incorrect decisions, or for incorrect decisions paired with correct explanations, is warranted. Explanations that reveal an incorrect causal relationship may be useful in enabling people to detect errors in an AI system’s decisions, at least in a familiar domain.

Familiarity with a domain has widespread effects on human reasoning (Nickerson, 2015). Psychological studies have distinguished episodic counterfactuals about people’s own lives, and semantic counterfactuals about hypothetical scenarios (Beck, 2020; De Brigard et al., 2013; Roese & Epstude, 2017). The effects of familiarity, i.e., domain knowledge or expertise, or a lack of it, on counterfactual thinking have not been widely explored in XAI (for a review, see Kenny et al., 2021). Participants judged explanations more helpful, and made more accurate predictions and own decisions, when they reasoned about an AI system in a familiar domain than one in an unfamiliar domain, when the AI system made correct decisions. Explanations in XAI may need to be calibrated to the domain knowledge of the human user, and future research examining other examples of familiar and unfamiliar domains, for example, finance, legislation, or medicine, and different levels of familiarity with a domain, would be fruitful. Moreover, when decisions and explanations provided by an AI system were incorrect, participants judged explanations as less helpful in the familiar domain than the unfamiliar one, indicating they were readily able to identify incorrect decisions and explanations in the familiar domain. One implication is that intolerance for incorrect AI decisions may be pronounced in familiar domains.

Participants made fewer accurate predictions, i.e., predictions that aligned with the AI’s decisions, in the familiar domain than in the unfamiliar one when the AI’s decisions were incorrect. Human users of an AI system may believe it is attempting to reach a correct decision and is capable of doing so: participants’ predictions of the AI’s decisions were not based on its previous incorrect decisions; instead, they predicted it would make a decision based on their own understanding of the domain. The use of an XAI system often precedes a human decision; for example, an AI recommendation to reject a loan with the explanation “if the client had asked for a lower amount, their loan application would have been approved” may be evaluated by a bank clerk tasked with approving or rejecting the loan. Users of an XAI system may elect to make decisions based on their own knowledge rather than following the AI’s recommendations. Nonetheless, counterfactual explanations are persuasive for human users of an AI system: people prefer them to causal explanations, and they increase the alignment of users’ own decisions with the AI’s decisions, for example, about good outcomes, in familiar and unfamiliar domains.