Abstract
Counterfactual explanations (CFEs) are a popular approach in explainable artificial intelligence (xAI), highlighting changes to input data necessary for altering a model’s output. A CFE can either describe a scenario that is better than the factual state (upward CFE), or a scenario that is worse than the factual state (downward CFE). However, potential benefits and drawbacks of the directionality of CFEs for user behavior in xAI remain unclear. The current user study (N = 161) compares the impact of CFE directionality on behavior and experience of participants tasked to extract new knowledge from an automated system based on model predictions and CFEs. Results suggest that upward CFEs provide a significant performance advantage over other forms of counterfactual feedback. Moreover, the study highlights potential benefits of mixed CFEs improving user performance compared to downward CFEs or no explanations. In line with the performance results, users’ explicit knowledge of the system is significantly higher after receiving upward CFEs compared to downward comparisons. These findings imply that the alignment between explanation and task at hand, the so-called regulatory fit, may play a crucial role in determining the effectiveness of model explanations, informing future research directions in xAI. To ensure reproducible research, the entire code, underlying models and user data of this study are openly available: https://github.com/ukuhl/DirectionalAlienZoo
This research was supported by research training group Dataninja (Trustworthy AI for Seamless Problem Solving: Next Generation Intelligence Joins Robust Data Analysis) funded by the German federal state of North Rhine-Westphalia, and by the European Research Council (ERC) under the ERC Synergy Grant Water-Futures (Grant agreement No. 951424).
1 Introduction
The question of how to provide users with understandable, usable, and trustworthy explanations for machine learning (ML) decisions is at the heart of explainable artificial intelligence (xAI). A popular variant within the community are counterfactual explanations (CFEs), drawing out “what-if” scenarios that highlight necessary perturbations of the input data to change a model’s output [60]. Recent years have brought a notable uptick of studies investigating various aspects of CFEs for ML. Prior work focuses, inter alia, on their robustness [2], impact on user trust and satisfaction [61, 62], and usability as a function of algorithmic properties [21] (see [56] for an extensive review of the research landscape).
One key defining characteristic of counterfactual statements is their directionality: upward counterfactuals describe scenarios that are superior to the factual state (i.e., how it would have been better), while downward counterfactuals refer to more negative alternatives to the factual state (i.e., how it would have been worse) [28].
There is general agreement among cognitive and social psychologists that upward counterfactuals serve a preparatory role, increasing motivation and guiding future action [8, 64]. The role of downward counterfactuals, however, seems to be more complex. A common argument points towards a predominantly affective role, inducing a sense of relief about the factual state by emphasizing how a scenario could have been worse [46]. However, alternative empirical evidence suggests that downward counterfactuals may act as a wake-up call by drawing attention towards the possibility of worse outcomes, thus increasing motivation to take action [30].
In xAI, the impact of CFE directionality remains even more ambiguous, given that counterfactuals used to explain a model are not spontaneously generated by humans, but automatically computed as actionable feedback deepening users’ understanding. CFE user studies commonly investigate CFEs that flip a binary outcome class [9, 10, 43, 61, 62]. While these outcomes may have qualitative implications within their respective task domains (e.g., being under vs. over the legal blood alcohol limit to drive [9, 61, 62], chemicals being safe vs. unsafe [9], grass growth levels on a farm being high vs. low [10]), the directionality of provided explanations is often outside the respective research focus. Thus, this aspect has not yet been extensively studied in xAI, and preliminary data available presents inconsistencies.
For instance, [43] report downward CFEs for positive decisions to be less popular compared to importance rankings, and find no differences between rankings and upward CFEs in terms of user preference. In contrast, [9] suggest a behavioral impact of explanations that establish a downward comparison to the factual state on personal decision-making. Following this reasoning, downward CFEs may potentially serve as better actionable feedback.
Given these sparse and inconsistent accounts, it is unclear whether one type of CFEs is more effective than the other in improving user performance in tasks that require model interpretation. Therefore, the current study systematically compares the impact of CFE directionality for ML predictions on user behavior. Specifically, we perform a user study that requires participants to extract new knowledge from an automated system given model predictions and corresponding CFEs. On top of groups exclusively receiving upward and downward CFEs, we provide a third group of users with both types in a mixed condition. We find it conceivable that collective information on better and worse outcomes may grant a more complete understanding of the causal relationships between actions and outcomes, effectively informing future decision-making. We investigate how CFEs of either type impact users’ objective performance, explicit knowledge of the system, and subjective experience, compared to each other and a no-explanation control condition.
2 Related Work
In contrast to using inherently interpretable models such as rule sets or decision trees, establishing explainability for opaque models like support vector machines or neural networks is a challenging endeavor. Proposed approaches include feature importance methods providing insights into the relevance and influence of input features on model predictions [27, 47], rule extraction techniques distilling interpretable decision rules from complex models [40], and prototype-based explanations leveraging representative instances to explain model behavior [52].
In this broader xAI landscape, CFEs take a prominent role given an emerging user-centric focus on explainability [31]. CFEs facilitate human comprehension by explicitly revealing the necessary changes in input data to influence the model’s output [60]. In this way, CFEs provide explanations for instances where the model’s predictions deviate from the desired outcomes, allowing users to understand the factors that contribute to the model’s decision-making process. Their particular appeal lies in their intrinsically contrastive format, bearing a strong resemblance to human cognitive reasoning. Indeed, individuals routinely engage in counterfactual thinking [14, 45]. During this process, one not only retains the representation of actual facts, but also simulates an alternative scenario of how the reality might have differed [8]. This distinct characteristic positions CFEs as a valuable addition to the xAI toolkit, promising to provide users with actionable insights for decision-making and understanding model behavior for a given decision.
How humans reason from counterfactuals has been a prominent research topic in cognitive psychology studies [16, 25, 49], producing relevant implications for the use of CFEs in xAI applications [7]. Human-generated counterfactuals typically change only a limited set of features, preferably undoing recent and controllable events to create hypothetical scenarios that are strongly aligned with the individual’s personal world knowledge and beliefs [7, 12]. The current literature encompasses various computational approaches for generating CFEs [1, 53], reflecting the continuing development within the field of counterfactual explanation generation. To yield explanations that closely resemble human counterfactual thought [31], generation approaches have placed emphasis on producing CFEs that are sparse [32], stay close to the original input (with variations in terms of the distance measures, [18, 20]), focus on controllable (and thus actionable) features [18, 54], and may even diversify the generated solutions to meet end-user’s needs [32].
Still, it remains unclear to what extent certain aspects of CFEs facilitate or hinder users’ understanding when they are used in xAI. Just recently, [61] demonstrated that human users more readily understand explanations relying on categorical features in contrast to continuous ones, a distinction not typically taken into account by CFE generation approaches. In a similar vein, the current work investigates the potential impact of CFE directionality on user behavior, a fundamental property commonly not addressed in xAI. Upward counterfactuals (i.e., how it would have been better) are typically generated following negative events [28]. In this way, they may provide a clear roadmap for future improvement and action [7]. Indeed, imagining “better-worlds” broadly leads to performance improvements in various tasks and settings as a driving force for learning from past mistakes [13, 36, 44, 64]. When individuals engage in upward counterfactual thinking, their motivational orientation towards improvement aligns with the counterfactual focus on a hypothetical “better world”, thus inducing regulatory fit [15]. The positive affect associated with regulatory fit may enhance motivation, persistence, and goal-directed behavior, leading to an increased likelihood of taking action to bridge the gap between the current and desired states [33].
Downward counterfactuals, in contrast, refer to imagining more negative alternatives to the factual state (i.e., how it would have been worse) [28]. This downward comparison may have different functional implications, and research indeed reveals a complex pattern. On the one hand, downward counterfactual thinking is frequently associated with affective regulation, eliciting relief [46] and reducing regret [38]. Through this positive affective role of inducing a feeling of “I’m better off than I could have been”, it seems to serve a self-enhancement function leading to more favorable self-perception [63]. In this way, downward counterfactual thinking may lead to a sense of complacency, reducing the motivation to act [30]. On the other hand, putting one’s attentional focus on an objectively worse counterfactual possibility may induce negative affect, which may in turn serve as a motivator by signaling that the present condition is inadequate and requires action [30]. Thus, by focusing on mistakes and missed opportunities, downward counterfactuals may potentially highlight areas for improvement. Despite these indications for fundamental differences in how humans reason with upward and downward counterfactuals, this crucial aspect of CFEs’ effectiveness and usability has received little attention in xAI research so far. An extensive literature review revealed only two previous papers partially addressing this issue.
First, [43] conducted a study examining the effectiveness of feature rankings and CFEs in two everyday contexts: online advertising and loan applications. Specifically, their second experiment focuses on the directionality of explanations, with a particular emphasis on providing upward CFEs following negative outcomes and downward CFEs following positive outcomes. Participants made trade-off decisions between the two explanation modes, thus indicating their preferences for either feature rankings or CFEs. The results present a notable contrast to the prevailing preference for CFEs within the xAI community. Surprisingly, users show a higher preference for feature rankings over downward CFEs when faced with positive outcomes. This suggests that users found feature rankings more favorable in such scenarios. In the case of negative decisions, users exhibit no significant difference in terms of preference between the explanation formats, selecting upward CFEs as frequently as feature rankings. It is important to note, however, that such an assessment of user preference does not specifically allow drawing conclusions about relative usability differences of the explanation formats. Usability and user preference are two distinct aspects when evaluating the effectiveness of a system; aspects that – while being associated to some extent – often do not align [37]. Users may exhibit a subjective preference for systems or explanations, irrespective of the measurable impact on performance [24, 59].
More recently, [9] exposed participants to a model’s input, its decision, and either counterfactual or causal explanations, framed as a software application built to aid decision-making. Depending on the experimental group, the domain presented to a participant encompassed either a familiar scenario (i.e., blood alcohol level and driving limit), or an unfamiliar one (i.e., chemical safety). After rating the perceived helpfulness of the explanation presented, participants reached personal decisions whether they would be prepared to drive/handle an unknown chemical for a series of cases where they only saw the model input (Experiment 2 of [9]). Intriguingly, the personal decisions aligned better with model predictions when the preceding explanations would specifically establish a downward comparison. While this may shed a favorable light on downward CFEs for guiding personal action, it is unclear to which extent the familiarization phase framed as judging the helpfulness of a software application carried over to the subsequent decision-making phase. Furthermore, the reported beneficial effect of explanations that establish a downward comparison presents an incidental finding, as the actual focus of the study was on effects related to domain familiarity.
In light of these inconclusive preliminary findings, we aim to take a first step towards a systematic investigation of how directionality impacts CFEs’ usability as actionable feedback in xAI. Specifically, we ask whether novice users tasked to gain new information from an unknown system in an abstract domain [22] benefit more from receiving upward, downward, or mixed CFE feedback. By examining the effects of directionality, we aim to shed light on a nuance of CFEs that has yet to be explored, contributing to a more comprehensive understanding of their effectiveness and applicability in xAI.
3 Methods
To assess the impact of directionality of CFEs in xAI on user behavior, we employ the game-inspired Alien Zoo framework [22]. Consequently, our study assesses the efficacy of upward, downward, and mixed CFEs in acquiring new knowledge from an automated system in a low-knowledge domain, specifically targeting novice users. The Ethics Committee of Bielefeld University, Germany, approved this study.
3.1 Participants
We determined the required sample size for the current study by running an a-priori power analysis, using openly-available empirical data from an earlier empirical study based on the same experimental paradigm [22]. These exemplary data provided us with realistic estimates for fixed and random effects to be expected in the current study. The power analysis (R package mixedpower v.0.1.0 [23]) indicated that 40 participants per group were required to achieve a power of \(>85\%\) (medium effect size with alpha\(<.05\)).
A total of 161 participants were recruited in April 2023 using Prolific Academic and assigned to one of four between-participant conditions in a fixed order: upward CFEs (n = 40), downward CFEs (n = 40), mixed CFEs (i.e., receiving one downward and one upward CFE in each feedback round, n = 41), and a no-explanation control group (n = 40). We restricted access to the study to native English speakers from the United States, Australia, Canada, New Zealand, Ireland, and the United Kingdom, who did not previously participate in studies with the given experimental framework. Before participating, users provided informed consent through an electronic click-wrap agreement.
All participants received a base pay of GBP£4 for participation. The three top performers in each condition received a bonus payment of GBP£1. Together with the experimental instructions, participants were informed about a potential monetary bonus to increase compliance with the task [3].
To ensure sufficient data quality, we applied several exclusion criteria prior to analysis. Specifically, a participant’s responses were removed when they failed more than one attention check (n = 3). No participant displayed monotonous response patterns despite poor performance during the game, indicating high effort. Data from 158 participants contributed to the final game analysis (see the participant overview table).
Note that for a number of users, logging of survey responses failed due to technical difficulties (n = 8). Further, we excluded users from the subsequent survey analysis if they answered with positive or negative valence only, indicating low-effort entries (n = 4). Consequently, the survey analysis was based on a subset of 146 users from the game phase.
3.2 Experimental Procedure
A detailed account of procedure and design choices underlying the experimental framework is given in [22].
In short, users who agree to participate are directed to a web server to complete a short online game. As part of the game, participants feed a group of aliens iteratively over several trials, choosing combinations of different plants as food for their aliens. During every feeding choice, users may select up to 6 instances of each of five plants represented by leaf symbols of identical shape but different color (see Fig. 1A for an exemplary decision scene). Each player starts out with an initial pack size of 20 aliens. Participants are tasked with finding the plant combination that makes the alien pack grow rather than decline. The top players, i.e., those generating the highest number of aliens per experimental group, receive an additional monetary bonus.
To facilitate the learning process of what plants make an effective alien diet, participants receive feedback in the form of CFEs after every even trial. The feedback depends on the respective participant’s condition (upward = “Your result would have been BETTER if you had selected:”; downward = “Your result would have been WORSE if you had selected:”; mixed = “Your result would have been BETTER/WORSE if you had selected:”; control = no explanation beyond an overview of past choices). Figure 1B depicts an exemplary feedback scene for a participant in the upward group. The game phase consists of 12 trials (i.e., 12 feeding choices), with two attention checks assessing users’ attentiveness after trials 3 and 7.
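The trial structure described above can be sketched as a simple schedule. This is an illustrative sketch only: the function and constant names are hypothetical and are not taken from the Alien Zoo code base.

```python
# Illustrative schedule of the game phase (all names are hypothetical).
N_TRIALS = 12                    # 12 feeding choices, as stated in the text
ATTENTION_CHECK_AFTER = {3, 7}   # attention checks follow trials 3 and 7

FEEDBACK_HEADERS = {
    "upward": "Your result would have been BETTER if you had selected:",
    "downward": "Your result would have been WORSE if you had selected:",
    "mixed": "Your result would have been BETTER/WORSE if you had selected:",
    "control": None,  # control group only sees an overview of past choices
}


def events_after_trial(trial, condition):
    """Return the events that follow a given trial (1-based) for a condition."""
    events = []
    if trial % 2 == 0:  # feedback is shown after every even trial
        header = FEEDBACK_HEADERS[condition]
        if header is None:
            events.append("overview of past choices")
        else:
            events.append("feedback: " + header)
    if trial in ATTENTION_CHECK_AFTER:
        events.append("attention check")
    return events


for t in range(1, N_TRIALS + 1):
    print(t, events_after_trial(t, "upward"))
```

Under this reading, every participant passes through six feedback rounds and two attention checks.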
Following the game phase, a survey assessed users’ subjective judgments of presented feedback via a modified version of the system causability scale (SCS, [17]). On top of two items assessing explicit knowledge of feature relevance for task success, this scale measures the extent to which an xAI system provides clear and understandable explanations for its decisions. Users rate the quality of explanations based on factors such as completeness, consistency, relevance, and comprehensibility on a five-point Likert scale. Finally, we collected demographic information on participants’ age and gender, before users received a link to access full debriefing information.
3.3 Prediction of New Alien Number and Generation of CFE Feedback
During the game, an underlying ML model trained on simulated plant-growth-rate data determines changes in alien number. In each trial, the participant’s feeding choice is passed on to a decision tree regression model [50] to predict a change rate for the current pack size (−10 to +10 aliens per decision, with the pack size capped so that it does not fall below 2). Here, we use the model and training data from a previous study relying on the same experimental framework (maximal tree depth of 5 with Gini splitting rule of CART [6]; see Experiment 2 from [22]). This prior work demonstrated the feasibility of the experimental framework when comparing upward CFE feedback to a no-explanation control condition, promising to yield similarly meaningful insights into potential effects driven by different types of CFEs. In addition, the choice to rely on freely-available material that was previously published provided us with exemplary data distributions to obtain realistic estimates for fixed and random effects for the a priori power analysis.
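The bounds on the pack size can be illustrated with a small update function. This is a minimal sketch, assuming that the predicted change is rounded to whole aliens and added directly to the current pack size; both are assumptions made for illustration, and all names are hypothetical.

```python
MIN_PACK_SIZE = 2        # lower bound named in the text
MAX_CHANGE = 10          # predicted change lies between -10 and +10 aliens
INITIAL_PACK_SIZE = 20   # each player starts with 20 aliens


def update_pack_size(current, predicted_change):
    """Apply one predicted change to the pack while respecting the bounds."""
    # Clip the prediction to the stated range of -10 to +10.
    change = max(-MAX_CHANGE, min(MAX_CHANGE, predicted_change))
    # Rounding to whole aliens is an assumption, not taken from the paper.
    return max(MIN_PACK_SIZE, current + round(change))


size = INITIAL_PACK_SIZE
for change in (3.2, -10.0, -10.0, -10.0):
    size = update_pack_size(size, change)
    print(size)
```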
The corresponding training data entails a dependency between two features (plant 2 and plant 4, respectively) and the output variable. Specifically, the growth rate scales linearly with values 1 to 5 for plant 2, iff plant 4 has a value of 1 or 2. To prevent users from applying a simple ‘the more, the better’ approach, the dependence between growth rate and the value 6 of plant 2 was disrupted.
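The dependency described above can be made concrete with a toy rule. This is a hedged sketch rather than the authors' data generator: the slope of 2 and the fallback value of −2 are arbitrary illustrative choices, and the function name is hypothetical.

```python
def toy_growth_rate(plants):
    """Toy stand-in for the simulated dependency between plants and growth.

    plants is a 5-tuple with the selected count (0 to 6) of plants 1 to 5.
    """
    plant2, plant4 = plants[1], plants[3]
    # Growth scales linearly with plant 2 (values 1 to 5) iff plant 4 is 1 or 2.
    if plant4 in (1, 2) and 1 <= plant2 <= 5:
        return 2.0 * plant2   # slope of 2 is an arbitrary assumption
    # Value 6 of plant 2 breaks the pattern, as does any other plant 4 value.
    return -2.0               # fallback value is an arbitrary assumption


print(toy_growth_rate((0, 5, 0, 1, 0)))  # rule applies
print(toy_growth_rate((0, 6, 0, 1, 0)))  # 'the more, the better' fails
```

Plants 1, 3, and 5 do not enter the rule at all, which is what makes them irrelevant features in this sketch.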
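Together with each prediction, we also compute a CFE presenting an alternative plant combination that differs minimally from the current input via optimization [60]. In our implementation, a CFE \({\textbf{x}_{\text {cf}}}\in \mathbb {R}^d\) of an ML model \(h:\mathbb {R}^d\rightarrow \{Y\}\) is realized as solving:
\[
\underset{{\textbf{x}_{\text {cf}}}\,\in \,\mathbb {R}^d}{\arg \min }\; {\ell }\big (h({\textbf{x}_{\text {cf}}}),\, y'\big ) + C\cdot {\theta }({\textbf{x}_{\text {cf}}},\, \textbf{x})
\]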
where \(\textbf{x}\in \mathbb {R}^d\) denotes the original input, the regularization \({\theta }(\cdot )\) penalizes deviations from the original input \(\textbf{x}\) (weighted by a regularization strength \(C>0\)), \(y'\in \{Y\}\) denotes the requested output/behavior of the model \(h(\cdot )\) under the counterfactual \({\textbf{x}_{\text {cf}}}\), and \({\ell }(\cdot )\) denotes a loss function penalizing deviations from the requested prediction. Thus, returned CFEs correspond to minimal perturbations to the model’s input that alter the final prediction to a desired outcome. Given the regularization term \({\theta }(\cdot )\), generated CFEs based on this definition remain as similar to the original input \(\textbf{x}\) as possible.
Depending on the participant’s condition, computed CFEs either increase (upward condition, and odd trials of the mixed condition) or decrease (downward condition, and even trials of the mixed condition) the current growth rate prediction by a few decimal points. After every second trial, participants receive these CFEs as feedback to further improve their performance in the game (see Fig. 1B).
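To illustrate how a directional CFE can be obtained from an objective of the form loss plus C times regularization, the following sketch searches the full input grid exhaustively. This is not the optimization procedure used in the study: the exhaustive search, the toy model, the squared-error loss, the L1 distance, and the values of step and C are all assumptions chosen for illustration. The sketch also adds an explicit constraint that the prediction must move in the requested direction.

```python
from itertools import product

MAX_PER_PLANT = 6   # up to 6 instances of each plant (from the text)
N_PLANTS = 5


def toy_model(x):
    """Hypothetical stand-in for the trained regression tree h(.)."""
    plant2, plant4 = x[1], x[3]
    if plant4 in (1, 2) and 1 <= plant2 <= 5:
        return 2.0 * plant2
    return -2.0


def counterfactual(x, direction, model=toy_model, step=1.0, C=0.5):
    """Search the input grid for a directional counterfactual of x.

    direction: "upward" requests a higher prediction, "downward" a lower one.
    Returns None if no input moves the prediction in that direction.
    """
    y = model(x)
    sign = 1.0 if direction == "upward" else -1.0
    target = y + sign * step              # requested output y'
    best, best_cost = None, float("inf")
    for cand in product(range(MAX_PER_PLANT + 1), repeat=N_PLANTS):
        y_cf = model(cand)
        if sign * (y_cf - y) <= 0:        # must move in the requested direction
            continue
        loss = (y_cf - target) ** 2                       # loss term
        dist = sum(abs(a - b) for a, b in zip(cand, x))   # L1 regularization
        cost = loss + C * dist
        if cost < best_cost:
            best, best_cost = cand, cost
    return best


x = (3, 2, 0, 1, 4)
print(counterfactual(x, "upward"))    # (3, 3, 0, 1, 4)
print(counterfactual(x, "downward"))  # (3, 1, 0, 1, 4)
```

In both cases only plant 2 changes by one unit, reflecting the sparsity and proximity that the regularization term is meant to encourage.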
3.4 Statistical Analysis
We use R-4.1.1 [41] for all statistical analyses, with experimental condition (control, upward, downward, mixed) as independent variable. Given our longitudinal design, we employ linear mixed models for data analysis to effectively address the correlations that arise from multiple measurements taken from each participant [11, 35]. We investigate systematic differences between experimental groups over the 12 feeding trials (R package: lme4 v.4_1.1-27.1) [4], with alien pack size over trials as dependent variable, fixed effects of group, trial number and their interaction, and a by-subjects random intercept. We compared model fits using the analysis of variance function (stats package, base R). Effect sizes are reported as \(\eta _{\text {p}}^{2}\) (R package: effectsize v.0.5) [5]. Pairwise estimated marginal means analyses followed up significant main effects or interactions, Bonferroni corrected to account for multiple comparisons. We report respective effect sizes in terms of Cohen’s d.
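One way to write down the mixed model described above, offered as a hedged reading rather than the authors' exact specification, uses \(y_{ij}\) for the pack size of participant \(i\) after trial \(j\) and \(g(i)\) for that participant's group:

\[
y_{ij} = \beta _0 + \alpha _{g(i)} + \tau _j + (\alpha \tau )_{g(i),j} + u_i + \varepsilon _{ij}, \qquad u_i \sim \mathcal {N}(0, \sigma _u^2), \quad \varepsilon _{ij} \sim \mathcal {N}(0, \sigma ^2)
\]

Here \(\alpha _{g(i)}\) is the fixed effect of group, \(\tau _j\) the fixed effect of trial, \((\alpha \tau )_{g(i),j}\) their interaction, and \(u_i\) the by-subjects random intercept. Treating trial number as a categorical factor and the Gaussian distributional assumptions are assumptions of this sketch. In lme4 syntax, such a model would correspond to a formula of the form pack_size ~ group * trial + (1 | subject), where the variable names are hypothetical.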
We analyze survey data based on item type. Missing values (i.e., users responding “I do not know.” for items assessing explicit knowledge, or “I prefer not to answer.” for items assessing subjective experience) were removed prior to the survey analysis.
The first two items of the survey evaluate the user’s explicit knowledge of feature relevance for successful task completion. Our goal is to determine a comprehensive measure of user knowledge through rewards and penalties for correct and incorrect responses, respectively. To achieve this, we calculate the number of plants correctly identified per participant (i.e., the number of matches between ground truth and user input).
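The match count can be sketched as follows. This is a minimal illustration, assuming that each of the two items lets the participant mark any subset of the five plants (one item for relevant plants, one for irrelevant plants) and that plants 2 and 4 form the ground truth for relevance; the item format and all names are assumptions.

```python
PLANTS = (1, 2, 3, 4, 5)
RELEVANT = {2, 4}                     # plants tied to the growth rate
IRRELEVANT = set(PLANTS) - RELEVANT   # plants 1, 3, and 5


def count_matches(selected, truth):
    """Count plants whose marked/unmarked status agrees with the truth."""
    return sum((p in selected) == (p in truth) for p in PLANTS)


def knowledge_score(marked_relevant, marked_irrelevant):
    """Total number of matches over both survey items (0 to 10)."""
    return (count_matches(marked_relevant, RELEVANT)
            + count_matches(marked_irrelevant, IRRELEVANT))


print(knowledge_score({2, 4}, {1, 3, 5}))   # perfect answer
print(knowledge_score({2}, {1, 3, 4, 5}))   # plant 4 misjudged in both items
```

Counting agreement per plant rewards correct selections and correct omissions alike, so a wrongly marked plant lowers the score in the same way as a missed one.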
The remaining items were adapted from the SCS, a rating scale that allows users to evaluate the extent to which an xAI system’s explanations are clear, transparent, and understandable [17]. Based on their responses to these items, we compute an adapted SCS score for each participant to assess their subjective experience with the game.
Statistically, we investigate potential group differences concerning matches between user input and ground truth, Likert-style survey responses, age, and gender information using the non-parametric Kruskal-Wallis H test (R package: rstatix v.0.7.0) [19], with effect sizes given as \(\eta ^{2}\).
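For orientation, a commonly used H-based estimate of \(\eta ^{2}\) for the Kruskal-Wallis test is (H − k + 1) / (n − k), with k groups and n observations. The sketch below assumes this estimator; the input values in the usage lines are made up for illustration and are not the study's data.

```python
def kruskal_eta_squared(h_stat, n_groups, n_obs):
    """Eta squared based on the Kruskal-Wallis H statistic.

    Uses the estimator (H - k + 1) / (n - k); whether this matches the exact
    computation in the authors' pipeline is an assumption.
    """
    return (h_stat - n_groups + 1) / (n_obs - n_groups)


# Hypothetical example: H = 10 with 4 groups and 104 observations.
print(kruskal_eta_squared(10.0, 4, 104))
```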
Significant effects revealed via the Kruskal-Wallis H test are followed up by running pairwise comparisons between group levels, Bonferroni corrected for multiple testing.
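With four groups there are six pairwise comparisons, so a Bonferroni correction multiplies each raw p-value by six and caps the result at 1. The sketch below shows this adjustment; the p-values in the usage lines are placeholders, not results from the study.

```python
from itertools import combinations

GROUPS = ("control", "upward", "downward", "mixed")


def bonferroni(p_values):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}


pairs = list(combinations(GROUPS, 2))   # all 6 pairwise group comparisons
raw = {pair: 0.01 for pair in pairs}    # placeholder p-values
adjusted = bonferroni(raw)
print(len(pairs), adjusted[("upward", "downward")])
```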
4 Results
Overall, the results show group effects both in terms of performance during the game, and user’s explicit knowledge of relevant and irrelevant features. However, we do not detect statistically significant differences when evaluating participants’ subjective experience.
4.1 Game Performance
We evaluate users’ game data to investigate whether naive users benefit comparably from different types of CFEs when tasked to extract knowledge in an unfamiliar domain. Specifically, we compare the number of aliens produced over time for participants receiving either upward CFEs, downward CFEs, mixed CFEs, or no CFEs (control) in the Alien Zoo iterative learning task.
Figure 2A depicts the development of average pack size over trials. All groups show a positive learning trajectory, but with strikingly different slopes. The performance curves suggest that the mean number of generated aliens over trials varies as a function of experimental condition, with users receiving no explanations showing the weakest, and users receiving upward CFE feedback exhibiting the strongest performance increase. The corresponding linear mixed effects model revealed a significant interaction between trial number and group (F(33,1694) = 13.114, p < .001, \(\eta _{\text {p}}^{2}\) = 0.203), confirming this observation.
Follow-up analyses reveal an intriguing pattern of distinctive group differences (see Figs. 2B-F).
The trajectories of the control and the upward group diverge significantly from trial 4 onward (t(300) \(\ge \) -2.660, p \(\le \) .0494, d \(\ge \) -1.066), with participants in the upward group clearly outperforming control participants (Fig. 2B). This pattern also holds when comparing the upward group with the two remaining conditions. Trajectories of the upward and the downward group diverge significantly from trial 7 onward (t(300) \(\ge \) 3.016, p \(\le \) .0167, d \(\ge \) 1.224; Fig. 2C). Statistical differences between the upward and the mixed groups emerge starting at trial 8 (t(300) \(\ge \) 2.851, p \(\le \) .0280, d \(\ge \) 1.135; Fig. 2D).
While performing less efficiently than participants in the upward group, participants in the mixed condition also achieve significantly higher scores in the last 5 trials compared to control participants (t(300) \(\ge \) -2.891, p \(\le \) .0247, d \(\ge \) -1.144; Fig. 2E), and in the last 2 trials compared to downward participants (t(300) \(\ge \) -3.121, p \(\le \) .0119, d \(\ge \) -1.251; Fig. 2F). Only trajectories of participants in the control and downward conditions do not show any statistically meaningful differences.
This interaction is complemented by a significant main effect of trial number (F(11,1694) = 95.573, p < .001, \(\eta _{\text {p}}^{2}\) = 0.380), and group (F(3,154) = 11.423, p < .001, \(\eta _{\text {p}}^{2}\) = 0.180).
4.2 Assessing User’s Explicit Knowledge
The first two items of the survey phase assess participants’ explicit knowledge of feature relevance for task completion. Across the two items, a participant could potentially reach 10 correct decisions by matching up their responses with the ground truth perfectly. In terms of mean number of matches between ground truth and user judgments, participants in the control condition achieved the highest number of matches (M = 6.700 ± 0.548 SE), followed by participants in the upward (M = 6.615 ± 0.261 SE), mixed (M = 6.000 ± 0.342 SE), and downward condition (M = 5.241 ± 0.390 SE; Fig. 3A). The corresponding statistical analysis reveals a significant effect of group (H(3) = 10.9, p = .012, \(\eta ^{2}\) = 0.077). Follow-up pairwise comparisons show that participants in the upward condition achieve significantly more matches than participants in the downward condition (p = .028).
4.3 Assessing User’s Subjective Experience
A modified version of the SCS informs whether participants perceive provided explanations as clear, understandable, and usable. As shown in Fig. 3B, participants in the control (M = 0.760 ± 0.025 SE), upward (M = 0.756 ± 0.029 SE), and mixed (M = 0.754 ± 0.019 SE) conditions achieve very similar scores, while the mean SCS score of downward participants is slightly lower (M = 0.705 ± 0.023 SE). There is no statistically significant effect of group in terms of SCS scores (H(3) = 5.36, p = .147, \(\eta ^{2}\) = 0.017).
5 Discussion
The current study investigates the impact of directionality of CFEs for xAI on objective task performance, explicit knowledge, and subjective experience of novice users during an iterative learning paradigm in an unfamiliar domain.
The results suggest that participants benefit most from receiving upward CFE feedback (i.e., informing them what choices would have been better), outperforming participants in all other conditions (Fig. 2). Consequently, we replicate prior work showing that upward CFEs induce a significant performance advantage over a no-explanation control [22] in the employed experimental framework, and extend previous insights by the aspect of directionality. In the current experimental setting, upward counterfactuals may have provided novice users with interpretable and clear pathways for actions that improve future behavior [7]. This is in line with previous psychological research demonstrating that reflecting upon “better worlds” may serve as a driving force for learning and adapting behavior [13, 36, 44, 64]. Given the current task, the striking positive impact of upward CFEs is in line with the psychological concept of regulatory fit, as describing how a choice would have been better matches the motivational orientation to improve one’s feeding choices [15]. Previous work in various domains suggests that such a feeling of fit induces more effective and satisfying performance, as well as greater persistence and motivation to continue the task [15, 26, 33]. A similar mechanism may be in effect in the current setting.
Intriguingly, participants who receive mixed CFE feedback also achieve statistically higher scores compared to control and downward groups, specifically towards later trials. Considering that downward CFEs do not improve user performance compared to providing no explanations at all (control), we may suspect that users in the mixed condition benefit from receiving feedback that partially possesses regulatory fit.
A previous study suggests a beneficial effect of explanations that establish a downward comparison [9]. However, other research shows that downward CFEs are relatively more disliked compared to feature rankings in terms of user preference [43]. Intriguingly, participants receiving downward CFEs in the current work do not show statistically meaningful differences in terms of task performance compared to participants receiving no explanations at all. While the current discrepancy of the downward group from the performance of the two other CFE groups is striking, it merits only cautious interpretation generally devoted to null effects. On the one hand, downward CFE feedback may have simply induced complacency that impeded participants’ motivation to act, knowing that there still was a worse route they could have taken [30, 44]. On the other hand, the current scenario may have been inadequate to unleash the negative affect necessary to stimulate action through the presented downward comparisons. Unsuccessfully feeding aliens had only limited personal consequences for participants, potentially keeping the level of perceived regret following a sub-optimal decision comparatively low. Consequently, the beneficial impact of downward CFEs in terms of regret minimization could not be observed [38]. A future study could investigate this possibility more closely via an adapted design that involves increased personal costs, thereby implementing a penalty for poor decision-making and a higher chance of inducing regret.
In terms of explicit knowledge, participants in the upward condition identified relevant and irrelevant input features more readily than those in the downward condition, in line with the performance advantage for upward CFEs. This suggests that, in tasks requiring users to extract new information from a system, upward CFEs may be the better option for enhancing users’ explicit knowledge. A curious detail meriting comment, however, concerns the comparably high performance in terms of explicit knowledge by control users who do not receive explanations at all. This may be explained by the relatively high proportion of control participants indicating that they “do not know” for items assessing explicit knowledge, thus being discarded from this analysis. The remaining data may represent control individuals who are the most knowledgeable and confident in their responses.
In terms of objectively quantifiable measures, our study found tangible behavioral group differences, in stark contrast to user responses concerning the subjective usability of explanations provided. This observation is consistent with the literature on the mismatch between these two measures [21, 61], and further highlights the need to carefully consider both subjective and objective measures when evaluating xAI approaches.
5.1 Limitations
In order to provide a comprehensive evaluation of the current findings, it is important to acknowledge and address limitations inherent in this study.
Given the experimental Alien Zoo framework, the results are obtained in relation to a very specific context and with a specific task, diverging from many real-life domains. Today, we are already witnessing the significant impact of AI-based decision-making systems across a wide range of domains, including but not limited to health care [42], the legal system [34], and human resource management [58]. We carefully considered the trade-offs involved in selecting a specific context. Ultimately, the primary objective of investigating the usability and impact of counterfactual directionality on user behavior and experience motivated the choice of a single and quite abstract domain (i.e., feeding aliens). This set-up allowed us to maintain a high level of control over experimental variables to isolate effects driven by directionality of CFEs, while minimizing confounding variables that could arise from varying contexts. Importantly, participants could engage with the counterfactual explanations and extract new knowledge from the automated system without being influenced by pre-existing domain knowledge. Thus, the current approach facilitated a more detailed analysis of how CFE directionality specifically affects the task at hand and the extraction of new knowledge from an automated system. Uncovering these specific dynamics within a well-defined context provides a first step, laying the foundation for future work. Results await validation across various domains, tasks, and user populations, to contribute to a more comprehensive understanding of the broader applicability and usefulness of CFEs across different scenarios.
Similarly, the current work does not cover different approaches for generating the CFEs. As outlined in Eq. 1, we follow an optimization approach to generate minimal adjustments to the model’s input that, depending on experimental condition, either increase or decrease the predicted growth rate. This approach is based on the initial definition of CFE generation for ML [60], and various methods expand on the idea of using optimization principles for generating CFEs [29, 32, 51]. Alternative approaches generate counterfactual instances based on, e.g., reinforcement learning [48, 57] or conditional generative adversarial networks [55, 65]. It is quite conceivable that the respective method for generating counterfactual explanations could influence the final results. Different methods may introduce variations in the characteristics, interpretability, and quality of the explanations. Therefore, we have taken great care to select an optimization-based approach that aligns with established practices in the field.
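To make the notion of a direction-constrained, minimal-change counterfactual concrete, the following is a minimal sketch under stated assumptions, not the implementation used in this study. The model `toy_growth_rate`, the integer value range, and the exhaustive search are all hypothetical stand-ins chosen for readability; the actual optimization procedure is the one described by Eq. 1 and the published code.

```python
from itertools import product


def toy_growth_rate(x):
    """Hypothetical stand-in for a trained model: features -> predicted growth rate."""
    # In this toy model only features 0 and 1 matter; feature 2 is irrelevant.
    return 1.0 + 0.2 * x[0] - 0.1 * x[1]


def closest_counterfactual(model, x, direction, values=range(0, 7)):
    """Exhaustively search an integer grid for the candidate with the smallest
    L1 distance to x whose prediction is strictly higher ("upward") or
    strictly lower ("downward") than the prediction for x."""
    if direction not in ("upward", "downward"):
        raise ValueError("direction must be 'upward' or 'downward'")
    baseline = model(x)
    best, best_dist = None, None
    for candidate in product(values, repeat=len(x)):
        pred = model(candidate)
        satisfies = pred > baseline if direction == "upward" else pred < baseline
        if not satisfies:
            continue
        dist = sum(abs(a - b) for a, b in zip(candidate, x))
        if best_dist is None or dist < best_dist:
            best, best_dist = candidate, dist
    return best  # None if no candidate satisfies the direction constraint


factual = (2, 3, 4)
up = closest_counterfactual(toy_growth_rate, factual, "upward")
down = closest_counterfactual(toy_growth_rate, factual, "downward")
print("factual:", factual, "upward CFE:", up, "downward CFE:", down)
```

In this toy setting, both counterfactuals differ from the factual input in a single relevant feature and leave the irrelevant feature untouched; only the sign of the constraint on the prediction changes between the upward and downward conditions.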
A further potential confound of the design may be that it favors early discovery of an effective strategy, resulting in better performance over the duration of the experiment as the performance measure (number of generated aliens) accumulates over time. Finally, the current study neglects to account for individual user characteristics. It may be that anxiety-prone individuals respond more strongly to downward CFE feedback, given altered emotional and probabilistic appraisal of upward counterfactual thinking in individuals with high levels of trait anxiety [39]. Thus, further research is necessary to obtain a more comprehensive understanding of the role of directionality of CFEs in xAI.
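The accumulation confound can be illustrated with a small arithmetic sketch. All numbers below (starting value, number of trials, per-trial gains) are hypothetical and are not taken from the study; the point is only that two users who end up with the same strategy can differ substantially on a cumulative measure.

```python
def cumulative_score(discovery_trial, n_trials=12, gain_before=0, gain_after=2, start=20):
    """Accumulate a hypothetical per-trial gain that switches from gain_before
    to gain_after from the trial in which the effective strategy is found."""
    score = start
    history = []
    for trial in range(1, n_trials + 1):
        score += gain_after if trial >= discovery_trial else gain_before
        history.append(score)
    return history


early = cumulative_score(discovery_trial=3)  # strategy found in trial 3
late = cumulative_score(discovery_trial=9)   # same strategy, found in trial 9

print("final score, early discovery:", early[-1])
print("final score, late discovery:", late[-1])
print("gain in last trial:", early[-1] - early[-2], "vs", late[-1] - late[-2])
```

Both hypothetical users gain the same amount in the final trial, yet their cumulative scores differ, which is why a cumulative measure alone cannot separate when a strategy was found from how good it is.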
5.2 Contribution to Knowledge for xAI
The findings presented in this study have significant implications for the field of explainable and trustworthy artificial intelligence. CFEs have emerged as a popular approach in xAI, as they provide insights into the changes required in input data to influence a model’s output. This study specifically focuses on the directionality of CFEs, distinguishing between upward counterfactuals (describing scenarios better than the factual state) and downward counterfactuals (describing scenarios worse than the factual state).
Our results demonstrate the importance of CFE directionality in shaping behavior and experience of novice users when interacting with an unknown automated system in an unfamiliar domain to extract new knowledge. The findings indicate that upward CFEs offer a significant performance advantage over other forms of counterfactual feedback in the given explanation context. Specifically, users were able to extract new knowledge more effectively and demonstrated higher explicit knowledge of the system when provided with upward CFEs compared to downward CFEs.
These findings point towards the critical role of regulatory fit in determining the effectiveness of model explanations [33]. Regulatory fit refers to the alignment between an explanation and the task at hand. In the context of xAI, this implies that the directionality of CFEs should be carefully considered to ensure they are relevant and meaningful to the users’ objectives and cognitive processes. By providing explanations that align with users’ goals and expectations, xAI systems can enhance user performance and improve their understanding of the underlying models [31].
The impact of these findings on xAI as a sub-field of artificial intelligence is substantial. xAI aims to bridge the gap between black-box models and human comprehension, enabling users to trust, interpret, and interact with automated systems more effectively. By identifying the advantages of upward CFEs and the potential benefits of mixed CFEs, this study contributes to the development of more effective and user-centric explainability techniques. Understanding the directionality of CFEs provides valuable insights into how explanations can be tailored to meet users’ needs and improve their decision-making processes.
Furthermore, these findings have broader implications for the wider xAI community. Researchers and practitioners in xAI can leverage this knowledge to design better explainable systems. They can incorporate the directionality of CFEs into the design of xAI interfaces, ensuring that explanations are presented in a way that maximizes user understanding and performance. Additionally, these findings highlight the importance of user-centric evaluation methodologies in xAI research, as they provide valuable insights into the impact of explanations on user behavior and knowledge acquisition.
5.3 Conclusion
The canonical example illustrating the concept of counterfactuals in xAI is an upward CFE: “If you had done X, your loan would have been approved.” The current results, suggesting that upward CFEs are most effective for guiding decision-making, may explain why this example is considered to be an inherently intuitive prototype. Further, the results of this study provide renewed evidence for the importance of considering not only algorithmic aspects of explainability approaches, but also their effectiveness during hands-on human-system interaction. Specifically, they give reason to assume that regulatory fit, i.e., the alignment between an explanation and the task at hand, may act as a potentially crucial factor in determining the effectiveness of model explanations.
References
Artelt, A., Hammer, B.: On the computation of counterfactual explanations-a survey. arXiv preprint arXiv:1911.07749 (2019)
Artelt, A., et al.: Evaluating robustness of counterfactual explanations. In: 2021 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 01–09. IEEE (2021). https://doi.org/10.1109/SSCI50451.2021.9660058
Bansal, G., Nushi, B., Kamar, E., Weld, D.S., Lasecki, W.S., Horvitz, E.: Updates in human-AI teams: understanding and addressing the performance/compatibility tradeoff. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 2429–2437 (2019). https://doi.org/10.1609/aaai.v33i01.33012429
Bates, D., Mächler, M., Bolker, B., Walker, S.: Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67(1) (2015). https://doi.org/10.18637/jss.v067.i01
Ben-Shachar, M., Lüdecke, D., Makowski, D.: effectsize: estimation of effect size indices and standardized parameters. J. Open Source Softw. 5(56), 2815 (2020). https://doi.org/10.21105/joss.02815
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. 1st edn. Routledge, London (1984). https://doi.org/10.1201/9781315139470
Byrne, R.M.: Counterfactuals in explainable artificial intelligence (xAI): evidence from human reasoning. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pp. 6276–6282. International Joint Conferences on Artificial Intelligence Organization (2019). https://doi.org/10.24963/ijcai.2019/876
Byrne, R.M.: Counterfactual thought. Annu. Rev. Psychol. 67, 135–157 (2016). https://doi.org/10.1146/annurev-psych-122414-033249
Celar, L., Byrne, R.M.: How people reason with counterfactual and causal explanations for artificial intelligence decisions in familiar and unfamiliar domains. Mem. Cogn. 51, 1481–1496 (2023). https://doi.org/10.3758/s13421-023-01407-5
Dai, X., Keane, M.T., Shalloo, L., Ruelle, E., Byrne, R.M.: Counterfactual explanations for prediction and diagnosis in xAI. In: Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, pp. 215–226 (2022). https://doi.org/10.1145/3514094.3534144
Detry, M.A., Ma, Y.: Analyzing repeated measurements using mixed models. JAMA 315(4), 407 (2016). https://doi.org/10.1001/jama.2015.19394
Dyczewski, E.A., Markman, K.D.: General attainability beliefs moderate the motivational effects of counterfactual thinking. J. Exp. Soc. Psychol. 48(5), 1217–1220 (2012). https://doi.org/10.1016/j.jesp.2012.04.016
Epstude, K., Roese, N.J.: The functional theory of counterfactual thinking. Pers. Soc. Psychol. Rev. 12(2), 168–192 (2008)
Goldinger, S.D., Kleider, H.M., Azuma, T., Beike, D.R.: Blaming the victim under memory load. Psychol. Sci. 14(1), 81–85 (2003). https://doi.org/10.1111/1467-9280.01423
Higgins, E.T.: Making a good decision: value from fit. Am. Psychol. 55(11), 1217 (2000). https://doi.org/10.1037/0003-066X.55.11.1217
Hilton, D.J., Slugoski, B.R.: Knowledge-based causal attribution: the abnormal conditions focus model. Psychol. Rev. 93(1), 75–88 (1986). https://doi.org/10.1037/0033-295X.93.1.75
Holzinger, A., Carrington, A., Müller, H.: Measuring the quality of explanations: the system causability scale (SCS): comparing human and machine explanations. KI - Künstliche Intelligenz 34(2), 193–198 (2020). https://doi.org/10.1007/s13218-020-00636-z
Karimi, A.H., Barthe, G., Balle, B., Valera, I.: Model-agnostic counterfactual explanations for consequential decisions. In: International Conference on Artificial Intelligence and Statistics, pp. 895–905. PMLR (2020)
Kassambara, A.: rstatix: pipe-friendly framework for basic statistical tests (2021). https://CRAN.R-project.org/package=rstatix. R package version 0.7.0
Keane, M.T., Smyth, B.: Good counterfactuals and where to find them: a case-based technique for generating counterfactuals for explainable AI (XAI). In: Watson, I., Weber, R. (eds.) ICCBR 2020. LNCS (LNAI), vol. 12311, pp. 163–178. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58342-2_11
Kuhl, U., Artelt, A., Hammer, B.: Keep your friends close and your counterfactuals closer: improved learning from closest rather than plausible counterfactual explanations in an abstract setting. In: 2022 ACM Conference on Fairness, Accountability, and Transparency, pp. 2125–2137 (2022). https://doi.org/10.1145/3531146.3534630
Kuhl, U., Artelt, A., Hammer, B.: Let’s go to the alien zoo: introducing an experimental framework to study usability of counterfactual explanations for machine learning. Front. Comput. Sci. 5, 20 (2023). https://doi.org/10.3389/fcomp.2023.1087929
Kumle, L., Võ, M.L.H., Draschkow, D.: Estimating power in (generalized) linear mixed models: an open introduction and tutorial in R. Behav. Res. Meth. 53(6), 2528–2543 (2021). https://doi.org/10.3758/s13428-021-01546-0
Lage, I., et al.: Human evaluation of models built for interpretability. In: Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, vol. 7, pp. 59–67 (2019). https://doi.org/10.1609/hcomp.v7i1.5280
Lombrozo, T.: Explanation and abductive inference. In: Holyoak, K.J., Morrison, R.G. (eds.) The Oxford Handbook of Thinking and Reasoning, pp. 260–276. Oxford University Press, Oxford, UK (2012). https://doi.org/10.1093/oxfordhb/9780199734689.013.0014
Ludolph, R., Schulz, P.J.: Does regulatory fit lead to more effective health communication? A systematic review. Soc. Sci. Med. 128, 142–150 (2015). https://doi.org/10.1016/j.socscimed.2015.01.021
Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Markman, K.D., Gavanski, I., Sherman, S.J., McMullen, M.N.: The mental simulation of better and worse possible worlds. J. Exp. Soc. Psychol. 29(1), 87–109 (1993). https://doi.org/10.1006/jesp.1993.1005
Mc Grath, R., et al.: Interpretable credit application predictions with counterfactual explanations. In: NIPS 2018-Workshop on Challenges and Opportunities for AI in Financial Services: the Impact of Fairness, Explainability, Accuracy, and Privacy (2018)
McMullen, M.N., Markman, K.D.: Downward counterfactuals and motivation: the wake-up call and the Pangloss effect. Pers. Soc. Psychol. Bull. 26(5), 575–584 (2000). https://doi.org/10.1177/0146167200267005
Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38 (2019). https://doi.org/10.1016/j.artint.2018.07.007
Mothilal, R.K., Sharma, A., Tan, C.: Explaining machine learning classifiers through diverse counterfactual explanations. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 607–617 (2020). https://doi.org/10.1145/3351095.3372850
Motyka, S., et al.: Regulatory fit: a meta-analytic synthesis. J. Consum. Psychol. 24(3), 394–410 (2014). https://doi.org/10.1016/j.jcps.2013.11.004
Mowbray, A., Chung, P., Greenleaf, G.: Utilizing AI in the legal assistance sector. In: LegalAIIA@ ICAIL, pp. 12–18 (2019)
Muth, C., Bales, K.L., Hinde, K., Maninger, N., Mendoza, S.P., Ferrer, E.: Alternative models for small samples in psychological research: applying linear mixed effects models and generalized estimating equations to repeated measures data. Educ. Psychol. Measur. 76(1), 64–87 (2016). https://doi.org/10.1177/0013164415580432
Myers, A.L., McCrea, S.M., Tyser, M.P.: The role of thought-content and mood in the preparative benefits of upward counterfactual thinking. Motiv. Emot. 38, 166–182 (2014). https://doi.org/10.1007/s11031-013-9362-5
Nielsen, J., Levy, J.: Measuring usability: preference vs. performance. Commun. ACM 37(4), 66–75 (1994). https://doi.org/10.1145/175276.175282
Parikh, N., De Brigard, F., LaBar, K.S.: The efficacy of downward counterfactual thinking for regulating emotional memories in anxious individuals. Front. Psychol. 12, 712066 (2022). https://doi.org/10.3389/fpsyg.2021.712066
Parikh, N., LaBar, K.S., De Brigard, F.: Phenomenology of counterfactual thinking is dampened in anxious individuals. Cogn. Emot. 34(8), 1737–1745 (2020). https://doi.org/10.1080/02699931.2020.1802230
Qiao, L., Wang, W., Lin, B.: Learning accurate and interpretable decision rule sets from neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 4303–4311 (2021). https://doi.org/10.1609/aaai.v35i5.16555
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2021). https://www.R-project.org/
Rajpurkar, P., Chen, E., Banerjee, O., Topol, E.J.: AI in health and medicine. Nat. Med. 28(1), 31–38 (2022). https://doi.org/10.1038/s41591-021-01614-0
Ramon, Y., Vermeire, T., Toubia, O., Martens, D., Evgeniou, T.: Understanding consumer preferences for explanations generated by xAI algorithms. arXiv preprint arXiv:2107.02624 (2021)
Roese, N.J.: The functional basis of counterfactual thinking. J. Pers. Soc. Psychol. 66(5), 805 (1994). https://doi.org/10.1037/0022-3514.66.5.805
Roese, N.J.: Counterfactual thinking. Psychol. Bull. 121(1), 133–148 (1997). https://doi.org/10.1037/0033-2909.121.1.133
Roese, N.J., Olson, J.M.: Functions of counterfactual thinking. In: What Might Have Been: The Social Psychology of Counterfactual Thinking, pp. 169–197. Erlbaum (1995)
Rozemberczki, B., et al.: The Shapley value in machine learning. In: The 31st International Joint Conference on Artificial Intelligence and the 25th European Conference on Artificial Intelligence (2022). https://doi.org/10.24963/ijcai.2022/778
Samoilescu, R.F., Van Looveren, A., Klaise, J.: Model-agnostic and scalable counterfactual explanations via reinforcement learning. arXiv preprint arXiv:2106.02597 (2021)
Sanna, L.J., Turley, K.J.: Antecedents to spontaneous counterfactual thinking: effects of expectancy violation and outcome valence. Pers. Soc. Psychol. Bull. 22(9), 906–919 (1996). https://doi.org/10.1177/0146167296229005
Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, New York (2014)
Sharma, S., Henderson, J., Ghosh, J.: CERTIFAI: counterfactual explanations for robustness, transparency, interpretability, and fairness of artificial intelligence models. arXiv preprint arXiv:1905.07857 (2019)
Shin, Y.M., Kim, S.W., Yoon, E.B., Shin, W.Y.: Prototype-based explanations for graph neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 13047–13048 (2022). https://doi.org/10.1609/aaai.v36i11.21660
Stepin, I., Alonso, J.M., Catala, A., Pereira-Fariña, M.: A survey of contrastive and counterfactual explanation generation methods for explainable artificial intelligence. IEEE Access 9, 11974–12001 (2021). https://doi.org/10.1109/ACCESS.2021.3051315
Ustun, B., Spangher, A., Liu, Y.: Actionable recourse in linear classification. In: Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 10–19 (2019). https://doi.org/10.1145/3287560.3287566
Van Looveren, A., Klaise, J., Vacanti, G., Cobb, O.: Conditional generative models for counterfactual explanations. arXiv preprint arXiv:2101.10123 (2021)
Verma, S., Boonsanong, V., Hoang, M., Hines, K.E., Dickerson, J.P., Shah, C.: Counterfactual explanations and algorithmic recourses for machine learning: a review. arXiv preprint arXiv:2010.10596 (2020)
Verma, S., Hines, K., Dickerson, J.P.: Amortized generation of sequential algorithmic recourses for black-box models. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 8512–8519 (2022). https://doi.org/10.1609/aaai.v36i8.20828
Votto, A.M., Valecha, R., Najafirad, P., Rao, H.R.: Artificial intelligence in tactical human resource management: a systematic literature review. Int. J. Inf. Manage. Data Insights 1(2), 100047 (2021). https://doi.org/10.1016/j.jjimei.2021.100047
van der Waa, J., Nieuwburg, E., Cremers, A., Neerincx, M.: Evaluating xAI: a comparison of rule-based and example-based explanations. Artif. Intell. 291, 103404 (2021). https://doi.org/10.1016/j.artint.2020.103404
Wachter, S., Mittelstadt, B., Russell, C.: Counterfactual explanations without opening the black box: automated decisions and the GDPR. Harv. JL Tech. 31, 841 (2017)
Warren, G., Byrne, R.M., Keane, M.T.: Categorical and continuous features in counterfactual explanations of AI systems. In: Proceedings of the 28th International Conference on Intelligent User Interfaces, pp. 171–187 (2023)
Warren, G., Keane, M.T., Byrne, R.M.: Features of explainability: how users understand counterfactual and causal explanations for categorical and continuous features in xAI. In: IJCAI-ECAI 2022 Workshop: Cognitive Aspects of Knowledge Representation (2022). https://ceur-ws.org/Vol-3251/paper1.pdf
White, K., Lehman, D.R.: Looking on the bright side: downward counterfactual thinking in response to negative life events. Pers. Soc. Psychol. Bull. 31(10), 1413–1424 (2005). https://doi.org/10.1177/0146167205276064
Wong, E.M.: Narrating near-histories: the effects of counterfactual communication on motivation and performance. Manage. Organ. Hist. 2(4), 351–370 (2007). https://doi.org/10.1177/1744935907086119
Yang, F., Alva, S.S., Chen, J., Hu, X.: Model-based counterfactual synthesizer for interpretation. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 1964–1974 (2021). https://doi.org/10.1145/3447548.3467333
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2023 The Author(s)
About this paper
Cite this paper
Kuhl, U., Artelt, A., Hammer, B. (2023). For Better or Worse: The Impact of Counterfactual Explanations’ Directionality on User Behavior in xAI. In: Longo, L. (eds) Explainable Artificial Intelligence. xAI 2023. Communications in Computer and Information Science, vol 1903. Springer, Cham. https://doi.org/10.1007/978-3-031-44070-0_14
DOI: https://doi.org/10.1007/978-3-031-44070-0_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44069-4
Online ISBN: 978-3-031-44070-0
eBook Packages: Computer Science, Computer Science (R0)