
1 Introduction

The question of how to provide users with understandable, usable, and trustworthy explanations for machine learning (ML) decisions is at the heart of explainable artificial intelligence (xAI). A popular approach within the community is the counterfactual explanation (CFE), which draws out “what-if” scenarios highlighting the perturbations of the input data necessary to change a model’s output [60]. Recent years have brought a notable uptick in studies investigating various aspects of CFEs for ML. Prior work focuses, inter alia, on their robustness [2], impact on user trust and satisfaction [61, 62], and usability as a function of algorithmic properties [21] (see [56] for an extensive review of the research landscape).

One key defining characteristic of counterfactual statements is their directionality: upward counterfactuals describe scenarios that are superior to the factual state (i.e., how it would have been better), while downward counterfactuals refer to more negative alternatives to the factual state (i.e., how it would have been worse) [28].

There is general agreement among cognitive and social psychologists that upward counterfactuals serve a preparatory role, increasing motivation and guiding future action [8, 64]. The role of downward counterfactuals, however, seems to be more complex. A common argument points towards a predominantly affective role, inducing a sense of relief about the factual state by emphasizing how a scenario could have been worse [46]. However, alternative empirical evidence suggests that downward counterfactuals may act as a wake-up call by drawing attention towards the possibility of worse outcomes, thus increasing motivation to take action [30].

In xAI, the impact of CFE directionality remains even more ambiguous, given that counterfactuals used to explain a model are not spontaneously generated by humans, but automatically computed as actionable feedback deepening users’ understanding. CFE user studies commonly investigate CFEs that flip a binary outcome class [9, 10, 43, 61, 62]. While these outcomes may have qualitative implications within their respective task domains (e.g., being under vs. over the legal blood alcohol limit to drive [9, 61, 62], chemicals being safe vs. unsafe [9], grass growth levels on a farm being high vs. low [10]), the directionality of the provided explanations is often outside the respective research focus. Thus, this aspect has not yet been extensively studied in xAI, and the preliminary data available are inconsistent.

For instance, [43] report downward CFEs for positive decisions to be less popular compared to importance rankings, and find no differences between rankings and upward CFEs in terms of user preference. In contrast, [9] suggest a behavioral impact of explanations that establish a downward comparison to the factual state on personal decision-making. Following this reasoning, downward CFEs may potentially serve as better actionable feedback.

Given these sparse and inconsistent accounts, it is unclear whether one type of CFE is more effective than the other in improving user performance in tasks that require model interpretation. Therefore, the current study systematically compares the impact of CFE directionality for ML predictions on user behavior. Specifically, we perform a user study that requires participants to extract new knowledge from an automated system given model predictions and corresponding CFEs. In addition to groups exclusively receiving upward or downward CFEs, we provide a third group of users with both types in a mixed condition. We find it conceivable that combined information on better and worse outcomes may grant a more complete understanding of the causal relationships between actions and outcomes, effectively informing future decision-making. We investigate how CFEs of either type impact users’ objective performance, explicit knowledge of the system, and subjective experience, compared to each other and a no-explanation control condition.

2 Related Work

In contrast to using inherently interpretable models such as rule sets or decision trees, establishing explainability for opaque models like support vector machines or neural networks is a challenging endeavor. Proposed approaches include feature importance methods providing insights into the relevance and influence of input features on model predictions [27, 47], rule extraction techniques distilling interpretable decision rules from complex models [40], and prototype-based explanations leveraging representative instances to explain model behavior [52].

In this broader xAI landscape, CFEs take a prominent role given an emerging user-centric focus on explainability [31]. CFEs facilitate human comprehension by explicitly revealing the necessary changes in input data to influence the model’s output [60]. In this way, CFEs provide explanations for instances where the model’s predictions deviate from the desired outcomes, allowing users to understand the factors that contribute to the model’s decision-making process. Their particular appeal lies in their intrinsically contrastive format, bearing a strong resemblance to human cognitive reasoning. Indeed, individuals routinely engage in counterfactual thinking [14, 45]. During this process, one not only retains the representation of actual facts, but also simulates an alternative scenario of how the reality might have differed [8]. This distinct characteristic positions CFEs as a valuable addition to the xAI toolkit, promising to provide users with actionable insights for decision-making and understanding model behavior for a given decision.

How humans reason from counterfactuals has been a prominent research topic in cognitive psychology studies [16, 25, 49], producing relevant implications for the use of CFEs in xAI applications [7]. Human-generated counterfactuals typically change only a limited set of features, preferably undoing recent and controllable events to create hypothetical scenarios that are strongly aligned with the individual’s personal world knowledge and beliefs [7, 12]. The current literature encompasses various computational approaches for generating CFEs [1, 53], reflecting the continuing development of the field of counterfactual explanation generation. To yield explanations that closely resemble human counterfactual thought [31], generation approaches have placed emphasis on producing CFEs that are sparse [32], stay close to the original input (with variations in the distance measures used [18, 20]), focus on controllable (and thus actionable) features [18, 54], and may even diversify the generated solutions to meet end-users’ needs [32].

Still, gaps remain in understanding to what extent certain aspects of CFEs may facilitate or hinder a user’s understanding when they are used in xAI. Just recently, [61] demonstrated that human users more readily understand explanations relying on categorical features in contrast to continuous ones, a distinction not typically taken into account by CFE generation approaches. In a similar vein, the current work investigates the potential impact of CFE directionality on user behavior, a fundamental property commonly not addressed in xAI. Upward counterfactuals (i.e., how it would have been better) are typically generated following negative events [28]. In this way, they may provide a clear roadmap for future improvement and action [7]. Indeed, imagining “better worlds” broadly leads to performance improvements in various tasks and settings, acting as a driving force for learning from past mistakes [13, 36, 44, 64]. When individuals engage in upward counterfactual thinking, their motivational orientation towards improvement aligns with the counterfactual focus on a hypothetical “better world”, thus inducing regulatory fit [15]. The positive affect associated with regulatory fit may enhance motivation, persistence, and goal-directed behavior, leading to an increased likelihood of taking action to bridge the gap between the current and desired states [33].

Downward counterfactuals, in contrast, refer to imagining more negative alternatives to the factual state (i.e., how it would have been worse) [28]. This downward comparison may have different functional implications, and research indeed reveals a complex pattern. On the one hand, downward counterfactual thinking is frequently associated with affective regulation, eliciting relief [46] and reducing regret [38]. Through this positive affective role of inducing a feeling of “I’m better off than I could have been”, it seems to serve a self-enhancement function leading to more favorable self-perception [63]. In this way, downward counterfactual thinking may lead to a sense of complacency, reducing the motivation to act [30]. On the other hand, putting one’s attentional focus on an objectively worse counterfactual possibility may induce negative affect, which may in turn serve as a motivator by signaling that the present condition is inadequate and requires action [30]. Thus, by focusing on mistakes and missed opportunities, downward counterfactuals may potentially highlight areas for improvement. Despite these indications of fundamental differences in how humans reason with upward and downward counterfactuals, this crucial aspect of CFEs’ effectiveness and usability has received little attention in xAI research so far. An extensive literature review revealed only two previous papers partially addressing this issue.

First, [43] conducted a study examining the effectiveness of feature rankings and CFEs in two everyday contexts: online advertising and loan applications. Specifically, their second experiment focuses on the directionality of explanations, with a particular emphasis on providing upward CFEs following negative outcomes and downward CFEs following positive outcomes. Participants made trade-off decisions between the two explanation modes, thus indicating their preferences for either feature rankings or CFEs. The results present a notable contrast to the prevailing preference for CFEs within the xAI community: surprisingly, users show a higher preference for feature rankings over downward CFEs when faced with positive outcomes. In the case of negative decisions, users exhibit no significant difference in preference between the explanation formats, selecting upward CFEs as frequently as feature rankings. It is important to note, however, that such an assessment of user preference does not specifically allow drawing conclusions about relative usability differences between the explanation formats. Usability and user preference are two distinct aspects when evaluating the effectiveness of a system; aspects that – while being associated to some extent – often do not align [37]. Users may exhibit a subjective preference for systems or explanations, irrespective of the measurable impact on performance [24, 59].

More recently, [9] exposed participants to a model’s input, its decision, and either counterfactual or causal explanations, framed as a software application built to aid decision-making. Depending on the experimental group, the domain presented to a participant encompassed either a familiar scenario (i.e., blood alcohol level and driving limit) or an unfamiliar one (i.e., chemical safety). After rating the perceived helpfulness of the explanation presented, participants reached personal decisions on whether they would be prepared to drive/handle an unknown chemical for a series of cases where they only saw the model input (Experiment 2 of [9]). Intriguingly, the personal decisions aligned better with model predictions when the preceding explanations specifically established a downward comparison. While this may shed a favorable light on downward CFEs for guiding personal action, it is unclear to what extent the familiarization phase, framed as judging the helpfulness of a software application, carried over to the subsequent decision-making phase. Furthermore, the reported beneficial effect of explanations that establish a downward comparison presents an incidental finding, as the actual focus of the study was on effects related to domain familiarity.

In light of these inconclusive preliminary findings, we aim to take a first step towards a systematic investigation of how directionality impacts CFEs’ usability as actionable feedback in xAI. Specifically, we ask whether novice users tasked to gain new information from an unknown system in an abstract domain [22] benefit more from receiving upward, downward, or mixed CFE feedback. By examining the effects of directionality, we aim to shed light on a nuance of CFEs that has yet to be explored, contributing to a more comprehensive understanding of their effectiveness and applicability in xAI.

3 Methods

To assess the impact of directionality of CFEs in xAI on user behavior, we employ the game-inspired Alien Zoo framework [22]. Consequently, our study assesses the efficacy of upward, downward, and mixed CFEs in acquiring new knowledge from an automated system in a low-knowledge domain, specifically targeting novice users. The Ethics Committee of Bielefeld University, Germany, approved this study.

3.1 Participants

We determined the required sample size for the current study by running an a-priori power analysis, using openly available empirical data from an earlier study based on the same experimental paradigm [22]. These data provided us with realistic estimates for the fixed and random effects to be expected in the current study. The power analysis (R package mixedpower v.0.1.0 [23]) indicated that 40 participants per group were required to achieve a power of \(>85\%\) (medium effect size, \(\alpha < .05\)).
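For illustration, the sketch below shows how such a simulation-based power analysis can be set up with mixedpower; the pilot data frame, its column names, and the exact argument values are assumptions for illustration rather than the authors' actual script, and the interface shown follows the package documentation only approximately.

```r
# Sketch only (not the authors' script): simulation-based power analysis with
# mixedpower [23], assuming pilot data from [22] in long format with the
# hypothetical columns subject, group, trial, and pack_size.
library(lme4)
library(mixedpower)

pilot_model <- lmer(pack_size ~ group * trial + (1 | subject), data = pilot_data)

# Simulate power for different numbers of participants ("subject" is varied);
# argument names follow the mixedpower tutorial and may differ between versions.
power_est <- mixedpower(model = pilot_model, data = pilot_data,
                        fixed_effects = c("group", "trial"),
                        simvar = "subject",
                        steps = c(30, 40, 50),   # candidate sample sizes
                        critical_value = 2,      # |t| > 2 counts as significant
                        n_sim = 1000)
power_est
```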

A total of 161 participants were recruited in April 2023 via Prolific Academic and assigned to one of four between-participant conditions in a fixed order: upward CFEs (n = 40), downward CFEs (n = 40), mixed CFEs (i.e., receiving one downward and one upward CFE in each feedback round, n = 41), and a no-explanation control group (n = 40). We restricted access to the study to native English speakers from the United States, Australia, Canada, New Zealand, Ireland, and the United Kingdom who had not previously participated in studies using the given experimental framework. Before participating, users provided informed consent through an electronic click-wrap agreement.

All participants received a base pay of GBP£4 for participation. The three top performers in each condition received a bonus payment of GBP£1. Together with the experimental instructions, participants were informed about a potential monetary bonus to increase compliance with the task [3].

To ensure sufficient data quality, we applied several exclusion criteria prior to analysis. Specifically, a participant’s responses were removed when they failed more than one attention check (n = 3). No participant displayed monotonous response patterns despite poor performance during the game, indicating overall high effort. Data from 158 participants contributed to the final game analysis (Table 1).

Note that for a number of users, logging of survey responses failed due to technical difficulties (n = 8). Further, we excluded users from the subsequent survey analysis if they answered with positive or negative valence only, indicating low-effort entries (n = 4). Consequently, the survey analysis was based on a subset of 146 users from the game phase.

Table 1. Demographic information of participants.

3.2 Experimental Procedure

A detailed account of procedure and design choices underlying the experimental framework is given in [22].

In short, users who agree to participate are directed to a web server to complete a short online game. As part of the game, participants feed a group of aliens iteratively over several trials, choosing combinations of different plants as food for their aliens. During every feeding choice, users may select up to 6 instances of each of five plants, represented by leaf symbols of identical shape but different color (see Fig. 1A for an exemplary decision scene). Each player starts out with an initial pack size of 20 aliens. Participants are tasked with finding the plant combination that makes the alien pack grow rather than decline; the top players generating the highest number of aliens in each experimental group receive an additional monetary bonus.

To facilitate the learning process of which plants make an effective alien diet, participants receive feedback in the form of CFEs after every even-numbered trial. The feedback depends on the respective participant’s condition (upward = “Your result would have been BETTER if you had selected:”; downward = “Your result would have been WORSE if you had selected:”; mixed = “Your result would have been BETTER/WORSE if you had selected:”; control = no explanation beyond an overview of past choices). Figure 1B depicts an exemplary feedback scene for a participant in the upward group. The game phase consists of 12 trials (i.e., 12 feeding choices), with two attention checks assessing users’ attentiveness after trials 3 and 7.

Following the game phase, a survey assessed users’ subjective judgments of presented feedback via a modified version of the system causability scale (SCS, [17]). On top of two items assessing explicit knowledge of feature relevance for task success, this scale measures the extent to which an xAI system provides clear and understandable explanations for its decisions. Users rate the quality of explanations based on factors such as completeness, consistency, relevance, and comprehensibility on a five-point Likert scale. Finally, we collected demographic information on participants’ age and gender, before users received a link to access full debriefing information.

Fig. 1.

Exemplary scenes from the Alien Zoo game. To improve visibility for this paper, font size in selected images was increased. (A) Example of a typical decision scene. Users are provided with a summary of their last choice, together with the previous and current pack size (note that the aliens are called ‘Shubs’ in the experimental scenario). Moreover, the page shows a paddock with animated aliens to visualize the current pack size. The right side of the screen shows the plant types alongside upward and downward arrow buttons. Note that plant counters are set to 0 at the beginning of each new decision trial; the image above already shows the next selection (all plants set to 2). (B) Example of a feedback scene for participants in the upward CFE condition, displaying the user’s decisions from the last two rounds, their respective impact on alien number, and the computed CFE. Note that the type of feedback varied depending on experimental group.

3.3 Prediction of New Alien Number and Generation of CFE Feedback

During the game, an underlying ML model trained on simulated plant-growth-rate data determines changes in alien number. In each trial, the participant’s feeding choice is passed on to a decision tree regression model [50] to predict a change rate for the current pack size (−10 to +10 aliens per decision, capped not to go below 2). Here, we use the model and training data from a previous study relying on the same experimental framework (maximal tree depth of 5 with Gini splitting rule of CART [6]; see Experiment 2 from [22]). This prior work demonstrated the feasibility of the experimental framework when comparing upward CFE feedback to a no-explanation control condition, promising to yield similarly meaningful insights into potential effects driven by different types of CFEs. In addition, the choice to rely on freely-available material that was previously published provided us with exemplary data distributions to obtain realistic estimates for fixed and random effects for the a priori power analysis.

The corresponding training data entails a dependency between two features (plant 2 and plant 4, respectively) and the output variable. Specifically, the growth rate scales linearly with values 1 to 5 for plant 2, iff plant 4 has a value of 1 or 2. To prevent users from applying a simple ‘the more, the better’ approach, the dependence between growth rate and the value 6 of plant 2 was disrupted.
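To make this data structure concrete, the following sketch simulates training data with the described dependency and fits a depth-5 CART regression tree with rpart; the sample size, noise level, and concrete growth-rate values are illustrative assumptions, and the actual model and training data are those published with [22].

```r
# Sketch only: simulated plant-growth data with the described dependency,
# and a depth-5 regression tree as a stand-in for the model used in the game.
library(rpart)

set.seed(1)
n <- 5000
train <- data.frame(plant1 = sample(0:6, n, TRUE), plant2 = sample(0:6, n, TRUE),
                    plant3 = sample(0:6, n, TRUE), plant4 = sample(0:6, n, TRUE),
                    plant5 = sample(0:6, n, TRUE))

# Growth scales with plant 2 (values 1-5) only if plant 4 is 1 or 2; the value
# 6 of plant 2 deliberately breaks the "more is better" pattern. The concrete
# numbers (+2 per unit, -2, -4) are made up for illustration.
train$growth <- with(train,
  ifelse(plant4 %in% 1:2 & plant2 %in% 1:5,  2 * plant2,
  ifelse(plant4 %in% 1:2 & plant2 == 6,     -2, -4))) + rnorm(n, sd = 0.5)

tree <- rpart(growth ~ ., data = train, method = "anova",
              control = rpart.control(maxdepth = 5))

# Predicted change rate for one feeding choice
predict(tree, data.frame(plant1 = 0, plant2 = 4, plant3 = 0, plant4 = 1, plant5 = 0))
```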

Together with each prediction, we also compute a CFE via optimization [60], presenting an alternative plant combination that differs minimally from the current input. In our implementation, a CFE \({\textbf{x}_{\text {cf}}}\in \mathbb {R}^d\) of an ML model \(h:\mathbb {R}^d\rightarrow \{Y\}\) is obtained by solving:

$$\begin{aligned} \underset{{\textbf{x}_{\text {cf}}}\,\in \, \mathbb {R}^d}{\arg \min }\; {\ell }\big (h({\textbf{x}_{\text {cf}}}), y'\big ) + C \cdot {\theta }({\textbf{x}_{\text {cf}}}, \textbf{x}) \end{aligned}$$
(1)

where \(\textbf{x}\in \mathbb {R}^d\) denotes the original input, the regularization \({\theta }(\cdot )\) penalizes deviations from the original input \(\textbf{x}\) (weighted by a regularization strength \(C>0\)), \(y'\in \{Y\}\) denotes the requested output/behavior of the model \(h(\cdot )\) under the counterfactual \({\textbf{x}_{\text {cf}}}\), and \({\ell }(\cdot )\) denotes a loss function penalizing deviations from the requested prediction. Thus, returned CFEs correspond to minimal perturbations to the model’s input that alter the final prediction to a desired outcome. Given the regularization term \({\theta }(\cdot )\), generated CFEs based on this definition remain as similar to the original input \(\textbf{x}\) as possible.

Depending on the participant’s condition, computed CFEs either increase (upward condition, and odd trials of the mixed condition) or decrease (downward condition, and even trials of the mixed condition) the current growth rate prediction by a few decimal points. Every two trials, participants receive these CFEs as feedback to further improve their performance in the game (see Fig. 1B).
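For illustration, the sketch below instantiates Eq. (1) in this setting by exhaustive search over the small discrete plant grid, using a squared-error loss and an L1 distance as the regularization term; the loss, distance measure, regularization strength, and the size of the requested prediction shift are assumptions for illustration rather than the study's exact implementation, and the code continues the hypothetical tree model from the previous sketch.

```r
# Sketch only: brute-force instantiation of Eq. (1) over the discrete plant grid
# (each plant takes integer values 0-6, so the argmin can be enumerated).
generate_cfe <- function(model, x, direction = c("upward", "downward"),
                         delta = 0.5, C = 0.1) {
  direction <- match.arg(direction)
  y_hat <- predict(model, x)
  # Requested output y': slightly better or worse than the current prediction
  y_target <- if (direction == "upward") y_hat + delta else y_hat - delta

  grid <- expand.grid(plant1 = 0:6, plant2 = 0:6, plant3 = 0:6,
                      plant4 = 0:6, plant5 = 0:6)
  loss <- (predict(model, grid) - y_target)^2                 # l(h(x_cf), y')
  dist <- rowSums(abs(sweep(as.matrix(grid), 2, unlist(x))))  # theta(x_cf, x)
  grid[which.min(loss + C * dist), ]                          # minimal-cost x_cf
}

# Example: upward CFE for the feeding choice "two of every plant"
x <- data.frame(plant1 = 2, plant2 = 2, plant3 = 2, plant4 = 2, plant5 = 2)
generate_cfe(tree, x, direction = "upward")
```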

3.4 Statistical Analysis

We use R-4.1.1 [41] for all statistical analyses, with experimental condition (control, upward, downward, mixed) as independent variable. Given our longitudinal design, we employ linear mixed models to effectively address the correlations that arise from multiple measurements taken from each participant [11, 35]. We investigate systematic differences between experimental groups over the 12 feeding trials (R package: lme4 v.1.1-27.1 [4]), with alien pack size over trials as dependent variable, fixed effects of group, trial number, and their interaction, and a by-subjects random intercept. We compare model fits using the analysis of variance function (stats package, base R). Effect sizes are reported as \(\eta _{\text {p}}^{2}\) (R package: effectsize v.0.5 [5]). Significant main effects or interactions were followed up with pairwise estimated marginal means analyses, Bonferroni corrected to account for multiple comparisons. We report respective effect sizes in terms of Cohen’s d.
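A condensed sketch of this analysis pipeline is given below; the data frame and column names are hypothetical, and the calls illustrate one way to obtain the reported quantities with the cited packages rather than reproducing the authors' exact script.

```r
# Sketch only: mixed-model analysis of pack size over trials, assuming a long
# data frame game_data with hypothetical columns subject, group, trial, pack_size.
library(lme4)       # v.1.1-27.1
library(effectsize) # v.0.5
library(emmeans)

game_data$trial <- factor(game_data$trial)   # 12 trials as discrete time points

m_full <- lmer(pack_size ~ group * trial + (1 | subject), data = game_data)
m_main <- lmer(pack_size ~ group + trial + (1 | subject), data = game_data)
anova(m_main, m_full)    # model comparison via the analysis of variance function

# Partial eta squared can be recovered from a reported F statistic, e.g. for
# the group-by-trial interaction F(33, 1694) = 13.114:
F_to_eta2(f = 13.114, df = 33, df_error = 1694)   # approx. 0.20

# Follow-up: pairwise group contrasts at each trial, Bonferroni corrected,
# with Cohen's d via emmeans::eff_size()
emm <- emmeans(m_full, ~ group | trial)
pairs(emm, adjust = "bonferroni")
eff_size(emm, sigma = sigma(m_full), edf = df.residual(m_full))
```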

We analyze survey data based on item type. Missing values (i.e., users responding “I do not know.” for items assessing explicit knowledge, or “I prefer not to answer.” for items assessing subjective experience) were removed prior to the survey analysis.

The first two items of the survey evaluate the user’s explicit knowledge of feature relevance for successful task completion. Our goal is to determine a comprehensive measure of user knowledge through rewards and penalties for correct and incorrect responses, respectively. To achieve this, we calculate the number of plants correctly identified per participant (i.e., the number of matches between ground truth and user input).
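As a small illustration of this scoring, and assuming the two items ask participants to mark each of the five plants as relevant and as irrelevant (yielding ten decisions in total), the knowledge score could be computed as follows; the coding scheme and variable names are hypothetical.

```r
# Sketch only: counting matches between a participant's survey responses and
# the ground truth (plants 2 and 4 relevant, the others irrelevant). The 0/1
# coding across the two items is an illustrative assumption.
ground_truth  <- c(relevant   = c(0, 1, 0, 1, 0),   # item 1: which plants help
                   irrelevant = c(1, 0, 1, 0, 1))   # item 2: which plants do not
user_response <- c(relevant   = c(0, 1, 0, 0, 0),
                   irrelevant = c(1, 0, 1, 0, 0))
sum(user_response == ground_truth)   # number of correct decisions (max. 10)
```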

The remaining items were adapted from the SCS, a rating scale that allows users to evaluate the extent to which an xAI system’s explanations are clear, transparent, and understandable [17]. Based on their responses to these items, we compute an adapted SCS score for each participant to assess their subjective experience with the game.

Statistically, we investigate potential group differences concerning matches between user input and ground truth, Likert-style survey responses, age, and gender information using the non-parametric Kruskal-Wallis H test (R package: rstatix v.0.7.0) [19], with effect sizes given as \(\eta ^{2}\).

Significant effects revealed via the Kruskal-Wallis H test are followed up with pairwise comparisons between group levels, Bonferroni corrected for multiple testing.
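A sketch of this non-parametric analysis with rstatix is given below; the data frame and column names are hypothetical, and Dunn's test is shown as one common choice of pairwise follow-up, as the text only specifies Bonferroni-corrected pairwise comparisons between group levels.

```r
# Sketch only: non-parametric group comparison of the survey measures, assuming
# a data frame survey_data with hypothetical columns group and n_matches.
library(rstatix)  # v.0.7.0

kruskal_test(survey_data, n_matches ~ group)      # Kruskal-Wallis H test
kruskal_effsize(survey_data, n_matches ~ group)   # eta squared effect size

# Pairwise follow-up between group levels, Bonferroni corrected (Dunn's test
# is used here for illustration; the original analysis may differ).
dunn_test(survey_data, n_matches ~ group, p.adjust.method = "bonferroni")
```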

4 Results

Overall, the results show group effects both in terms of performance during the game and users’ explicit knowledge of relevant and irrelevant features. However, we do not detect statistically significant differences when evaluating participants’ subjective experience.

4.1 Game Performance

We evaluate users’ game data to investigate whether naive users benefit comparably from different types of CFEs when tasked to extract knowledge in an unfamiliar domain. Specifically, we compare the number of aliens produced over time for participants receiving either upward CFEs, downward CFEs, mixed CFEs, or no CFEs (control) in the Alien Zoo iterative learning task.

Figure 2A depicts the development of average pack size over trials. All participants show a positive learning trajectory, but with strikingly different slopes across groups. The performance curves suggest that the mean number of generated aliens over trials varies as a function of experimental condition, with users receiving no explanations showing the weakest and users receiving upward CFE feedback the strongest performance increase. The corresponding linear mixed effects model revealed a significant interaction between trial number and group (F(33,1694) = 13.114, p < .001, \(\eta _{\text {p}}^{2}\) = 0.203), confirming this observation.

Fig. 2.

Development of mean number of generated aliens per trial, plotted together for all groups (A), and pairwise for those groups showing significant differences in the analysis following up the significant interaction (B-F). Shaded areas denote the standard error of the mean. Asterisks denote statistical significance with p < .05 (*), p < .01 (**), and p < .001 (***), respectively.

Follow-up analyses reveal an intriguing pattern of distinctive group differences (see Figs. 2B-F).

The trajectories of the control and the upward group diverge significantly from trial 4 onward (t(300) \(\ge \) -2.660, p \(\le \) .0494, d \(\ge \) -1.066), with participants in the upward group clearly outperforming control participants (Fig. 2B). This pattern also holds when comparing the upward group with the two remaining conditions. Trajectories of the upward and the downward group diverge significantly from trial 7 onward (t(300) \(\ge \) 3.016, p \(\le \) .0167, d \(\ge \) 1.224; Fig. 2C). Statistical differences between the upward and the mixed groups emerge starting at trial 8 (t(300) \(\ge \) 2.851, p \(\le \) .0280, d \(\ge \) 1.135; Fig. 2D).

While performing less efficiently than participants in the upward group, participants in the mixed condition also achieve statistically higher scores in the last 5 trials compared to control participants (t(300) \(\ge \) -2.891, p \(\le \) .0247, d \(\ge \) -1.144; Fig. 2E), and in the last 2 trials compared to downward participants (t(300) \(\ge \) -3.121, p \(\le \) .0119, d \(\ge \) -1.251; Fig. 2F). Only the trajectories of participants in the control and downward conditions do not show any statistically meaningful differences.

This interaction is complemented by a significant main effect of trial number (F(11,1694) = 95.573, p < .001, \(\eta _{\text {p}}^{2}\) = 0.380), and group (F(3,154) = 11.423, p < .001, \(\eta _{\text {p}}^{2}\) = 0.180).

4.2 Assessing User’s Explicit Knowledge

The first two items of the survey phase assess participants’ explicit knowledge of feature relevance for task completion. Across the two items, a participant could reach at most 10 correct decisions by matching their responses with the ground truth perfectly. In terms of mean number of matches between ground truth and user judgments, participants in the control condition matched highest (M = 6.700 ± 0.548 SE), followed by participants in the upward (M = 6.615 ± 0.261 SE), mixed (M = 6.000 ± 0.342 SE), and downward condition (M = 5.241 ± 0.390 SE; Fig. 3A). The corresponding statistical analysis reveals a significant effect of group (H(3) = 10.9, p = .012, \(\eta ^{2}\) = 0.077). Follow-up pairwise comparisons show that participants in the upward condition match significantly more plants than participants in the downward condition (p = .028).

Fig. 3.

(A) Mean number of matches between user input and ground truth in the first two items of the survey, assessing whether participants can correctly identify plants that are relevant and irrelevant for task success. The dashed line shows the maximally attainable number of matches (i.e., user responses perfectly aligned with ground truth). Error bars denote the standard error of the mean. The asterisk denotes statistical significance with p < 0.05 (*). (B) Mean adapted SCS scores across groups. Error bars denote the standard error of the mean.

4.3 Assessing User’s Subjective Experience

A modified version of the SCS informs whether participants perceive provided explanations as clear, understandable, and usable. As shown in Fig. 3B, participants in the control (M = 0.760 ± 0.025 SE), upward (M = 0.756 ± 0.029 SE), and mixed (M = 0.754 ± 0.019 SE) conditions achieve very similar scores, while the mean SCS score of downward participants is slightly lower (M = 0.705 ± 0.023 SE). There is no statistically significant effect of group in terms of SCS scores (H(3) = 5.36, p = .147, \(\eta ^{2}\) = 0.017).

5 Discussion

The current study investigates the impact of directionality of CFEs for xAI on objective task performance, explicit knowledge, and subjective experience of novice users during an iterative learning paradigm in an unfamiliar domain.

The results suggest that participants benefit most from receiving upward CFE feedback (i.e., informing them which choices would have been better), outperforming participants in all other conditions (Fig. 2). Consequently, we replicate prior work showing that upward CFEs induce a significant performance advantage over a no-explanation control [22] in the employed experimental framework, and extend those insights to the aspect of directionality. In the current experimental setting, upward counterfactuals may have provided novice users with interpretable and clear pathways for actions that improve future behavior [7]. This is in line with previous psychological research demonstrating that reflecting upon “better worlds” may serve as a driving force for learning and adapting behavior [13, 36, 44, 64]. Given the current task, the striking positive impact of upward CFEs is in line with the psychological concept of regulatory fit, as describing how a choice would have been better matches the motivational orientation to improve one’s feeding choices [15]. Previous work in various domains suggests that such a feeling of fit induces more effective and satisfying performance, as well as greater persistence and motivation to continue the task [15, 26, 33]. A similar mechanism may be in effect in the current setting.

Intriguingly, participants who receive mixed CFE feedback also achieve statistically higher scores compared to control and downward groups, specifically towards later trials. Considering that downward CFEs do not improve user performance compared to providing no explanations at all (control), we may suspect that users in the mixed condition benefit from receiving feedback that partially possesses regulatory fit.

A previous study suggests a beneficial effect of explanations that establish a downward comparison [9]. However, other research shows that downward CFEs are less preferred than feature rankings [43]. Intriguingly, participants receiving downward CFEs in the current work do not show statistically meaningful differences in task performance compared to participants receiving no explanations at all. While the divergence of the downward group from the performance of the two other CFE groups is striking, it merits only the cautious interpretation generally afforded to null effects. On the one hand, downward CFE feedback may have simply induced complacency that impeded participants’ motivation to act, knowing that there still was a worse route they could have taken [30, 44]. On the other hand, the current scenario may have been inadequate to elicit the negative affect necessary to stimulate action through the presented downward comparisons. Unsuccessfully feeding aliens had only limited personal consequences for participants, potentially keeping the level of perceived regret following a sub-optimal decision comparatively low. Consequently, the beneficial impact of downward CFEs in terms of regret minimization could not be observed [38]. A future study could investigate this possibility more closely via an adapted design that involves increased personal costs, thereby implementing a penalty for poor decision-making and a higher chance of inducing regret.

In terms of explicit knowledge, participants in the upward condition identified relevant and irrelevant input features more readily than those in the downward condition, in line with the performance advantage for upward CFEs. This suggests that – in tasks requiring users to extract new information from a system – upward CFEs may be the better option for enhancing users’ explicit knowledge. A curious detail meriting comment, however, concerns the comparatively high performance in terms of explicit knowledge of control users, who did not receive explanations at all. This may be explained by the relatively high proportion of control participants indicating that they “do not know” for items assessing explicit knowledge, and thus being discarded from this analysis. The remaining data may represent control individuals who are the most knowledgeable and confident in their responses.

In terms of objectively quantifiable measures, our study found tangible behavioral group differences, in stark contrast to user responses concerning the subjective usability of the explanations provided. This observation is consistent with the literature on the mismatch between these two measures [21, 61], and further highlights the need to carefully consider both subjective and objective measures when evaluating xAI approaches.

5.1 Limitations

In order to provide a comprehensive evaluation of the current findings, it is important to acknowledge and address limitations inherent in this study.

Given the experimental Alien Zoo framework, the results are obtained in a very specific context and with a specific task, diverging from many real-life domains. Today, we are already witnessing the significant impact of AI-based decision-making systems across a wide range of domains, including but not limited to health care [42], the legal system [34], and human resource management [58]. We carefully considered the trade-offs involved in selecting a specific context. Ultimately, the primary objective of investigating the usability and impact of counterfactual directionality on user behavior and experience motivated the choice of a single and quite abstract domain (i.e., feeding aliens). This set-up allowed us to maintain a high level of control over experimental variables to isolate effects driven by directionality of CFEs, while minimizing confounding variables that could arise from varying contexts. Importantly, participants could engage with the counterfactual explanations and extract new knowledge from the automated system without being influenced by pre-existing domain knowledge. Thus, the current approach facilitated a more detailed analysis of how CFE directionality specifically affects the task at hand and the extraction of new knowledge from an automated system. Uncovering these specific dynamics within a well-defined context provides a first step, laying the foundation for future work. The results await validation across various domains, tasks, and user populations to contribute to a more comprehensive understanding of the broader applicability and usefulness of CFEs across different scenarios.

Similarly, the current work does not cover different approaches for generating the CFEs. As outlined in Eq. 1, we follow an optimization approach to generate minimal adjustments to the model’s input that – depending on the experimental condition – either increase or decrease the predicted growth rate. This approach is based on the initial definition of CFE generation for ML [60], and various methods expand on the idea of using optimization principles for generating CFEs [29, 32, 51]. Alternative approaches generate counterfactual instances based on, e.g., reinforcement learning [48, 57] or conditional generative adversarial networks [55, 65]. It is quite conceivable that the respective method for generating counterfactual explanations could influence the final results. Different methods may introduce variations in the characteristics, interpretability, and quality of the explanations. Therefore, we have taken great care to select an optimization-based approach that aligns with established practices in the field.

A further potential confound of the design may be that it favors early discovery of an effective strategy, resulting in better performance over the duration of the experiment as the performance measure (number of generated aliens) accumulates over time. Finally, the current study neglects to account for individual user characteristics. It may be that anxiety-prone individuals respond more strongly to downward CFE feedback, given altered emotional and probabilistic appraisal of upward counterfactual thinking in individuals with high levels of trait anxiety [39]. Thus, further research is necessary to obtain a more comprehensive understanding of the role of directionality of CFEs in xAI.

5.2 Contribution to Knowledge for xAI

The findings presented in this study have significant implications for the field of explainable and trustworthy artificial intelligence. CFEs have emerged as a popular approach in xAI, as they provide insights into the changes required in input data to influence a model’s output. This study specifically focuses on the directionality of CFEs, distinguishing between upward counterfactuals (describing scenarios better than the factual state) and downward counterfactuals (describing scenarios worse than the factual state).

Our results demonstrate the importance of CFE directionality in shaping behavior and experience of novice users when interacting with an unknown automated system in an unfamiliar domain to extract new knowledge. The findings indicate that upward CFEs offer a significant performance advantage over other forms of counterfactual feedback in the given explanation context. Specifically, users were able to extract new knowledge more effectively and demonstrated higher explicit knowledge of the system when provided with upward CFEs compared to downward CFEs.

These findings point towards a critical role of regulatory fit in determining the effectiveness of model explanations [33]. Regulatory fit here refers to the alignment between an explanation and the task at hand. In the context of xAI, this implies that the directionality of CFEs should be carefully considered to ensure they are relevant and meaningful to the users’ objectives and cognitive processes. By providing explanations that align with users’ goals and expectations, xAI systems can enhance user performance and improve their understanding of the underlying models [31].

The impact of these findings on xAI as a sub-field of artificial intelligence is substantial. xAI aims to bridge the gap between black-box models and human comprehension, enabling users to trust, interpret, and interact with automated systems more effectively. By identifying the advantages of upward CFEs and the potential benefits of mixed CFEs, this study contributes to the development of more effective and user-centric explainability techniques. Understanding the directionality of CFEs provides valuable insights into how explanations can be tailored to meet users’ needs and improve their decision-making processes.

Furthermore, these findings have broader implications for the wider xAI community. Researchers and practitioners in xAI can leverage this knowledge to design better explainable systems. They can incorporate the directionality of CFEs into the design of xAI interfaces, ensuring that explanations are presented in a way that maximizes user understanding and performance. Additionally, these findings highlight the importance of user-centric evaluation methodologies in xAI research, as they provide valuable insights into the impact of explanations on user behavior and knowledge acquisition.

5.3 Conclusion

The canonical example illustrating the concept of counterfactuals in xAI is an upward CFE: “If you had done X, your loan would have been approved.” The current results, suggesting that upward CFEs are most effective for guiding decision-making, may explain why this example is considered an inherently intuitive prototype. Further, the results of this study provide renewed evidence for the importance of considering not only algorithmic aspects of explainability approaches, but also their effectiveness during hands-on human-system interaction. Specifically, they give reason to assume that regulatory fit, i.e., the alignment between an explanation and the task at hand, may act as a potentially crucial factor in determining the effectiveness of model explanations.