Abstract
Algorithmic recourse is concerned with aiding individuals who are unfavorably treated by automated decision-making systems to overcome their hardship, by offering recommendations that would result in a more favorable prediction when acted upon. Such recourse actions are typically obtained through solving an optimization problem that minimizes changes to the individual’s feature vector, subject to various plausibility, diversity, and sparsity constraints. Whereas previous works offer solutions to the optimization problem in a variety of settings, they critically overlook real-world considerations pertaining to the environment in which recourse actions are performed.
The present work emphasizes that changes to a subset of the individual’s attributes may have consequential downstream effects on other attributes, thus making recourse a fundamentally causal problem. Here, we model such considerations using the framework of structural causal models, and highlight the pitfalls of not considering causal relations through examples and theory. Such insights allow us to reformulate the optimization problem to directly optimize for minimally costly recourse over a space of feasible actions (in the form of causal interventions) rather than optimizing for minimally distant “counterfactual explanations”. We offer both the optimization formulations and solutions to deterministic and probabilistic recourse, on an individualized and subpopulation level, overcoming the steep assumptive requirements of offering recourse in general settings. Finally, using synthetic and semi-synthetic experiments based on the German Credit dataset, we demonstrate how such methods can be applied in practice under minimal causal assumptions.
A.H. Karimi and J. von Kügelgen—Equal contribution.
This chapter is mostly based on the following two works:
1. Karimi, A. H., Schölkopf, B., & Valera, I. Algorithmic recourse: from counterfactual explanations to interventions. In: Proceedings of the 4th Conference on Fairness, Accountability, and Transparency (FAccT 2021). pp. 353–362 (2021).
2. Karimi, A. H., von Kügelgen, J., Schölkopf, B., & Valera, I. Algorithmic recourse under imperfect causal knowledge: a probabilistic approach. In: Advances in Neural Information Processing Systems 33 (NeurIPS 2020). pp. 265–277 (2020).
1 Introduction
Predictive models are being increasingly used to support consequential decision-making in a number of contexts, e.g., denying a loan, rejecting a job applicant, or prescribing life-altering medication. As a result, there is mounting social and legal pressure [64, 72] to provide explanations that help the affected individuals to understand “why a prediction was output”, as well as “how to act” to obtain a desired outcome. Answering these questions, for the different stakeholders involved, is one of the main goals of explainable machine learning [15, 19, 32, 37, 42, 53, 54].
In this context, several works have proposed to explain a model’s predictions of an affected individual using counterfactual explanations, which are defined as statements of “how the world would have (had) to be different for a desirable outcome to occur” [76]. Of specific importance are nearest counterfactual explanations, presented as the most similar instances to the feature vector describing the individual, that result in the desired prediction from the model [25, 35]. A closely related term is algorithmic recourse—the actions required for, or “the systematic process of reversing unfavorable decisions by algorithms and bureaucracies across a range of counterfactual scenarios”—which is argued as the underwriting factor for temporally extended agency and trust [70].
Counterfactual explanations have shown promise for practitioners and regulators to validate a model on metrics such as fairness and robustness [25, 58, 69]. However, in their raw form, such explanations do not seem to fulfill one of the primary objectives of “explanations as a means to help a data subject act rather than merely understand” [76].
The translation of counterfactual explanations to recourse actions, i.e., to a recommendable set of actions to help an individual achieve a favorable outcome, was first explored in [69], where additional feasibility constraints were imposed to support the concept of actionable features (e.g., to prevent asking the individual to reduce their age or change their race). While a step in the right direction, this work and others that followed [25, 41, 49, 58] implicitly assume that the set of actions resulting in the desired output would directly follow from the counterfactual explanation. This arises from the assumption that “what would have had to be in the past” (retrodiction) not only translates to “what should be in the future” (prediction) but also to “what should be done in the future” (recommendation) [63]. We challenge this assumption and attribute the shortcoming of existing approaches to their lack of consideration for realworld properties, specifically the causal relationships governing the physical world in which actions are performed.
1.1 Motivating Examples
Example 1
Consider, for example, the setting in Fig. 1 where an individual has been denied a loan and seeks an explanation and recommendation on how to proceed. This individual has an annual salary (\(X_1\)) of \(\$75,000\) and an account balance (\(X_2\)) of \(\$25,000\), and the predictor grants a loan based on the binary output of \(h(X_1,X_2) = \mathrm {sgn}(X_1 + 5 \cdot X_2 - \$225,000)\). Existing approaches may identify nearest counterfactual explanations as another individual with an annual salary of \(\$100,000\) (\(+33\%\)) or a bank balance of \(\$30,000\) (\(+20\%\)), therefore encouraging the individual to reapply when either of these conditions is met. On the other hand, assuming actions take place in a world where home-seekers save \(30\%\) of their salary, up to external fluctuations in circumstance (i.e., \(X_2 \,{:}{=}\,0.3X_1 + U_2\)), a salary increase of only \(+14\%\) to \(\$85,000\) would automatically result in \(\$3,000\) of additional savings, with a net positive effect on the loan-granting algorithm’s decision.
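The arithmetic of Example 1 can be sketched in a few lines of Python. The classifier, savings mechanism, and numbers follow the text; the function names are ours:

```python
# Illustrative sketch of Example 1: loan granted iff X1 + 5*X2 - 225000 >= 0.

def h(x1, x2):
    """Loan-approval classifier from Example 1 (approve iff score >= 0)."""
    return x1 + 5 * x2 - 225_000 >= 0

def savings_mechanism(x1, u2):
    """Assumed structural equation X2 := 0.3*X1 + U2."""
    return 0.3 * x1 + u2

# Factual individual: salary $75k, balance $25k -> loan denied.
x1_f, x2_f = 75_000, 25_000
assert not h(x1_f, x2_f)

# Abduction: recover the exogenous term U2 consistent with the observation.
u2 = x2_f - 0.3 * x1_f  # = 2500

# Causal action: raise salary to $85k (+14%); savings follow downstream.
x1_new = 85_000
x2_new = savings_mechanism(x1_new, u2)  # = 28000, i.e. +$3000 in savings
print(h(x1_new, x2_new))  # True: loan now granted
```

Note how abduction is what makes the downstream effect individual-specific: the recovered \(U_2\) is carried over into the post-action world.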
Example 2
Consider now another instance of the setting of Fig. 1 in which an agricultural team wishes to increase the yield of their rice paddy. While many factors influence yield (temperature, solar radiation, water supply, seed quality, ...), assume that the primary actionable capacity of the team is their choice of paddy location. Importantly, the altitude (\(X_1\)) at which the paddy sits has an effect on other variables. For example, the laws of physics may imply that a 100m increase in elevation results in an average decrease of 1\(^{\circ }\)C in temperature (\(X_2\)). Therefore, it is conceivable that a counterfactual explanation suggesting an increase in elevation for optimal yield, without consideration for downstream effects of the elevation increase on other variables (e.g., a decrease in temperature), may actually result in the prediction not changing.
These two examples illustrate the pitfalls of generating recourse actions directly from counterfactual explanations without consideration for the (causal) structure of the world in which the actions will be performed. Actions derived directly from counterfactual explanations may require too much effort from the individual (Example 1) or may not even result in the desired output (Example 2).
We also remark that merely accounting for correlations between features (instead of modeling their causal relationships) would be insufficient as this would not align with the asymmetrical nature of causal interventions: for Example 1, increasing bank balance (\(X_2\)) would not lead to a higher salary (\(X_1\)), and for Example 2, increasing temperature (\(X_2\)) would not affect altitude (\(X_1\)), contrary to what would be predicted by a purely correlation-based approach.
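This asymmetry can be made concrete with a minimal sketch, using a hypothetical linear SCM standing in for Example 1's world: intervening on the downstream variable \(X_2\) leaves its cause \(X_1\) untouched, whereas a correlational model would predict both to move together.

```python
# Hypothetical SCM for Example 1's world: X1 := U1;  X2 := 0.3*X1 + U2.
# An intervention do(X2 := theta) severs X2's structural equation, so the
# cause X1 is unaffected -- the asymmetry a correlational model misses.

def sample_scm(u1, u2, do_x2=None):
    x1 = u1
    x2 = do_x2 if do_x2 is not None else 0.3 * x1 + u2
    return x1, x2

x1_obs, x2_obs = sample_scm(75_000, 2_500)              # observational world
x1_int, x2_int = sample_scm(75_000, 2_500, do_x2=50_000)  # do(X2 := 50k)
assert x1_int == x1_obs  # intervening on the effect leaves the cause fixed
```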
1.2 Summary of Contributions and Structure of This Chapter
In the present work, we remedy this situation via a fundamental reformulation of the recourse problem: we rely on causal reasoning (Sect. 2.2) to incorporate knowledge of causal dependencies between features into the process of recommending recourse actions that, if acted upon, would result in a counterfactual instance that favorably changes the output of the predictive model (Sect. 2.1).
First, we illuminate the intrinsic limitations of an approach in which recourse actions are directly derived from counterfactual explanations (Sect. 3.1). We show that actions derived from precomputed (nearest) counterfactual explanations may prove suboptimal in the sense of higher-than-necessary cost, or, even worse, ineffective in the sense of not actually achieving recourse. To address these limitations, we emphasize that, from a causal perspective, actions correspond to interventions which not only model changes to the intervened-upon variables, but also downstream effects on the remaining (non-intervened-upon) variables. This insight leads us to propose a new framework of recourse through minimal interventions in an underlying structural causal model (SCM) (Sect. 3.2). We complement this formulation with a negative result showing that recourse guarantees are generally only possible if the true SCM is known (Sect. 3.3).
Second, since real-world SCMs are rarely known, we focus on the problem of algorithmic recourse under imperfect causal knowledge (Sect. 4). We propose two probabilistic approaches which allow us to relax the strong assumption of a fully specified SCM. In the first (Sect. 4.1), we assume that the true SCM, while unknown, is an additive Gaussian noise model [23, 47]. We then use Gaussian processes (GPs) [79] to average predictions over a whole family of SCMs to obtain a distribution over counterfactual outcomes which forms the basis for individualised algorithmic recourse. In the second (Sect. 4.2), we consider a different subpopulation-based (i.e., interventional rather than counterfactual) notion of recourse which allows us to further relax our assumptions by removing any assumptions on the form of the structural equations. This approach proceeds by estimating the effect of interventions on individuals similar to the one for which we aim to achieve recourse (i.e., the conditional average treatment effect [1]), and relies on conditional variational autoencoders [62] to estimate the interventional distribution. In both cases, we assume that the causal graph is known or can be postulated from expert knowledge, as without such an assumption causal reasoning from observational data is not possible [48, Prop. 4.1]. To find minimum-cost interventions that achieve recourse with a given probability, we propose a gradient-based approach to solve the resulting optimisation problems (Sect. 4.3).
Our experiments (Sect. 5) on synthetic and semi-synthetic loan approval data show the need for probabilistic approaches to achieve algorithmic recourse in practice, as point estimates of the underlying true SCM often propose invalid recommendations or achieve recourse only at higher cost. Importantly, our results also suggest that subpopulation-based recourse is the right approach to adopt when assumptions such as additive noise do not hold. A user-friendly implementation of all methods that only requires specification of the causal graph and a training set is available at https://github.com/amirhk/recourse.
2 Preliminaries
In this work, we consider algorithmic recourse through the lens of causality. We begin by reviewing the main concepts.
2.1 XAI: Counterfactual Explanations and Algorithmic Recourse
Let \(\mathbf {X}\,=\,(X_1, ..., X_d)\) denote a tuple of random variables, or features, taking values \(\mathbf {x}\,=\,(x_1, ..., x_d)\in \mathcal {X}\,=\,\mathcal {X}_1\times ...\times \mathcal {X}_d\). Assume that we are given a binary probabilistic classifier \(h:\mathcal {X}\rightarrow [0,1]\) trained to make decisions about i.i.d. samples from the data distribution \(P_{\mathbf {X}}\).^{Footnote 1}
For ease of illustration, we adopt the setting of loan approval as a running example, i.e., \(h(\mathbf {x})\ge 0.5\) denotes that a loan is granted and \(h(\mathbf {x})<0.5\) that it is denied. For a given (“factual”) individual \(\mathbf {x}^\texttt {F}\) that was denied a loan, \(h(\mathbf {x}^\texttt {F})<0.5\), we aim to answer the following questions: “Why did individual \(\mathbf {x}^\texttt {F}\) not get the loan?” and “What would they have to change, preferably with minimal effort, to increase their chances for a future application?”.
A popular approach to this task is to find so-called (nearest) counterfactual explanations [76], where the term “counterfactual” is meant in the sense of the closest possible world with a different outcome [36]. Translating this idea to our setting, a nearest counterfactual explanation \(\mathbf {x}^\texttt {CFE}\) for an individual \(\mathbf {x}^\texttt {F}\) is given by a solution to the following optimisation problem:
\(\mathbf {x}^\texttt {CFE}\in \mathop {\mathrm {arg\,min}}_{\mathbf {x}\in \mathcal {X}} \; \text {dist}(\mathbf {x}, \mathbf {x}^\texttt {F}) \quad \text {subject to}\quad h(\mathbf {x}) \ne h(\mathbf {x}^\texttt {F}) \qquad (1)\)
where \(\text {dist}(\cdot ,\cdot )\) is a distance on \(\mathcal {X}\times \mathcal {X}\), and additional constraints may be added to reflect plausibility, feasibility, or diversity of the obtained counterfactual explanations [22, 24, 25, 39, 41, 49, 58]. Most existing approaches have focused on providing solutions to (1) by exploring semantically meaningful choices of \(\mathrm {dist}(\cdot , \cdot )\) for measuring similarity between individuals (e.g., \(\ell _0, \ell _1, \ell _\infty \), percentileshift), accommodating different predictive models \(h\) (e.g., random forest, multilayer perceptron), and realistic plausibility constraints \(\mathcal {P}\subseteq \mathcal {X}\).^{Footnote 2}
Although nearest counterfactual explanations provide an understanding of the most similar set of features that result in the desired prediction, they stop short of giving explicit recommendations on how to act to realize this set of features. The lack of specification of the actions required to realize \(\mathbf {x}^\texttt {CFE}\) from \(\mathbf {x}^\texttt {F}\) leads to uncertainty and limited agency for the individual seeking recourse. To shift the focus from explaining a decision to providing recommendable actions to achieve recourse, Ustun et al. [69] reformulated (1) as:
\(\boldsymbol{\delta }^*\in \mathop {\mathrm {arg\,min}}_{\boldsymbol{\delta }\in \mathcal {F}} \; \text {cost}^\texttt {F}(\boldsymbol{\delta }) \quad \text {subject to}\quad h(\mathbf {x}^\texttt {F}+ \boldsymbol{\delta }) \ne h(\mathbf {x}^\texttt {F}), \quad \mathbf {x}^\texttt {F}+ \boldsymbol{\delta }\in \mathcal {P}\qquad (2)\)
where \(\text {cost}^\texttt {F}(\cdot )\) is a user-specified cost function that encodes preferences between feasible actions from \(\mathbf {x}^\texttt {F}\), and \(\mathcal {F}\) and \(\mathcal {P}\) are optional sets of feasibility and plausibility constraints,^{Footnote 3} restricting the actions and the resulting counterfactual explanation, respectively. The feasibility constraints in (2), as introduced in [69], aim at restricting the set of features that the individual may act upon. For instance, recommendations should not ask individuals to change their gender or reduce their age. Henceforth, we refer to the optimization problem in (2) as the CFE-based recourse problem, where the emphasis is shifted from minimising a distance as in (1) to optimising a personalised cost function \(\text {cost}^\texttt {F}(\cdot )\) over a set of actions \(\boldsymbol{\delta }\) which individual \(\mathbf {x}^\texttt {F}\) can perform.
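As a rough illustration of problem (2), the following sketch performs a brute-force search over a grid of actions, keeping feasible ones that flip the classifier and returning the cheapest. The classifier, \(\ell _1\) cost, and grid are illustrative, not from the chapter:

```python
# Brute-force sketch of the CFE-based recourse problem (2): minimize a toy
# L1 cost over actions delta subject to h flipping and feasibility.
import itertools

def h(x):  # toy classifier (in $1000s): approve iff score clears threshold
    return x[0] + 5 * x[1] >= 225

def cost(delta):  # toy L1 cost; a real cost function is user-specified
    return sum(abs(d) for d in delta)

def cfe_based_recourse(x_f, actionable=(True, True), step=1, max_change=100):
    best = None
    for delta in itertools.product(range(0, max_change + 1, step),
                                   repeat=len(x_f)):
        if any(d != 0 and not a for d, a in zip(delta, actionable)):
            continue  # feasibility: only actionable features may change
        x_cfe = tuple(xi + di for xi, di in zip(x_f, delta))
        if h(x_cfe) and (best is None or cost(delta) < cost(best)):
            best = delta
    return best

print(cfe_based_recourse((75, 25)))  # cheapest delta flipping the decision
```

Real solvers replace this enumeration with gradient-based, evolutionary, or verification-based search, but the constrained-minimization structure is the same.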
The seemingly innocent reformulation of the counterfactual explanation problem in (1) as a recourse problem in (2) is founded on two key assumptions.
Assumption 1
The feature-wise difference between factual and nearest counterfactual instances, \(\mathbf {x}^\texttt {CFE}- \mathbf {x}^\texttt {F}\), directly translates to minimal action sets \(\boldsymbol{\delta }^*\), such that performing the actions in \(\boldsymbol{\delta }^*\) starting from \(\mathbf {x}^\texttt {F}\) will result in \(\mathbf {x}^\texttt {CFE}\).
Assumption 2
There is a one-to-one mapping between \(\mathrm {dist}(\cdot , \mathbf {x}^\texttt {F})\) and \(\mathrm {cost}^\texttt {F}(\cdot )\), whereby more effortful actions incur larger distance and higher cost.
Unfortunately, these assumptions only hold in restrictive settings, rendering solutions of (2) suboptimal or ineffective in many real-world scenarios. Specifically, Assumption 1 implies that features \(X_i\) for which \(\delta ^*_i\,=\,0\) are unaffected. However, this generally holds only if (i) the individual applies effort in a world where changing a variable does not have downstream effects on other variables (i.e., features are independent of each other); or (ii) the individual changes the value of a subset of variables while simultaneously enforcing that the values of all other variables remain unchanged (i.e., breaking dependencies between features). Beyond the suboptimality that arises from assuming/reducing to an independent world in (i), and disregarding the feasibility of non-altering actions in (ii), non-altering actions may naturally incur a cost which is not captured in the current definition of cost, and hence Assumption 2 does not hold either. Therefore, except in trivial cases where the model designer actively inputs pairwise independent features (independently manipulable inputs) to the classifier \(h\) (see Fig. 2a), generating recommendations from counterfactual explanations in this manner, i.e., ignoring the potentially rich causal structure over \(\mathbf {X}\) and the resulting downstream effects that changes to some features may have on others (see Fig. 2b), warrants reconsideration. A number of authors have argued for the need to consider causal relations between variables when generating counterfactual explanations [25, 39, 41, 69, 76]; however, this has not yet been formalized.
2.2 Causality: Structural Causal Models, Interventions, and Counterfactuals
To reason formally about causal relations between features \(\mathbf {X}\,=\,(X_1, ..., X_d)\), we adopt the structural causal model (SCM) framework [45].^{Footnote 4} Specifically, we assume that the data-generating process of \(\mathbf {X}\) is described by an (unknown) underlying SCM \(\mathcal {M}\) of the general form
\(\mathcal {M}= (\mathbf {S}, P_\mathbf {U}), \qquad \mathbf {S}= \{ X_r := f_r(\mathbf {X}_{\text {pa}(r)}, U_r) \}_{r=1}^d, \qquad \mathbf {U}\sim P_\mathbf {U}, \qquad (3)\)
where the structural equations \(\mathbf {S}\) are a set of assignments generating each observed variable \(X_r\) as a deterministic function \(f_r\) of its causal parents \(\mathbf {X}_{\text {pa}(r)}\subseteq \mathbf {X}\setminus X_r\) and an unobserved noise variable \(U_r\). The assumption of mutually independent noises (i.e., a fully factorised \(P_\mathbf {U}\)) entails that there is no hidden confounding and is referred to as causal sufficiency. An SCM is often illustrated by its associated causal graph \(\mathcal {G}\), which is obtained by drawing a directed edge from each node in \(\mathbf {X}_{\text {pa}(r)}\) to \(X_r\) for \(r\in [d]:=\{1,\ldots , d\}\), see Fig. 1 and Fig. 2b for examples. We assume throughout that \(\mathcal {G}\) is acyclic. In this case, \(\mathcal {M}\) implies a unique observational distribution \(P_\mathbf {X}\), which factorises over \(\mathcal {G}\), defined as the pushforward of \(P_\mathbf {U}\) via \(\mathbf {S}\).^{Footnote 5}
Importantly, the SCM framework also entails interventional distributions describing a situation in which some variables are manipulated externally. E.g., using the do-operator, an intervention which fixes \(\mathbf {X}_\mathcal {I}\) to \({\boldsymbol{\theta }}\) (where \(\mathcal {I}\subseteq [d]\)) is denoted by \(do(\mathbf {X}_\mathcal {I}\,=\,{\boldsymbol{\theta }})\). The corresponding distribution of the remaining variables \(\mathbf {X}_{-\mathcal {I}}\) can be computed by replacing the structural equations for \(\mathbf {X}_\mathcal {I}\) in \(\mathbf {S}\) to obtain the new set of equations \(\mathbf {S}^{do(\mathbf {X}_\mathcal {I}\,=\,{\boldsymbol{\theta }})}\). The interventional distribution \(P_{\mathbf {X}_{-\mathcal {I}}\mid do(\mathbf {X}_\mathcal {I}\,=\,{\boldsymbol{\theta }})}\) is then given by the observational distribution implied by the manipulated SCM \(\left( \mathbf {S}^{do(\mathbf {X}_\mathcal {I}\,=\,{\boldsymbol{\theta }})}, P_\mathbf {U}\right) \).
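A Monte-Carlo sketch of an interventional distribution, for a hypothetical two-variable Gaussian SCM (not from the chapter): sample the noise and push it through the manipulated structural equations.

```python
# Hypothetical SCM:  X1 := U1, U1 ~ N(0,1);  X2 := 0.3*X1 + U2, U2 ~ N(0,1).
# Under do(X1 := theta), X1's structural equation is replaced by the constant.
import random

def sample(do_x1=None):
    u1, u2 = random.gauss(0, 1), random.gauss(0, 1)
    x1 = u1 if do_x1 is None else do_x1  # do(X1 := theta) overrides U1
    x2 = 0.3 * x1 + u2
    return x1, x2

random.seed(0)
n = 100_000
mean_x2 = sum(sample(do_x1=10.0)[1] for _ in range(n)) / n
print(round(mean_x2, 1))  # close to 0.3 * 10 = 3.0 under do(X1 := 10)
```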
Similarly, an SCM also implies distributions over counterfactuals—statements about a world in which a hypothetical intervention was performed all else being equal. For example, given observation \(\mathbf {x}^\texttt {F}\) we can ask what would have happened if \(\mathbf {X}_\mathcal {I}\) had instead taken the value \({\boldsymbol{\theta }}\). We denote the counterfactual variable by \(\mathbf {X}(do(\mathbf {X}_\mathcal {I}\,=\,{\boldsymbol{\theta }}))\mid \mathbf {x}^\texttt {F}\), whose distribution can be computed in three steps [45]:

1.
Abduction: compute the posterior distribution \(P_{\mathbf {U}\mid \mathbf {x}^\texttt {F}}\) of the exogenous variables \(\mathbf {U}\) given the factual observation \(\mathbf {x}^\texttt {F}\);

2.
Action: perform the intervention \(do(\mathbf {X}_\mathcal {I}\,=\,{\boldsymbol{\theta }})\) by replacing the structural equations for \(\mathbf {X}_\mathcal {I}\) by \(\mathbf {X}_\mathcal {I}:={\boldsymbol{\theta }}\) to obtain the new structural equations \(\mathbf {S}^{do(\mathbf {X}_\mathcal {I}\,=\,{\boldsymbol{\theta }})}\);

3.
Prediction: the counterfactual distribution \(P_{\mathbf {X}(do(\mathbf {X}_\mathcal {I}\,=\,{\boldsymbol{\theta }}))\mid \mathbf {x}^\texttt {F}}\) is the distribution induced by the resulting SCM \(\left( \mathbf {S}^{do(\mathbf {X}_\mathcal {I}\,=\,{\boldsymbol{\theta }})}, P_{\mathbf {U}\mid \mathbf {x}^\texttt {F}}\right) \).
For instance, the counterfactual variable for individual \(\mathbf {x}^\texttt {F}\) had action \(a\,=\,do(\mathbf {X}_\mathcal {I}\,=\,{\boldsymbol{\theta }})\in \mathcal {F}\) been performed would be \(\mathbf {X}^\texttt {SCF}(a) := \mathbf {X}(a) \mid \mathbf {x}^\texttt {F}\). For a worked-out example of computing counterfactuals in SCMs, we refer to Sect. 3.2.
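For a toy additive-noise SCM with graph \(X_1 \rightarrow X_2\) (coefficients chosen to mirror Example 1; the helper below is ours), the three steps can be sketched as:

```python
# Abduction-Action-Prediction for a toy SCM: X1 := U1,  X2 := 0.3*X1 + U2.
# With additive noise, abduction is deterministic inversion.

def counterfactual(x_f, do=None):
    """Return x^SCF for an intervention dict {index: value}, given factual x_f."""
    do = do or {}
    # 1. Abduction: invert the structural equations to recover U from x_f.
    u1 = x_f[0]
    u2 = x_f[1] - 0.3 * x_f[0]
    # 2. Action: replace the equations of intervened-upon variables.
    # 3. Prediction: propagate U through the modified equations in causal order.
    x1 = do.get(0, u1)
    x2 = do.get(1, 0.3 * x1 + u2)
    return (x1, x2)

x_f = (75_000, 25_000)
assert counterfactual(x_f) == x_f           # no action reproduces the factual
print(counterfactual(x_f, do={0: 85_000}))  # downstream effect on X2
```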
3 Causal Recourse Formulation
3.1 Limitations of CFE-Based Recourse
Here, we use causal reasoning to formalize the limitations of the CFE-based recourse approach in (2). To this end, we first reinterpret the actions resulting from solving the CFE-based recourse problem, i.e., \(\boldsymbol{\delta }^*\), as structural interventions by defining the set of indices \(\mathcal {I}\) of observed variables that are intervened upon.
Definition 1
(CFE-based actions). Given an individual \(\mathbf {x}^\texttt {F}\) in world \(\mathcal {M}\) and a solution \(\boldsymbol{\delta }^*\) of (2), denote by \(\mathcal {I}= \{i \mid \delta ^*_i \ne 0\}\) the set of indices of observed variables that are acted upon. A CFE-based action then refers to a set of structural interventions of the form \(a^\texttt {CFE}(\boldsymbol{\delta }^*,\mathbf {x}^\texttt {F}) := \mathrm {do}( \{ X_i \,{:}{=}\, x^F_i + \delta ^*_i \}_{i \in \mathcal {I}} )\).
Using Definition 1, we can derive the following key results that provide necessary and sufficient conditions for CFE-based actions to guarantee recourse.
Proposition 1
A CFE-based action \(a^\texttt {CFE}(\boldsymbol{\delta }^*,\mathbf {x}^\texttt {F})\) in general (i.e., for arbitrary underlying causal models) results in the structural counterfactual \(\mathbf {x}^\texttt {SCF}\,=\, \mathbf {x}^\texttt {CFE}:= \mathbf {x}^\texttt {F}+ \boldsymbol{\delta }^*\) and thus guarantees recourse (i.e., \(h(\mathbf {x}^\texttt {SCF})\ne h(\mathbf {x}^\texttt {F})\)) if and only if the set of descendants of the acted-upon variables determined by \(\mathcal {I}\) is the empty set.
Corollary 1
If all features in the true world \(\mathcal {M}\) are mutually independent (i.e., if they are all root nodes in the causal graph), then CFE-based actions always guarantee recourse.
While the above results are formally proven in Appendix A of [28], we provide a sketch of the proof below. If the intervened-upon variables do not have descendants, then by definition \(\mathbf {x}^\texttt {SCF}\,=\, \mathbf {x}^\texttt {CFE}\). Otherwise, the value of the descendants will depend on the counterfactual value of their parents, leading to a structural counterfactual that does not resemble the nearest counterfactual explanation, \(\mathbf {x}^\texttt {SCF}\ne \mathbf {x}^\texttt {CFE}\), and thus may not result in recourse. Moreover, in an independent world the set of descendants of all the variables is by definition the empty set.
Unfortunately, the independent world assumption is not realistic, as it requires all the features selected to train the predictive model \(h\) to be independent of each other. Moreover, limiting changes to only those variables without descendants may unnecessarily limit the agency of the individual; e.g., in Example 1, restricting the individual to only changing their bank balance without, e.g., pursuing a new/side job to increase their income would be limiting. Thus, for a given non-independent \(\mathcal {M}\) capturing the true causal dependencies between features, CFE-based actions require the individual seeking recourse to enforce (at least partially) an independent post-intervention model \(\mathcal {M}^{a^\texttt {CFE}}\) (so that Assumption 1 holds), by intervening on all the observed variables for which \(\delta _i \ne 0\) as well as on their descendants (even if their \(\delta _i \,=\, 0\)). However, such a requirement suffers from two main issues. First, it conflicts with Assumption 2, since holding the values of variables fixed may still imply potentially infeasible and costly interventions in \(\mathcal {M}\) to sever all the incoming edges to such variables, and even then the action may be ineffective and not change the prediction (see Example 2). Second, as will be proven in the next section (see also Example 1), CFE-based actions may still be suboptimal, as they do not benefit from the causal effect of actions towards changing the prediction. Thus, even when equipped with knowledge of causal dependencies, recommending actions directly from counterfactual explanations in the manner of existing approaches is not satisfactory.
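The failure mode described by Proposition 1 can be illustrated on a toy version of Example 2 (the predictor, structural equation, and all numbers below are illustrative): the counterfactual explanation flips the prediction on paper, but the corresponding CFE-based action does not, because of the downstream temperature drop.

```python
# Toy version of Example 2: altitude X1 has a descendant, temperature X2.

def h(x):  # toy yield predictor: high iff altitude/temperature score is high
    return x[0] / 100 + x[1] >= 30

def scm_counterfactual(x_f, do):
    """Assumed equation X2 := 25 - 0.01*X1 + U2 (1C cooler per 100m)."""
    u2 = x_f[1] - (25 - 0.01 * x_f[0])  # abduction
    x1 = do.get(0, x_f[0])
    x2 = do.get(1, 25 - 0.01 * x1 + u2)
    return (x1, x2)

x_f = (1000, 15)    # 1000m, 15C: h is False (10 + 15 = 25 < 30)
x_cfe = (1500, 15)  # CFE: raise altitude; temperature held fixed on paper
assert h(x_cfe)     # 15 + 15 = 30: the explanation flips the prediction
x_scf = scm_counterfactual(x_f, do={0: 1500})
print(h(x_scf))     # False: the downstream temperature drop cancels the gain
```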
3.2 Recourse Through Minimal Interventions
We have demonstrated that actions which immediately follow from counterfactual explanations may require unrealistic assumptions, or alternatively, result in suboptimal or even infeasible recommendations. To overcome these limitations, we rewrite the recourse problem so that instead of finding the minimal (independent) shift of features as in (2), we seek the minimal-cost set of actions (in the form of structural interventions) that results in a counterfactual instance yielding the favorable output from \(h\). For simplicity, we present the formulation for the case of an invertible SCM (i.e., one with invertible structural equations \(\mathbf {S}\)) such that the ground-truth counterfactual \(\mathbf {x}^\texttt {SCF}\,=\, \mathbf {S}^{a}(\mathbf {S}^{-1}({\mathbf {x}^\texttt {F}}))\) is a unique point. The resulting optimisation formulation is as follows:
\(a^*\in \mathop {\mathrm {arg\,min}}_{a \in \mathcal {F}} \; \text {cost}^\texttt {F}(a) \quad \text {subject to}\quad h(\mathbf {x}^\texttt {SCF}(a)) \ne h(\mathbf {x}^\texttt {F}), \quad \mathbf {x}^\texttt {SCF}(a) = \mathbf {S}^{a}(\mathbf {S}^{-1}(\mathbf {x}^\texttt {F})) \qquad (4)\)
where \(a^*\in \mathcal {F}\) directly specifies the set of feasible actions to be performed for minimally costly recourse, with cost measured by \(\text {cost}^\texttt {F}(\cdot )\).^{Footnote 6}
Importantly, using the formulation in (4) it is now straightforward to show the suboptimality of CFE-based actions (proof in Appendix A of [28]):
Proposition 2
Given an individual \(\mathbf {x}^\texttt {F}\) observed in world \(\mathcal {M}\), a set of feasible actions \(\mathcal {F}\), and a solution \(a^*\in \mathcal {F}\) of (4), assume that there exists a CFE-based action \(a^\texttt {CFE}(\boldsymbol{\delta }^*,\mathbf {x}^\texttt {F}) \in \mathcal {F}\) (see Definition 1) that achieves recourse, i.e., \(h(\mathbf {x}^\texttt {F}) \ne h(\mathbf {x}^\texttt {CFE})\). Then, \(\text {cost}^\texttt {F}(a^*) \le \text {cost}^\texttt {F}(a^\texttt {CFE}) \).
Thus, for a known causal model capturing the dependencies among observed variables, and a family of feasible interventions, the optimization problem in (4) yields Recourse through Minimal Interventions (MINT). Generating minimal interventions through solving (4) requires that we be able to compute the structural counterfactual, \(\mathbf {x}^\texttt {SCF}\), of the individual \(\mathbf {x}^\texttt {F}\) in world \(\mathcal {M}\), given any feasible action \(a\in \mathcal {F}\). To this end, and for the purpose of demonstration, we consider a class of invertible SCMs, specifically, additive noise models (ANM) [23], where the structural equations \(\mathbf {S}\) are of the form
\(X_r := f_r(\mathbf {X}_{\text {pa}(r)}) + U_r, \qquad r \in [d], \qquad (5)\)
and propose to use the three steps of structural counterfactuals in [45] to assign a single counterfactual \(\mathbf {x}^\texttt {SCF}(a):=\mathbf {x}(a)\mid \mathbf {x}^\texttt {F}\) to each action \(a\,=\,do(\mathbf {X}_\mathcal {I}={\boldsymbol{\theta }})\in \mathcal {F}\) as below.
Working Example. Consider the model in Fig. 3, where \(\{U_i\}_{i=1}^4\) are mutually independent exogenous variables, and \(\{f_i\}_{i=1}^4\) are deterministic (linear or nonlinear) functions. Let \(\mathbf {x}^\texttt {F}= (x^\texttt {F}_1, x^\texttt {F}_2, x^\texttt {F}_3, x^\texttt {F}_4)^\top \) be the observed features belonging to the (factual) individual seeking recourse. Also, let \(\mathcal {I}\) denote the set of indices corresponding to the subset of endogenous variables that are intervened upon according to the action set \(a\). Then, we obtain a structural counterfactual, \(\mathbf {x}^\texttt {SCF}(a) := \mathbf {x}(a) \mid \mathbf {x}^\texttt {F}= \mathbf {S}^{a}(\mathbf {S}^{-1}({\mathbf {x}^\texttt {F}}))\), by applying the Abduction-Action-Prediction steps [46] as follows:
Step 1. Abduction uniquely determines the value of all exogenous variables \(\mathbf {U}\) given the observed evidence \(\mathbf {X}=\mathbf {x}^\texttt {F}\):
Step 2. Action modifies the SCM according to the hypothetical interventions, \(\mathrm {do}(\{X_i \,{:}{=}\, a_i\}_{i \in \mathcal {I}})\) (where \(a_i = x^F_i + \delta _i\)), yielding \(\mathbf {S}^{a}\):
where \([\cdot ]\) denotes the Iverson bracket.
Step 3. Prediction recursively determines the values of all endogenous variables based on the computed exogenous variables \(\{u_i\}_{i=1}^4\) from Step 1 and \(\mathbf {S}^{a}\) from Step 2, as:
General Assignment Formulation for ANMs. As we have not made any restricting assumptions about the structural equations (only that we operate with additive noise models^{Footnote 7} where noise variables are pairwise independent), the solution for the working example naturally generalizes to SCMs corresponding to other DAGs with more variables. The assignment of structural counterfactual values can generally be written as:
\(x^\texttt {SCF}_i = [i \in \mathcal {I}]\left( x^\texttt {F}_i + \delta _i \right) + [i \notin \mathcal {I}]\left( x^\texttt {F}_i + f_i(\boldsymbol{\mathrm {pa}}^\texttt {SCF}_i) - f_i(\boldsymbol{\mathrm {pa}}^\texttt {F}_i) \right) \qquad (9)\)
In words, the counterfactual value of the \(i\)-th feature, \(x^\texttt {SCF}_i\), takes the value \(x^\texttt {F}_i +\delta _i\) if such feature is intervened upon (i.e., \(i \in \mathcal {I}\)). Otherwise, \(x^\texttt {SCF}_i\) is computed as a function of both the factual and counterfactual values of its parents, denoted respectively by \(f_i(\boldsymbol{\mathrm {pa}}^\texttt {F}_i)\) and \(f_i(\boldsymbol{\mathrm {pa}}^\texttt {SCF}_i)\). The closed-form expression in (9) can replace the counterfactual constraint in (4),
after which the optimization problem may be solved by building on existing frameworks for generating nearest counterfactual explanations, including gradient-based, evolutionary-based, heuristics-based, or verification-based approaches as referenced in Sect. 2.1. It is important to note that unlike CFE-based actions, where the precise values of all covariates post-intervention are specified, MINT-based actions require that the user focus only on the features upon which interventions are to be performed, which may better align with factors under the user’s control (e.g., some features may be non-actionable but mutable through changes to other features; see also [6]).
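The recursive assignment above can be sketched as follows for a generic ANM traversed in topological order (the graph, structural equations, and numbers below are illustrative):

```python
# General ANM assignment: x^SCF_i = x^F_i + delta_i for intervened features,
# and x^F_i + f_i(pa^SCF) - f_i(pa^F) otherwise, in topological order.

def structural_counterfactual(x_f, f, parents, order, do):
    """x_f: factual values; f[i]: parents -> value; do: {i: theta_i}."""
    x_scf = {}
    for i in order:  # topological order of the (assumed acyclic) graph
        if i in do:
            x_scf[i] = do[i]
        else:
            pa_f = [x_f[j] for j in parents[i]]
            pa_scf = [x_scf[j] for j in parents[i]]
            # additive noise cancels: the noise term is x_f[i] - f_i(pa^F)
            x_scf[i] = x_f[i] + f[i](pa_scf) - f[i](pa_f)
    return x_scf

# Toy chain X1 -> X2 -> X3 with illustrative linear equations:
f = {1: lambda pa: 0.0, 2: lambda pa: 2 * pa[0], 3: lambda pa: pa[0] + 1}
parents = {1: [], 2: [1], 3: [2]}
x_f = {1: 1.0, 2: 3.0, 3: 5.0}
print(structural_counterfactual(x_f, f, parents, [1, 2, 3], do={1: 2.0}))
```

With no intervention (`do={}`) the routine reproduces the factual values, as the noise terms cancel exactly.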
3.3 Negative Result: No Recourse Guarantees for Unknown Structural Equations
In practice, the structural counterfactual \(\mathbf {x}^\texttt {SCF}(a)\) can only be computed using an approximate (and likely imperfect) SCM \(\mathcal {M}\,=\,(\mathbf {S}, P_\mathbf {U})\), which is estimated from data assuming a particular form of the structural equations as in (5). However, assumptions on the form of the true structural equations \(\mathbf {S}_\star \) are generally untestable—not even with a randomized experiment—since there exist multiple SCMs which imply the same observational and interventional distributions, but entail different structural counterfactuals.
Example 3
(adapted from 6.19 in [48]). Consider the following two SCMs \(\mathcal {M}_A\) and \(\mathcal {M}_B\) which arise from the general form in Fig. 1 by choosing \(U_1, U_2 \sim \text {Bernoulli}(0.5)\) and \(U_3\sim \text {Uniform}(\{0, \ldots , K\})\) independently in both \(\mathcal {M}_A\) and \(\mathcal {M}_B\), with structural equations
Then \(\mathcal {M}_A\) and \(\mathcal {M}_B\) both imply exactly the same observational and interventional distributions, and thus are indistinguishable from empirical data. However, having observed \(\mathbf {x}^\texttt {F}\,=\,(1, 0, 0)\), they predict different counterfactuals had \(X_1\) been 0, i.e., \(\mathbf {x}^\texttt {SCF}(X_1\,=\,0)\,=\,(0,0,0)\) and (0, 0, K), respectively.^{Footnote 8}
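The phenomenon is easy to verify numerically. The snippet below uses a simplified two-variable analogue of the example (the structural equations here are illustrative assumptions, not those of \(\mathcal {M}_A\) and \(\mathcal {M}_B\) above): both models share \(X \sim \text{Bernoulli}(0.5)\) and \(U \sim \text{Uniform}(\{0,\ldots,K\})\), and since \(K-U\) has the same distribution as \(U\), they are observationally and interventionally indistinguishable, yet entail different counterfactuals.

```python
import numpy as np

K = 4  # noise support {0, ..., K} (illustrative choice)

def sample(model, n, rng):
    """Sample (X, Y) from model 'A' (Y := U) or 'B' (Y := X*U + (1-X)*(K-U))."""
    x = rng.integers(0, 2, n)
    u = rng.integers(0, K + 1, n)
    y = u if model == "A" else x * u + (1 - x) * (K - u)
    return x, y

def counterfactual_y(model, x_f, y_f, x_cf):
    """Abduction-action-prediction: value of Y had X been x_cf."""
    # abduction: invert the structural equation to recover U
    u = y_f if (model == "A" or x_f == 1) else K - y_f
    # prediction under the counterfactual value of X
    return u if model == "A" else x_cf * u + (1 - x_cf) * (K - u)
```

Having observed \((X, Y) = (1, 0)\), both models abduce \(U = 0\), yet predict \(Y = 0\) and \(Y = K\) respectively had \(X\) been \(0\), mirroring the divergence in Example 3.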
Confirming or refuting an assumed form of \(\mathbf {S}_\star \) would thus require counterfactual data which is, by definition, never available. Thus, Example 3 proves the following proposition by contradiction.
Proposition 3
(Lack of Recourse Guarantees). If the set of descendants of intervenedupon variables is nonempty, algorithmic recourse can be guaranteed in general (i.e., without further restrictions on the underlying causal model) only if the true structural equations are known, irrespective of the amount and type of available data.
Remark 1
The converse of Proposition 3 does not hold. E.g., given \(\mathbf {x}^\texttt {F}\,=\,(1,0,1)\) in Example 3, abduction in either model yields \(U_3>0\), so the counterfactual of \(X_3\) cannot be predicted exactly.
Building on the framework of [28], we next present two novel approaches for causal algorithmic recourse under unknown structural equations. The first approach in Sect. 4.1 aims to estimate the counterfactual distribution under the assumption of ANMs (5) with Gaussian noise for the structural equations. The second approach in Sect. 4.2 makes no assumptions about the structural equations, and instead of approximating the structural equations, it considers the effect of interventions on a subpopulation similar to \(\mathbf {x}^\texttt {F}\). We recall that the causal graph is assumed to be known throughout.
4 Recourse Under Imperfect Causal Knowledge
4.1 Probabilistic Individualised Recourse
Since the true SCM \(\mathcal {M}_\star \) is unknown, one approach to solving (4) is to learn an approximate SCM \(\mathcal {M}\) within a given model class from training data \(\{\mathbf {x}^i\}_{i\,=\,1}^n\). For example, for an ANM (5) with zero-mean noise, the functions \(f_r\) can be learned via linear or kernel (ridge) regression of \(X_r\) given \(\mathbf {X}_{\text {pa}(r)}\) as input. We refer to these approaches as \(\mathcal {M}_{\textsc {lin}}\) and \(\mathcal {M}_{\textsc {kr}}\), respectively. \(\mathcal {M}\) can then be used in place of \(\mathcal {M}_\star \) to infer the noise values as in (5), and subsequently to predict a single-point counterfactual \(\mathbf {x}^\texttt {SCF}(a)\) to be used in (4). However, the learned causal model \(\mathcal {M}\) may be imperfect and thus lead to wrong counterfactuals due to, e.g., the finite sample of the observed data or, more importantly, due to model misspecification (i.e., assuming a wrong parametric form for the structural equations).
To address this limitation, we adopt a Bayesian approach to account for the uncertainty in the estimation of the structural equations. Specifically, we assume additive Gaussian noise and rely on probabilistic regression using a Gaussian process (GP) prior over the functions \(f_r\); for an overview of regression with GPs, we refer to [79, § 2].
Definition 2
(GP-SCM). A Gaussian process SCM (GP-SCM) over \(\mathbf {X}\) refers to the model
with covariance functions \(k_{r}:\mathcal {X}_{\text {pa}(r)}\times \mathcal {X}_{\text {pa}(r)}\rightarrow \mathbb {R}\), e.g., RBF kernels for continuous \(X_{\text {pa}(r)}\).
While GPs have previously been studied in a causal context for structure learning [16, 73], estimating treatment effects [2, 56], or learning SCMs with latent variables and measurement error [61], our goal here is to account for the uncertainty over \(f_r\) in the computation of the posterior over \(U_r\), and thus to obtain a counterfactual distribution, as summarised in the following propositions.
Proposition 4
(GP-SCM Noise Posterior). Let \(\{\mathbf {x}^i\}_{i\,=\,1}^n\) be an observational sample from (10). For each \(r\in [d]\) with non-empty parent set (i.e., \(|\text {pa}(r)|>0\)), the posterior distribution of the noise vector \(\mathbf {u}_r\,=\,(u_r^1, ...,u_r^n)\), conditioned on \(\mathbf {x}_r\,=\,(x_r^1, ..., x_r^n)\) and \(\mathbf {X}_{\text {pa}(r)}\,=\,(\mathbf {x}_{\text {pa}(r)}^1,...,\mathbf {x}_{\text {pa}(r)}^n)\), is given by
where \(\mathbf {K}:=\big (k_r\big (\mathbf {x}_{\text {pa}(r)}^i, \mathbf {x}_{\text {pa}(r)}^j\big )\big )_{ij}\) denotes the Gram matrix.
Next, in order to compute counterfactual distributions, we rely on ancestral sampling (according to the causal graph) of the descendants of the intervention targets \(\mathbf {X}_\mathcal {I}\) using the noise posterior of (11). The counterfactual distribution of each descendant \(X_r\) is given by the following proposition.
Proposition 5
(GP-SCM Counterfactual Distribution). Let \(\{\mathbf {x}^i\}_{i=1}^n\) be an observational sample from (10). Then, for \(r\in [d]\) with \(|\text {pa}(r)|>0\), the counterfactual distribution over \(X_r\) had \(\mathbf {X}_{\text {pa}(r)}\) been \(\tilde{\mathbf {x}}_{\text {pa}(r)}\) (instead of \(\mathbf {x}^\texttt {F}_{\text {pa}(r)}\)) for individual \(\mathbf {x}^\texttt {F}\in \{\mathbf {x}^i\}_{i=1}^n\) is given by
where \(\tilde{k}:=k_r(\tilde{\mathbf {x}}_{\text {pa}(r)}, \tilde{\mathbf {x}}_{\text {pa}(r)})\), \(\tilde{\mathbf {k}}:=\big (k_r(\tilde{\mathbf {x}}_{\text {pa}(r)}, \mathbf {x}_{\text {pa}(r)}^1), \ldots , k_r(\tilde{\mathbf {x}}_{\text {pa}(r)}, \mathbf {x}_{\text {pa}(r)}^n)\big )\), \(\mathbf {x}_r\) and \(\mathbf {K}\) as defined in Proposition 4, and \(\mu ^\texttt {F}_r\) and \(s^\texttt {F}_r\) are the posterior mean and variance of \(u^\texttt {F}_r\) given by (11).
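The counterfactual distribution of Proposition 5 can also be approximated by Monte Carlo, which makes the abduction-action-prediction structure explicit: draw functions jointly from the GP posterior, abduce the noise, then predict at the counterfactual parent values. The following numpy sketch (1-D parent, RBF kernel, hypothetical function names) takes this route rather than implementing the closed-form expression.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """Squared-exponential kernel on 1-D inputs."""
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gpscm_counterfactual_samples(x_pa, x_r, x_pa_f, x_r_f, x_pa_tilde,
                                 noise_var=0.01, n_samples=2000, seed=None):
    """Monte Carlo sketch of the GP-SCM counterfactual distribution:
    draw f jointly at (x_pa_f, x_pa_tilde) from the GP posterior, abduce
    u^F = x_r^F - f(x_pa_f), then predict x_r^CF = f(x_pa_tilde) + u^F."""
    rng = np.random.default_rng(seed)
    X = np.array([x_pa_f, x_pa_tilde])
    K = rbf(x_pa, x_pa) + noise_var * np.eye(len(x_pa))  # Gram matrix + noise
    Ks, Kss = rbf(X, x_pa), rbf(X, X)
    Kinv = np.linalg.inv(K)
    mean = Ks @ Kinv @ x_r                               # posterior mean of f
    cov = Kss - Ks @ Kinv @ Ks.T                         # posterior cov of f
    f = rng.multivariate_normal(mean, cov + 1e-9 * np.eye(2), size=n_samples)
    u_f = x_r_f - f[:, 0]        # abduction: posterior noise samples
    return f[:, 1] + u_f         # action + prediction
```

Because \(f(\mathbf {x}^\texttt {F}_{\text {pa}(r)})\) and \(f(\tilde{\mathbf {x}}_{\text {pa}(r)})\) are sampled jointly, the correlation between the abduced noise and the counterfactual prediction is preserved, as in the analytic expression.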
All proofs can be found in Appendix A of [27]. We can now generalise the recourse problem (4) to our probabilistic setting by replacing the single-point counterfactual \(\mathbf {x}^\texttt {SCF}(a)\) with the counterfactual random variable \(\mathbf {X}^\texttt {SCF}(a):=\mathbf {X}(a)\mid \mathbf {x}^\texttt {F}\). As a consequence, it no longer makes sense to consider a hard constraint of the form \(h(\mathbf {x}^\texttt {SCF}(a))>0.5\), i.e., that the prediction needs to change. Instead, we can reason about the expected classifier output under the counterfactual distribution, leading to the following probabilistic version of the individualised recourse optimisation problem:
Note that the threshold \(\texttt {thresh}(a)\) is allowed to depend on a. For example, an intuitive choice is
which has the interpretation of the lower-confidence bound crossing the decision boundary of 0.5. Note that larger values of the hyperparameter \(\gamma _\textsc {lcb}\) lead to a more conservative approach to recourse, while for \(\gamma _\textsc {lcb}\,=\,0\) merely crossing the decision boundary with \(\ge 50\%\) chance suffices.
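Given Monte Carlo samples of the classifier output under the counterfactual distribution, checking this lower-confidence-bound condition is a one-liner (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def lcb_constraint(h_samples, gamma_lcb=2.0):
    """Check E[h] >= thresh(a) = 0.5 + gamma_lcb * sqrt(Var[h]), i.e. that
    the lower-confidence bound of the classifier output crosses 0.5."""
    m, s = h_samples.mean(), h_samples.std()
    return m - gamma_lcb * s >= 0.5
```

With \(\gamma_\textsc{lcb}=0\) this reduces to requiring the expected output to exceed 0.5; larger \(\gamma_\textsc{lcb}\) rejects actions whose outcome is favourable only on average but highly uncertain.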
4.2 Probabilistic Subpopulation-Based Recourse
The GP-SCM approach in Sect. 4.1 allows us to average over an infinite number of (non)linear structural equations, under the assumption of additive Gaussian noise. However, this assumption may still not hold under the true SCM, leading to suboptimal or inefficient solutions to the recourse problem. Next, we remove any assumptions about the structural equations, and propose a second approach that does not aim to approximate an individualized counterfactual distribution, but instead considers the effect of interventions on a subpopulation defined by certain shared characteristics with the given (factual) individual \(\mathbf {x}^\texttt {F}\). The key idea behind this approach resembles the notion of conditional average treatment effects (CATE) [1] (illustrated in Fig. 4) and is based on the fact that any intervention \(do(\mathbf {X}_\mathcal {I}\,=\,{\boldsymbol{\theta }})\) only influences the descendants \(\text {d}(\mathcal {I})\) of the intervened-upon variables, while the non-descendants \(\text {nd}(\mathcal {I})\) remain unaffected. Thus, when evaluating an intervention, we can condition on \(\mathbf {X}_{\text {nd}(\mathcal {I})}\,=\,\mathbf {x}^\texttt {F}_{\text {nd}(\mathcal {I})}\), thereby selecting a subpopulation of individuals similar to the factual subject.
Specifically, we propose to solve the following subpopulation-based recourse optimization problem
where, in contrast to (13), the expectation is taken over the corresponding interventional distribution.
In general, this interventional distribution does not match the conditional distribution, i.e.,
because some spurious correlations in the observational distribution do not transfer to the interventional setting. For example, in Fig. 2b we have that
Fortunately, the interventional distribution can still be identified from the observational one, as stated in the following proposition.
Proposition 6
Subject to causal sufficiency, \(P_{\mathbf {X}_{\text {d}(\mathcal {I})}\vert do(\mathbf {X}_\mathcal {I}\,=\,{\boldsymbol{\theta }}), \mathbf {x}^\texttt {F}_{\text {nd}(\mathcal {I})}}\) is observationally identifiable (i.e., computable from the observational distribution) via:
As evident from Proposition 6, tackling the optimization problem in (15) in the general case (i.e., for arbitrary graphs and intervention sets \(\mathcal {I}\)) requires estimating the stable conditionals \(P_{X_r\vert \mathbf {X}_{\text {pa}(r)}}\) (a.k.a. causal Markov kernels) in order to compute the interventional expectation via (16). For convenience (see Sect. 4.3 for details), here we opt for latent-variable implicit density models, but other conditional density estimation approaches may also be used [e.g., 7, 10, 68]. Specifically, we model each conditional \(p(x_r\vert \mathbf {x}_{\text {pa}(r)})\) with a conditional variational autoencoder (CVAE) [62] as:
To facilitate sampling \(x_r\) (and in analogy to the deterministic mechanisms \(f_r\) in SCMs), we opt for deterministic decoders in the form of neural nets \(D_r\) parametrised by \(\psi _r\), i.e., \(p_{\psi _r}(x_r\vert \mathbf {x}_{\text {pa}(r)}, \mathbf {z}_r) \,=\, \delta (x_r - D_r(\mathbf {x}_{\text {pa}(r)}, \mathbf {z}_r; \psi _r))\), and rely on variational inference [77], amortised with approximate posteriors \(q_{\phi _r}(\mathbf {z}_r\vert x_r, \mathbf {x}_{\text {pa}(r)})\) parametrised by encoders in the form of neural nets with parameters \(\phi _r\). We learn both the encoder and decoder parameters by maximising the evidence lower bound (ELBO) using stochastic gradient descent [11, 30, 31, 50]. For further details, we refer to Appendix D of [27].
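The ELBO objective itself is straightforward to estimate with the reparametrization trick. The numpy sketch below assumes a Gaussian encoder, a standard normal prior, and a unit-variance Gaussian likelihood standing in for the delta-function decoder above; all names are illustrative and no training loop is shown.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dimensions."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

def elbo_estimate(x, mu_q, logvar_q, decoder, n_samples=256, seed=None):
    """Monte Carlo ELBO: E_q[log p(x|z)] - KL(q(z|x) || p(z)), with
    z = mu + sigma * eps (reparametrization) and p(x|z) = N(decoder(z), I)."""
    rng = np.random.default_rng(seed)
    std = np.exp(0.5 * logvar_q)
    eps = rng.standard_normal((n_samples, len(mu_q)))
    z = mu_q + std * eps                       # reparametrized latent samples
    recon = decoder(z)                         # (n_samples, dim of x)
    log_lik = (-0.5 * np.sum((x - recon) ** 2, axis=1)
               - 0.5 * len(x) * np.log(2 * np.pi))
    return log_lik.mean() - gaussian_kl(mu_q, logvar_q)
```

In practice one would maximise this objective with respect to encoder and decoder parameters via stochastic gradients, per conditional \(p(x_r\vert \mathbf {x}_{\text {pa}(r)})\), with the parents concatenated to the encoder and decoder inputs.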
Remark 2
The collection of CVAEs can be interpreted as learning an approximate SCM of the form
However, this family of SCMs may not allow identification of the true SCM (provided it can be expressed as above) from data without additional assumptions. Moreover, exact posterior inference over \(\mathbf {z}_r\) given \(\mathbf {x}^\texttt {F}\) is intractable, and we need to resort to approximations instead. It is thus unclear whether sampling from \(q_{\phi _r}(\mathbf {z}_r\vert x^{\texttt {F}}_r, \mathbf {x}^\texttt {F}_{\text {pa}(r)})\) instead of from \(p(\mathbf {z}_r)\) in (17) can be interpreted as a counterfactual within (18). For further discussion on such “pseudo-counterfactuals” we refer to Appendix C of [27].
4.3 Solving the Probabilistic Recourse Optimization Problem
We now discuss how to solve the resulting optimization problems in (13) and (15). First, note that both problems differ only in the distribution over which the expectation in the constraint is taken: in (13) it is the counterfactual distribution of the descendants given in Proposition 5; in (15) it is the interventional distribution identified in Proposition 6. In either case, computing the expectation for an arbitrary classifier h is intractable. Here, we approximate these integrals via Monte Carlo by sampling \(\mathbf {x}_{\text {d}(\mathcal {I})}^{(m)}\) from the interventional or counterfactual distributions resulting from \(a\,=\,do(\mathbf {X}_\mathcal {I}\,=\,{\boldsymbol{\theta }})\), i.e.,
Brute-Force Approach. One way to solve (13) and (15) is to (i) iterate over \(a\in \mathcal {F}\), with \(\mathcal {F}\) being a finite set of feasible actions (possibly as a result of discretizing a continuous search space); (ii) approximately evaluate the constraint via Monte Carlo; and (iii) select a minimum-cost action amongst all evaluated candidates satisfying the constraint. However, this may be computationally prohibitive and yield suboptimal interventions due to discretisation.
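The three steps above can be sketched as follows (a minimal illustration; `sample_cf`, `classifier`, and `cost` are caller-supplied stand-ins for the counterfactual/interventional sampler, the fixed classifier \(h\), and the cost function):

```python
import numpy as np

def brute_force_recourse(x_f, candidate_actions, sample_cf, classifier,
                         cost, n_mc=500, thresh=0.5, seed=None):
    """Brute-force search sketch: enumerate a finite action set, estimate
    E[h(X^SCF(a))] by Monte Carlo, and return the cheapest valid action."""
    rng = np.random.default_rng(seed)
    best, best_cost = None, np.inf
    for a in candidate_actions:
        xs = sample_cf(x_f, a, n_mc, rng)       # (n_mc, d) counterfactual draws
        if classifier(xs).mean() > thresh:      # constraint satisfied?
            c = cost(x_f, a)
            if c < best_cost:
                best, best_cost = a, c
    return best, best_cost
```

On a toy two-variable chain where intervening on \(X_1\) propagates to \(X_2\), the search correctly prefers the cheapest boundary-crossing intervention over a more distant one.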
Gradient-Based Approach. Recall that, for actions of the form \(a\,=\,do(\mathbf {X}_\mathcal {I}\,=\,{\boldsymbol{\theta }})\), we need to optimize over both the intervention targets \(\mathcal {I}\) and the intervention values \({\boldsymbol{\theta }}\). Selecting targets is a hard combinatorial optimization problem, as there are \(2^{d'}\) possible choices for \(d'\le d\) actionable features, with a potentially infinite number of intervention values. We therefore consider different choices of targets \(\mathcal {I}\) in parallel, and propose a gradient-based approach suitable for differentiable classifiers to efficiently find an optimal \({\boldsymbol{\theta }}\) for a given intervention set \(\mathcal {I}\).^{Footnote 9} In particular, we first rewrite the constrained optimization problem in unconstrained form with the Lagrangian [29, 33]:
We then solve the saddle point problem \(\min _{{\boldsymbol{\theta }}} \max _\lambda \mathcal {L}({\boldsymbol{\theta }},\lambda )\) arising from (19) with stochastic gradient descent [11, 30]. Since both the GP-SCM counterfactual (12) and the CVAE interventional distributions (17) admit a reparametrization trick [31, 50], we can differentiate through the constraint:
Here, \(\mathbf {x}_{\text {d}(\mathcal {I})}(\mathbf {z})\) is obtained by iteratively computing all descendants in topological order: either substituting \(\mathbf {z}\) together with the other parents into the decoders \(D_r\) for the CVAEs, or by using the Gaussian reparametrization \(x_{r}(\mathbf {z})\,=\,\mu +\sigma \mathbf {z}\) with \(\mu \) and \(\sigma \) given by (12) for the GP-SCM. A similar gradient estimator for the variance, which enters \(\texttt {thresh}(a)\) for \(\gamma _\textsc {lcb}\ne 0\), is derived in Appendix F of [27].
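The saddle-point procedure can be illustrated on a 1-D toy problem. The sketch below (all names and hyperparameters are illustrative choices, not from the original work) minimises a quadratic cost subject to \(\mathbb{E}[h]\ge 0.5\) for \(h(x)=\sigma(x-1)\), using the reparametrization trick for the constraint gradient and simultaneous ascent on \(\lambda\).

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def gradient_recourse_1d(x_f=0.0, noise_std=0.1, lr_th=0.02, lr_lam=0.1,
                         n_mc=256, n_iters=6000, seed=0):
    """Toy saddle-point sketch: min_theta (theta - x_f)^2
    s.t. E[h(theta + Z)] >= 0.5, with h = sigmoid(x - 1), Z ~ N(0, noise_std^2).
    L(theta, lam) = (theta - x_f)^2 + lam * (0.5 - E[h])."""
    rng = np.random.default_rng(seed)
    theta, lam = x_f, 0.0
    trace = []
    for _ in range(n_iters):
        z = rng.normal(0.0, noise_std, n_mc)
        h = sigmoid(theta + z - 1.0)
        grad_h = (h * (1 - h)).mean()      # d/dtheta E[h] via reparametrization
        theta -= lr_th * (2 * (theta - x_f) - lam * grad_h)   # descent on theta
        lam = max(0.0, lam + lr_lam * (0.5 - h.mean()))       # ascent on lambda
        trace.append(theta)
    return np.mean(trace[n_iters // 2:])   # averaged iterate for stability
```

By symmetry of the sigmoid, the constraint becomes active at \(\theta \approx 1\), which is where the averaged iterate settles; the multiplier \(\lambda\) grows from zero until the constraint is (approximately) satisfied.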
5 Experiments
In our experiments, we compare different approaches for causal algorithmic recourse on synthetic and semi-synthetic data sets. Additional results can be found in Appendix B of [27].
5.1 Compared Methods
We compare the naive point-based recourse approaches \(\mathcal {M}_\textsc {lin}\) and \(\mathcal {M}_\textsc {kr}\) mentioned at the beginning of Sect. 4.1 as baselines with the proposed counterfactual GP-SCM \(\mathcal {M}_\textsc {gp}\) and the CVAE approach for subpopulation-based recourse (\(\textsc {cate}_\textsc {cvae}\)). For completeness, we also consider a \(\textsc {cate}_\textsc {gp}\) approach, as a GP can also be seen as modelling each conditional as a Gaussian,^{Footnote 10} and also evaluate the “pseudo-counterfactual” \(\mathcal {M}_\textsc {cvae}\) approach discussed in Remark 2. Finally, we report oracle performance for individualised \(\mathcal {M}_\star \) and subpopulation-based \(\textsc {cate}_\star \) recourse methods by sampling counterfactuals and interventions from the true underlying SCM. We note that a comparison with non-causal recourse approaches that assume independent features [58, 69], or that consider causal relations to generate counterfactual explanations but not recourse actions [24, 39], is neither natural nor straightforward, because it is unclear whether descendant variables should be allowed to change, whether keeping their value constant should incur a cost, and, if so, how much; cf. [28].
5.2 Metrics
We compare recourse actions recommended by the different methods in terms of cost, computed as the \(L_2\)-norm between the intervention \({\boldsymbol{\theta }}_\mathcal {I}\) and the factual value \(\mathbf {x}^\texttt {F}_\mathcal {I}\), normalised by the range of each feature \(r\in \mathcal {I}\) observed in the training data; and validity, computed as the percentage of individuals for which the recommended actions result in a favourable prediction under the true (oracle) SCM. For our probabilistic recourse methods, we also report the lower confidence bound \(\text {LCB}:=\mathbb {E}[h]-\gamma _{\textsc {lcb}}\sqrt{\text {Var}[h]}\) of the selected action under the given method.
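The two metrics can be sketched directly from their definitions (function and argument names are hypothetical):

```python
import numpy as np

def recourse_cost(theta_I, x_f_I, feat_range_I):
    """Normalised L2 cost of do(X_I = theta): each intervened feature's change
    is scaled by that feature's range observed in the training data."""
    diff = (np.asarray(theta_I) - np.asarray(x_f_I)) / np.asarray(feat_range_I)
    return float(np.linalg.norm(diff))

def validity(oracle_favourable):
    """Percentage of individuals whose recommended action yields a favourable
    prediction under the true (oracle) SCM."""
    return 100.0 * float(np.mean(oracle_favourable))
```

Normalising by the per-feature range keeps the cost comparable across features measured on different scales.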
5.3 Synthetic 3-Variable SCMs Under Different Assumptions
In our first set of experiments, we consider three classes of SCMs over three variables with the same causal graph as in Fig. 2b. To test robustness of the different methods to assumptions about the form of the true structural equations, we consider a linear SCM, a nonlinear ANM, and a more general, multimodal SCM with non-additive noise. For further details on the exact form we refer to Appendix E of [27].
Results are shown in Table 1, where we observe that the point-based recourse approaches perform (relatively) well in terms of both validity and cost when their underlying assumptions are met (i.e., \(\mathcal {M}_\textsc {lin}\) on the linear SCM and \(\mathcal {M}_\textsc {kr}\) on the nonlinear ANM). Otherwise, validity drops significantly, as expected (see, e.g., the results of \(\mathcal {M}_\textsc {lin}\) on the nonlinear \(\text {ANM}\), or of \(\mathcal {M}_\textsc {kr}\) on the non-additive SCM). Moreover, we note that the inferior performance of \(\mathcal {M}_\textsc {kr}\) compared to \(\mathcal {M}_\textsc {lin}\) on the linear SCM suggests an overfitting problem, which does not occur for its more conservative probabilistic counterpart \(\mathcal {M}_\textsc {gp}\). Generally, the individualised approaches \(\mathcal {M}_\textsc {gp}\) and \(\mathcal {M}_\textsc {cvae}\) perform very competitively in terms of cost and validity, especially on the linear and nonlinear ANMs. The subpopulation-based \(\textsc {cate}\) approaches, on the other hand, perform particularly well on the challenging non-additive SCM (on which the assumptions of the GP approaches are violated), where \(\textsc {cate}_\textsc {cvae}\) achieves perfect validity as the only non-oracle method. As expected, the subpopulation-based approaches generally lead to higher cost than the individualised ones, since the latter aim to achieve recourse only for a given individual while the former do so for an entire group (see Fig. 4).
5.4 Semi-Synthetic 7-Variable SCM for Loan Approval
We also test our methods on a larger semi-synthetic SCM inspired by the German Credit UCI dataset [43]. We consider the variables age A, gender G, education level E, loan amount L, duration D, income I, and savings S, with the causal graph shown in Fig. 5. We model age A, gender G, and loan duration D as non-actionable variables, but consider D to be mutable, i.e., it cannot be manipulated directly but is allowed to change (e.g., as a consequence of an intervention on L). The SCM includes linear and nonlinear relationships, as well as different types of variables and noise distributions, and is described in more detail in Appendix B of [27].
The results are summarised in Table 2, where we observe that the insights discussed above similarly apply for data generated from a more complex SCM, and for different classifiers.
Finally, we show the influence of \(\gamma _\textsc {lcb}\) on the performance of the proposed probabilistic approaches in Fig. 6. We observe that lower values of \(\gamma _\textsc {lcb}\) lead to lower validity (and cost), especially for the \(\textsc {cate}\) approaches. As \(\gamma _\textsc {lcb}\) increases, validity approaches that of the corresponding oracles \(\mathcal {M}_\star \) and \(\textsc {cate}_\star \), outperforming the point-based recourse approaches. In summary, our probabilistic recourse approaches are not only more robust, but also allow controlling the trade-off between validity and cost via \(\gamma _\textsc {lcb}\).
6 Discussion
In this paper, we have focused on the problem of algorithmic recourse, i.e., the process by which an individual can change their situation to obtain a desired outcome from a machine learning model. Using the tools from causal reasoning (i.e., structural interventions and counterfactuals), we have shown that in their current form, counterfactual explanations only bring about agency for the individual to achieve recourse in unrealistic settings. In other words, counterfactual explanations imply recourse actions that may neither be optimal nor even result in favorably changing the prediction of \(h\) when acted upon. This shortcoming is primarily due to the lack of consideration of causal relations governing the world and thus, the failure to model the downstream effect of actions in the predictions of the machine learning model. In other words, although “counterfactual” is a term from causal language, we observed that existing approaches fall short in terms of taking causal reasoning into account when generating counterfactual explanations and the subsequent recourse actions. Thus, building on the statement by Wachter et al. [76] that counterfactual explanations “do not rely on knowledge of the causal structure of the world,” it is perhaps more appropriate to refer to existing approaches as contrastive, rather than counterfactual, explanations [14, 40]. See [26, §2] for more discussion.
To directly take causal consequences of actions into account, we have proposed a fundamental reformulation of the recourse problem, where actions are performed as interventions and we seek to minimize the cost of performing actions in a world governed by a set of (physical) laws captured in a structural causal model. Our proposed formulation in (4), complemented with several examples and a detailed discussion, allows for recourse through minimal interventions (MINT), that when performed will result in a structural counterfactual that favourably changes the output of the model.
The primary limitation of this formulation in (4) is its reliance on the true causal model of the world, comprising both the causal graph and the structural equations. In practice, the underlying causal model is rarely known, which suggests that the counterfactual constraint in (4), i.e., \(\mathbf {x}^\texttt {SCF}(a) := \mathbf {x}(a) \mid \mathbf {x}^\texttt {F}= \mathbf {S}^{a}(\mathbf {S}^{-1}({\mathbf {x}^\texttt {F}}))\), may not be (deterministically) identifiable. Indeed, as a negative result, we showed that algorithmic recourse cannot be guaranteed in the absence of perfect knowledge about the underlying \(\text {SCM}\) governing the world, which unfortunately is not available in practice. To address this limitation, we proposed two probabilistic approaches to achieve recourse under more realistic assumptions. In particular, we derived (i) an individual-level recourse approach based on GPs that approximates the counterfactual distribution by averaging over the family of additive Gaussian SCMs; and (ii) a subpopulation-based approach, which assumes that only the causal graph is known and makes use of CVAEs to estimate the conditional average treatment effect of an intervention on a subpopulation of individuals similar to the one seeking recourse. Our experiments showed that the proposed probabilistic approaches not only result in more robust recourse interventions than approaches based on point estimates of the SCM, but also allow trading off validity against cost.
Assumptions, Limitations, and Extensions. Throughout the present work, we have assumed a known causal graph and causal sufficiency. While this may not hold for all settings, it is the minimal necessary set of assumptions for causal reasoning from observational data alone. Access to instrumental variables or experimental data may help further relax these assumptions [3, 13, 66]. Moreover, if only a partial graph is available or some relations are known to be confounded, one will need to restrict recourse actions to the subset of interventions that are still identifiable [59, 60, 67]. An alternative approach could address causal sufficiency violations by relying on latent variable models to estimate confounders from multiple causes [78] or proxy variables [38], or to work with bounds on causal effects instead [5, 65, 74].
Perhaps more concerning, our work highlights the implicit causal assumptions made by existing approaches (i.e., independence of features, or feasible and cost-free interventions), which may portray a false sense of recourse guarantees where one does not exist (see Example 2 and all of Sect. 3.1). Our work aims to highlight these imperfect assumptions, and to offer an alternative formulation, backed with proofs and demonstrations, which would guarantee recourse if assumptions about the causal structure of the world were satisfied. Future research on causal algorithmic recourse may benefit from the rich literature in causality that has developed methods to verify and perform inference under various assumptions [45, 48].
This is not to say that counterfactual explanations should be abandoned altogether. On the contrary, we believe that counterfactual explanations hold promise for “guided audit of the data” [76] and evaluating various desirable model properties, such as robustness [21, 58] or fairness [20, 25, 58, 69, 75]. Besides this, it has been shown that designers of interpretable machine learning systems use counterfactual explanations for predicting model behavior [34] or uncovering inaccuracies in the data profile of individuals [70]. Complementing these offerings of counterfactual explanations, we offer minimal interventions as a way to guarantee algorithmic recourse in general settings, which is not implied by counterfactual explanations.
On the Counterfactual vs Interventional Nature of Recourse. Given that we address two different notions of recourse—counterfactual/individualised (rung 3) vs. interventional/subpopulationbased (rung 2)—one may ask which framing is more appropriate. Since the main difference is whether the background variables \(\mathbf {U}\) are assumed fixed (counterfactual) or not (interventional) when reasoning about actions, we believe that this question is best addressed by thinking about the type of environment and interpretation of \(\mathbf {U}\): if the environment is static, or if \(\mathbf {U}\) (mostly) captures unobserved information about the individual, the counterfactual notion seems to be the right one; if, on the other hand, \(\mathbf {U}\) also captures environmental factors which may change, e.g., between consecutive loan applications, then the interventional notion of recourse may be more appropriate. In practice, both notions may be present (for different variables), and the proposed approaches can be combined depending on the available domain knowledge since each parentchild causal relation is treated separately. We emphasise that the subpopulationbased approach is also practically motivated by a reluctance to make (parametric) assumptions about the structural equations which are untestable but necessary for counterfactual reasoning. It may therefore be useful to avoid problems of misspecification, even for counterfactual recourse, as demonstrated experimentally for the nonadditive SCM.
7 Conclusion
In this work, we explored one of the main, but often overlooked, objectives of explanations: serving as a means to allow people to act, rather than just understand. Using counterexamples and the theory of structural causal models (SCM), we showed that actionable recommendations cannot, in general, be inferred from counterfactual explanations. We showed that this shortcoming is due to the lack of consideration of the causal relations governing the world and thus, the failure to model the downstream effect of actions in the predictions of the machine learning model. Instead, we proposed a shift of paradigm from recourse via nearest counterfactual explanations to recourse through minimal interventions (MINT), and presented a new optimization formulation for the common class of additive noise models. Our technical contributions were complemented with an extensive discussion on the form, feasibility, and scope of interventions in real-world settings. In follow-up work, we further investigated the epistemological differences between counterfactual explanations and consequential recommendations and argued that their technical treatment requires consideration at different levels of the causal history [52] of events [26]. Whereas MINT provided exact recourse under strong assumptions (requiring the true SCM), we next explored how to offer recourse under milder and more realistic assumptions (requiring only the causal graph). We presented two probabilistic approaches that offer recourse with high probability. The first captures uncertainty over the structural equations under additive Gaussian noise, and uses Bayesian model averaging to estimate the counterfactual distribution. The second removes any assumptions on the structural equations by instead computing the average effect of recourse actions on individuals similar to the person who seeks recourse, leading to a novel subpopulation-based interventional notion of recourse.
We then derived a gradient-based procedure for selecting optimal recourse actions, and empirically showed that the proposed approaches lead to more reliable recommendations under imperfect causal knowledge than non-probabilistic baselines. This contribution is important as it enables recourse recommendations to be generated in more practical settings and under uncertain assumptions.
As a final note, while for simplicity, we have focused in this chapter on credit loan approvals, recourse can have potential applications in other domains such as healthcare [8, 9, 17, 51], justice (e.g., pretrial bail) [4], and other settings (e.g., hiring) [12, 44, 57] whereby actionable recommendations for individuals are sought.
Notes
 1.
Following the related literature, we consider a binary classification task by convention; most of our considerations extend to multiclass classification or regression settings as well though.
 2.
In particular, [14, 41, 76] solve (1) using gradient-based optimization; [55, 69] employ mixed-integer linear program solvers to support mixed numeric/binary data; [49] use graph-based shortest path algorithms; [35] use a heuristic search procedure by growing spheres around the factual instance; [18, 58] build on genetic algorithms for model-agnostic behavior; and [25] solve (1) using satisfiability solvers with closeness guarantees. For a more complete exposition, see the recent surveys [26, 71].
 3.
Here, “feasible” means possible to do, whereas “plausible” means possibly true, believable or realistic. Optimization terminology refers to both as feasibility sets.
 4.
Also known as nonparametric structural equation model with independent errors.
 5.
I.e., for \(r\in [d]\), \(P_{X_r\vert \mathbf {X}_{\text {pa}(r)}}(X_r\vert \mathbf {X}_{\text {pa}(r)}):=P_{U_r}(f_r^{-1}(X_r\vert \mathbf {X}_{\text {pa}(r)}))\), where \(f_r^{-1}(X_r\vert \mathbf {X}_{\text {pa}(r)})\) denotes the preimage of \(X_r\) given \(\mathbf {X}_{\text {pa}(r)}\) under \(f_r\), i.e., \(f_r^{-1}(X_r\vert \mathbf {X}_{\text {pa}(r)}):=\{u\in \mathcal {U}_r:f_r(\mathbf {X}_{\text {pa}(r)},u)\,=\,X_r\}\).
 6.
We note that, although \(\mathbf {x}^{\texttt {*SCF}}:= \mathbf {x}(a^*) \mid \mathbf {x}^\texttt {F}= \mathbf {S}^{a^*}(\mathbf {S}^{-1}({\mathbf {x}^\texttt {F}}))\) is a counterfactual instance, it need not correspond to the nearest counterfactual explanation, \(\mathbf {x}^{\texttt {*CFE}}:= \mathbf {x}^\texttt {F}+ \boldsymbol{\delta }^*\), resulting from (2) (see, e.g., Example 1). This further emphasizes that minimal interventions are not necessarily obtainable via precomputed nearest counterfactual instances, and recourse actions should be obtained by solving (4) rather than indirectly through the solution of (2).
 7.
We remark that the presented formulation also holds for more general SCMs (for example where the exogenous variable contribution is not additive) as long as the sequence of structural equations \(\mathbf {S}\) is invertible, i.e., there exists a sequence of equations \(\mathbf {S}^{-1}\) such that \(\mathbf {x}= \mathbf {S}(\mathbf {S}^{-1}({\mathbf {x}}))\) (in other words, the exogenous variables are uniquely identifiable via the abduction step).
 8.
This follows from abduction on \(\mathbf {x}^\texttt {F}=(1, 0, 0)\), which for both \(\mathcal {M}_A\) and \(\mathcal {M}_B\) implies \(U_3=0\).
 9.
For large d, when enumerating all intervention sets \(\mathcal {I}\) becomes computationally prohibitive, we can upper-bound the allowed number of variables to be intervened on simultaneously (e.g., \(|\mathcal {I}|\le 3\)), or choose a greedy approach to select \(\mathcal {I}\).
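To make the abduction-action-prediction recipe in the notes above concrete, the following minimal sketch computes a structural counterfactual \(\mathbf {x}^{\texttt {SCF}} = \mathbf {S}^{a}(\mathbf {S}^{-1}(\mathbf {x}^\texttt {F}))\) for a hypothetical three-variable additive-noise SCM (the equations below are illustrative only, not those of \(\mathcal {M}_A\) or \(\mathcal {M}_B\) or the chapter's experiments); for additive noise, the preimage in Note 5 is a singleton, so abduction reduces to subtracting the parents' contribution:

```python
# Hypothetical additive-noise SCM (for illustration only):
#   X1 := U1,  X2 := 0.5*X1 + U2,  X3 := X1 - X2 + U3

def abduction(x):
    """S^{-1}: recover the exogenous noise u from the factual observation x.
    For additive-noise SCMs the preimage f_r^{-1} is unique: u_r = x_r - f_r(pa_r)."""
    u1 = x[0]
    u2 = x[1] - 0.5 * x[0]
    u3 = x[2] - (x[0] - x[1])
    return (u1, u2, u3)

def prediction(u, do=None):
    """S^a: re-run the structural equations under the action a = do(X_i := theta),
    holding the abducted noise fixed; intervened variables ignore their parents."""
    do = do or {}
    x1 = do.get(0, u[0])
    x2 = do.get(1, 0.5 * x1 + u[1])
    x3 = do.get(2, x1 - x2 + u[2])
    return (x1, x2, x3)

x_F = (1.0, 0.0, 0.0)                 # factual instance
u = abduction(x_F)                    # abduction step: S^{-1}(x_F)
x_SCF = prediction(u, do={0: 2.0})    # action + prediction: do(X1 := 2)
print(x_SCF)  # -> (2.0, 0.5, 0.5): X2 and X3 change downstream of X1
```

Note how invertibility (Note 7) is what licenses the first step: `prediction(abduction(x))` recovers `x` exactly, i.e., \(\mathbf {S}(\mathbf {S}^{-1}(\mathbf {x}))=\mathbf {x}\).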
References
Abrevaya, J., Hsu, Y.C., Lieli, R.P.: Estimating conditional average treatment effects. J. Bus. Econ. Stat. 33(4), 485–505 (2015)
Alaa, A.M., van der Schaar, M.: Bayesian inference of individualized treatment effects using multitask Gaussian processes. In: Advances in Neural Information Processing Systems, pp. 3424–3432 (2017)
Angrist, J.D., Imbens, G.W., Rubin, D.B.: Identification of causal effects using instrumental variables. J. Am. Stat. Assoc. 91(434), 444–455 (1996)
Angwin, J., Larson, J., Mattu, S., Kirchner, L.: Machine Bias. ProPublica, New York (2016)
Balke, A., Pearl, J.: Counterfactual probabilities: computational methods, bounds and applications. In: Uncertainty Proceedings 1994, pp. 46–54. Elsevier (1994)
Barocas, S., Selbst, A.D., Raghavan, M.: The hidden assumptions behind counterfactual explanations and principal reasons. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 80–89 (2020)
Bashtannyk, D.M., Hyndman, R.J.: Bandwidth selection for kernel conditional density estimation. Comput. Stat. Data Anal. 36(3), 279–298 (2001)
Bastani, O., Kim, C., Bastani, H.: Interpretability via model extraction. arXiv preprint arXiv:1706.09773 (2017)
Begoli, E., Bhattacharya, T., Kusnezov, D.: The need for uncertainty quantification in machine-assisted medical decision making. Nat. Mach. Intell. 1(1), 20–23 (2019)
Bishop, C.M.: Mixture density networks (1994)
Bottou, L., Bousquet, O.: The tradeoffs of large scale learning. In: Advances in Neural Information Processing Systems, pp. 161–168 (2008)
Cohen, L., Lipton, Z.C., Mansour, Y.: Efficient candidate screening under multiple tests and implications for fairness. arXiv preprint arXiv:1905.11361 (2019)
Cooper, G.F., Yoo, C.: Causal discovery from a mixture of experimental and observational data. In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pp. 116–125 (1999)
Dhurandhar, A., et al.: Explanations based on the missing: towards contrastive explanations with pertinent negatives. In: Advances in Neural Information Processing Systems, pp. 592–603 (2018)
Doshi-Velez, F., Kim, B.: Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608 (2017)
Friedman, N., Nachman, I.: Gaussian process networks. In: Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pp. 211–219 (2000)
Grote, T., Berens, P.: On the ethics of algorithmic decision-making in healthcare. J. Med. Ethics 46(3), 205–211 (2020)
Guidotti, R., Monreale, A., Ruggieri, S., Pedreschi, D., Turini, F., Giannotti, F.: Local rule-based explanations of black box decision systems. arXiv preprint arXiv:1805.10820 (2018)
Gunning, D.: DARPA’S explainable artificial intelligence (XAI) program. In: Proceedings of the 24th International Conference on Intelligent User Interfaces, p. ii. ACM (2019)
Gupta, V., Nokhiz, P., Roy, C.D., Venkatasubramanian, S.: Equalizing recourse across groups. arXiv preprint arXiv:1909.03166 (2019)
Hancox-Li, L.: Robustness in machine learning explanations: does it matter? In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 640–647 (2020)
Holzinger, A., Malle, B., Saranti, A., Pfeifer, B.: Towards multimodal causability with graph neural networks enabling information fusion for explainable AI. Inf. Fusion 71, 28–37 (2021)
Hoyer, P., Janzing, D., Mooij, J.M., Peters, J., Schölkopf, B.: Nonlinear causal discovery with additive noise models. In: Advances in Neural Information Processing Systems, pp. 689–696 (2009)
Joshi, S., Koyejo, O., Vijitbenjaronk, W., Kim, B., Ghosh, J.: REVISE: towards realistic individual recourse and actionable explanations in black-box decision making systems. arXiv preprint arXiv:1907.09615 (2019)
Karimi, A.H., Barthe, G., Balle, B., Valera, I.: Model-agnostic counterfactual explanations for consequential decisions. In: International Conference on Artificial Intelligence and Statistics, pp. 895–905 (2020)
Karimi, A.H., Barthe, G., Schölkopf, B., Valera, I.: A survey of algorithmic recourse: contrastive explanations and consequential recommendations. arXiv preprint arXiv:2010.04050 (2020)
Karimi, A.H., von Kügelgen, J., Schölkopf, B., Valera, I.: Algorithmic recourse under imperfect causal knowledge: a probabilistic approach. In: Advances in Neural Information Processing Systems, pp. 265–277 (2020)
Karimi, A.H., Schölkopf, B., Valera, I.: Algorithmic recourse: from counterfactual explanations to interventions. In: 4th Conference on Fairness, Accountability, and Transparency (FAccT 2021), pp. 353–362 (2021)
Karush, W.: Minima of functions of several variables with inequalities as side conditions. Master’s thesis, Department of Mathematics, University of Chicago (1939)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: 3rd International Conference for Learning Representations (2015)
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: 2nd International Conference on Learning Representations (2014)
Kodratoff, Y.: The comprehensibility manifesto. KDD Nugget Newsl. 94(9) (1994)
Kuhn, H.W., Tucker, A.W.: Nonlinear programming. In: Neyman, J. (ed.) Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, Berkeley (1951)
Lage, I.: An evaluation of the human-interpretability of explanation. arXiv preprint arXiv:1902.00006 (2019)
Laugel, T., Lesot, M.J., Marsala, C., Renard, X., Detyniecki, M.: Inverse classification for comparison-based interpretability in machine learning. arXiv preprint arXiv:1712.08443 (2017)
Lewis, D.K.: Counterfactuals. Harvard University Press, Cambridge (1973)
Lipton, Z.C.: The mythos of model interpretability. Queue 16(3), 31–57 (2018)
Louizos, C., Shalit, U., Mooij, J.M., Sontag, D., Zemel, R., Welling, M.: Causal effect inference with deep latentvariable models. In: Advances in Neural Information Processing Systems, pp. 6446–6456 (2017)
Mahajan, D., Tan, C., Sharma, A.: Preserving causal constraints in counterfactual explanations for machine learning classifiers. arXiv preprint arXiv:1912.03277 (2019)
Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38 (2019)
Mothilal, R.K., Sharma, A., Tan, C.: DiCE: explaining machine learning classifiers through diverse counterfactual explanations. arXiv preprint arXiv:1905.07697 (2019)
Murdoch, W.J., Singh, C., Kumbier, K., AbbasiAsl, R., Yu, B.: Definitions, methods, and applications in interpretable machine learning. Proc. Natl. Acad. Sci. 116(44), 22071–22080 (2019)
Murphy, P.M.: UCI repository of machine learning databases. ftp:/pub/machine-learning-databases@ics.uci.edu (1994)
Nabi, R., Shpitser, I.: Fair inference on outcomes. In: Proceedings of the AAAI Conference on Artificial Intelligence, p. 1931 (2018)
Pearl, J.: Causality. Cambridge University Press, Cambridge (2009)
Pearl, J.: Structural counterfactuals: a brief introduction. Cogn. Sci. 37(6), 977–985 (2013)
Peters, J., Bühlmann, P.: Identifiability of Gaussian structural equation models with equal error variances. Biometrika 101(1), 219–228 (2014)
Peters, J., Janzing, D., Schölkopf, B.: Elements of Causal Inference. The MIT Press, Cambridge (2017)
Poyiadzi, R., Sokol, K., SantosRodriguez, R., De Bie, T., Flach, P.: FACE: feasible and actionable counterfactual explanations. arXiv preprint arXiv:1909.09369 (2019)
Rezende, D.J., Mohamed, S., Wierstra, D.: Stochastic backpropagation and approximate inference in deep generative models. In: International Conference on Machine Learning, pp. 1278–1286 (2014)
Rieckmann, A., et al.: Causes of outcome learning: a causal inference-inspired machine learning approach to disentangling common combinations of potential causes of a health outcome. medRxiv (2020)
Ruben, D.H.: Explaining Explanation. Routledge, London (2015)
Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1(5), 206–215 (2019)
Rüping, S.: Learning interpretable models. Ph.D. dissertation, Technical University of Dortmund (2006)
Russell, C.: Efficient search for diverse coherent explanations. In: Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* ’19, pp. 20–28. ACM (2019)
Schulam, P., Saria, S.: Reliable decision support using counterfactual models. In: Advances in Neural Information Processing Systems, pp. 1697–1708 (2017)
Schumann, C., Foster, J.S., Mattei, N., Dickerson, J.P.: We need fairness and explainability in algorithmic hiring. In: Proceedings of the 19th International Conference on Autonomous Agents and Multi-Agent Systems, pp. 1716–1720 (2020)
Sharma, S., Henderson, J., Ghosh, J.: CERTIFAI: a common framework to provide explanations and analyse the fairness and robustness of black-box models. In: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, pp. 166–172 (2020)
Shpitser, I., Pearl, J.: Identification of conditional interventional distributions. In: 22nd Conference on Uncertainty in Artificial Intelligence, UAI 2006, pp. 437–444 (2006)
Shpitser, I., Pearl, J.: Complete identification methods for the causal hierarchy. J. Mach. Learn. Res. 9(Sep), 1941–1979 (2008)
Silva, R., Gramacy, R.B.: Gaussian process structural equation models with latent variables. In: Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, pp. 537–545 (2010)
Sohn, K., Lee, H., Yan, X.: Learning structured output representation using deep conditional generative models. In: Advances in Neural Information Processing Systems, pp. 3483–3491 (2015)
Starr, W.: Counterfactuals. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, fall 2019 edition (2019)
Stöger, K., Schneeberger, D., Holzinger, A.: Medical artificial intelligence: the European legal perspective. Commun. ACM 64(11), 34–36 (2021)
Tian, J., Pearl, J.: Probabilities of causation: bounds and identification. Ann. Math. Artif. Intell. 28(1–4), 287–313 (2000). https://doi.org/10.1023/A:1018912507879
Tian, J., Pearl, J.: Causal discovery from changes. In: Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pp. 512–521 (2001)
Tian, J., Pearl, J.: A general identification condition for causal effects. In: Eighteenth national conference on Artificial intelligence, pp. 567–573 (2002)
Trippe, B.L., Turner, R.E.: Conditional density estimation with Bayesian normalising flows. arXiv preprint arXiv:1802.04908 (2018)
Ustun, B., Spangher, A., Liu, Y.: Actionable recourse in linear classification. In: Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 10–19. ACM (2019)
Venkatasubramanian, S., Alfano, M.: The philosophical basis of algorithmic recourse. In: Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM (2020)
Verma, S., Dickerson, J., Hines, K.: Counterfactual explanations for machine learning: a review. arXiv preprint arXiv:2010.10596 (2020)
Voigt, P., Von dem Bussche, A.: The EU General Data Protection Regulation (GDPR). A Practical Guide, 1st edn. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57959-7
von Kügelgen, J., Rubenstein, P.K., Schölkopf, B., Weller, A.: Optimal experimental design via Bayesian optimization: active causal structure learning for Gaussian process networks. In: NeurIPS Workshop “Do the right thing”: machine learning and causal inference for improved decision making (2019)
von Kügelgen, J., Agarwal, N., Zeitler, J., Mastouri, A., Schölkopf, B.: Algorithmic recourse in partially and fully confounded settings through bounding counterfactual effects. In: ICML Workshop on Algorithmic Recourse (2021)
von Kügelgen, J., Karimi, A.H., Bhatt, U., Valera, I., Weller, A., Schölkopf, B.: On the fairness of causal algorithmic recourse. In: Proceedings of the 36th AAAI Conference on Artificial Intelligence (2022)
Wachter, S., Mittelstadt, B., Russell, C.: Counterfactual explanations without opening the black box: automated decisions and the GDPR. Harvard J. Law Technol. 31(2) (2017)
Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Found. Trends® Mach. Learn. 1(1–2), 1–305 (2008)
Wang, Y., Blei, D.M.: The blessings of multiple causes. J. Am. Stat. Assoc. 114, 1–71 (2019)
Williams, C.K.I., Rasmussen, C.E.: Gaussian Processes for Machine Learning, vol. 2. MIT Press, Cambridge (2006)
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2022 The Author(s)
About this chapter
Cite this chapter
Karimi, A.-H., von Kügelgen, J., Schölkopf, B., Valera, I. (2022). Towards Causal Algorithmic Recourse. In: Holzinger, A., Goebel, R., Fong, R., Moon, T., Müller, K.-R., Samek, W. (eds) xxAI - Beyond Explainable AI. xxAI 2020. Lecture Notes in Computer Science, vol 13200. Springer, Cham. https://doi.org/10.1007/978-3-031-04083-2_8
DOI: https://doi.org/10.1007/978-3-031-04083-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04082-5
Online ISBN: 978-3-031-04083-2