1 Introduction

Risk prediction tools can increase decision efficiency in contexts such as credit, health, and criminal justice. They may bring more neutrality, countering subjective and prejudice-driven human judgment, and improve accuracy, resulting in more efficient and resource-effective decision policies (Barabas et al., 2018). However, for years, algorithmic tools have been criticised for reflecting and potentially exacerbating pre-existing biases (Barocas & Selbst, 2016; Citron & Pasquale, 2014). “Algorithmic bias”, in this context, is generally taken to refer to cases in which “the model’s predictive performance (however defined) unjustifiably differs across disadvantaged groups along social axes such as race, gender, and class” (Mitchell et al., 2021, p. 1). This bias is also referred to as a model’s “skewed performance” along one of these demographic axes.

In the Fairness, Accountability, and Transparency (FAccT) literature, bias tends to be characterized as a problem in its consequences, that is, an issue requiring an ex post solution. An example of this type of solution is fairness metrics, a set of measures that enable one to detect and adjust bias in a model. Algorithmic bias is rarely considered as evidence of the underlying social and technical conditions that (re)produce it—that is, as an issue requiring an ex ante solution. This tendency promotes the design of solutions ex post, by addressing the consequences, rather than (at least also) ex ante, by addressing the conditions. In this article, we seek to rebalance the overall strategy. We analyse explainable artificial intelligence (XAI) approaches with respect to their ability to gather evidence—note, not proof—of social disparities. We focus specifically on feature attribution approaches that rely on Shapley values and counterfactual approaches. These enable us to examine the relationship between protected characteristics such as race or gender and skewed performance.

Although the relation between explainability and fairness is key to approaching algorithmic bias as “evidence”, it remains analytically vague. Additionally, some of the bias-relevant applications of feature attribution approaches tend to represent the role of protected characteristics in discriminatory outcomes unrealistically—e.g., as independent, intrinsic, and causal attributes. A complementary strategy is to approach bias genealogically. In this article, we use genealogy as a constructive, epistemic critique with a double role. Constructively, it allows us to explain algorithmic bias in terms of the conditions that give rise to it, ex ante. Critically, it helps explain algorithmic bias not in terms of a single origin (“cause”), but with respect to a broader set of social and technical conditions at play that (re)produce these disparities.

In this respect, we make two main contributions. The first is a theoretical framework to classify XAI approaches according to their relevance for gathering evidence of social disparities. We take inspiration from Pearl’s ladder of causation (2000, 2009) to characterize XAI approaches as observational, interventional, and counterfactual—namely, concerning their ability to detect (a) whether—and, if so, (b) how—a protected characteristic contributed to skewed performance, and (c) what can be done to change it. The goal is to consider these XAI methods not only as technical tools but as means to investigate and collect evidence about unfair differences in performance along protected characteristics. The second is to critique these methods concerning their ability to represent the role of protected characteristics in discriminatory outcomes. Drawing from Kohler-Hausmann’s (2019) constructivist theory of discrimination, we question observational, interventional, and counterfactual XAI approaches concerning the independence, responsibility, and epistemic assumptions they respectively make about protected characteristics. The aim is to question XAI approaches in their ability to help capture salient aspects of discrimination.

The remainder of this article is structured as follows. In Sect. 2, we review the relationship between explainability and fairness. In Sect. 3, we present the genealogical approach to bias. In Sect. 4, we characterize XAI approaches concerning their relevance for fairness. In Sect. 5, we question their capacity to address algorithmic discrimination. Finally, we derive three main recommendations to help XAI practitioners develop, and policymakers regulate, tools that address algorithmic bias in its conditions and thus mitigate its future occurrence.

2 Explainability and Fairness

Explainability can enhance fairness-relevant properties to different extents and on different levels. Explainability can enhance transparency (Abdollahi & Nasraoui, 2018) by making it possible to see how a model has arrived at a discriminatory outcome. Additionally, explainability can increase or enable trust in a model (Dodge et al., 2019). It can help, for example, to determine if qualities relevant to algorithmic fairness (such as fairness metrics) are met (Doshi-Velez & Kim, 2017). Explainability can also enhance accountability, as it can provide explanations for AI-informed (un)fair decisions (Leben, 2023; Zhou et al., 2022). These properties concern fairness within the context of “responsible AI”, AI that takes into account moral and ethical considerations as well as social values (Adadi & Berrada, 2018).

At the same time, the relationship between explainability and fairness is not always positive. Explainability, for instance, can influence the perception of fairness. On that note, some highlight the risks for fairness that more explainability, and more reliance on it, can bring. Examples are the risks of “fairwashing” and of the rationalization, and potential justification, of some types of discrimination (Aivodji et al., 2019). Ananny and Crawford (2018) have highlighted how the ability to “see” a model does not equate to the ability to “govern” or “understand” it and, thus, to mitigate bias. Additionally, authors such as Barocas (2022) have recently highlighted the tensions between calls for simpler models to ensure transparency (and thereby facilitate algorithmic fairness), and the inconvenient fact that such models may be less able to satisfy some fairness demands (e.g. allowing for specific parameter tweaks). This shows how the relation between fairness and explainability need not be positive and may require trade-offs.

Notwithstanding the ability of XAI methods to enhance or mitigate the potentially negative consequences of algorithmic bias, our focus here is different. Specifically, we are interested in the potential of XAI methods to gather evidence of the conditions that enable such bias. We see algorithmic bias as the object, and XAI approaches as a means to uncover and learn about the underlying conditions of social inequality.

Within the realm of explainability approaches, we focus specifically on feature attribution (e.g. SHAP; Lundberg & Lee, 2017) and counterfactual approaches (e.g. Galhotra et al., 2021; Karimi et al., 2020, 2021). Among feature attribution approaches, this article focuses specifically on Shapley values, which can estimate how input features contribute to performance biases (Begley et al., 2020). This is arguably the most popular feature attribution method, as it unifies several related methods and comes with axiomatic guarantees (Lundberg & Lee, 2017). However, various methods exist for computing Shapley values that may provide different attributions for the same prediction (Sundararajan & Najmi, 2020). Following Heskes et al. (2020), we provide a division into marginal, conditional, and interventional or causal approaches. We also examine counterfactual approaches, which explain what could have happened to an outcome had an input feature to a model been changed in a particular way (Barocas et al., 2020; Verma et al., 2020). As we will show, both families of approaches can be readily interpreted through a causal ladder framework.

3 A Genealogical Approach to Bias

The above suggests that XAI methods can be relevant to explaining an algorithm’s performance by estimating the contribution of protected characteristics to it. When this performance reveals disparities along gender or race, the approaches we consider can support the goal of explaining it in terms of the input variables that conditioned it. However, we need an overarching strategy for how to apply these XAI methods towards that goal. In this article, we propose to adopt a genealogical approach to algorithmic bias.

Genealogy refers to a form of historical critique, designed to overturn social norms by revealing their origins (Hill, 2016). Here, we use the term in its philological sense, to mean a constructive critique that looks at the conditions of possibility of a problem to address it successfully. In our case, this refers to a constructive critique designed to understand algorithmic bias by focusing on the plural, dynamic, and contingent conditions for its possibility, and the potential of XAI methods to surface evidence of them. Specifically, a genealogical approach to algorithmic bias invites us to explain bias in terms of the conditions for its occurrence and understand its explanation not in terms of a single cause, but of gathering evidence on the set of conditions that produce it.

As cited above, past redlining divisions, differential access to healthcare, and disparate arrest practices towards people of colour represent some examples of what we mean by these “conditions”. Historical segregation in US neighbourhoods, for instance, profoundly affected the residents’ access to credit, health insurance, and education (Agyeman, 2021; Perrino, 2020). In turn, this created the conditions of poverty, unemployment, and past default history by which residents in these communities are considered “not worthy” of credit when zip codes are used to calculate the risk of default. We aim to express this genealogical approach by adopting both a constructive and a critical stance, taking inspiration from Pearl’s ladder of causation (2000, 2009) and Kohler-Hausmann’s (2019) constructivist theory of discrimination, whose contribution will become more evident in the following sections.

4 XAI Approaches as Questions

By taking inspiration from Pearl’s ladder of causation (2000, 2009), we provide an ordering principle for XAI approaches that distinguishes their utility for fairness along three levels—specifically, their differential ability to “see”, “govern”, and “understand” what influences skewed performance. Generally, this should help clarify the vague relationship between explainability and fairness. In the specific case of the feature attribution and counterfactual approaches that this article focuses on, this can help answer the following questions: (1) Is a protected characteristic unfairly associated with outcomes? (2) Would intervening to alter a protected characteristic directly affect outcomes? (3) Given observed values for protected characteristics and outcomes, would a hypothetical intervention to alter a subject’s protected characteristic have changed the outcome? We propose so-called “observational” approaches as relevant for procedural fairness, “interventional” approaches for consequential recommendations, and “counterfactual” approaches for algorithmic recourse.

4.1 What Bias: Observational Approaches for Procedural Fairness

We refer to marginal and conditional feature attribution methods as “observational approaches”, as they can help observe whether a protected characteristic is unfairly associated with an outcome. Marginal variable importance measures estimate the importance of features, assuming that these are independent of each other. An example is given by Datta et al.’s (2016) Quantitative Input Influence (QII) method, where Shapley values are used to calculate the average marginal influence of input features. Another example is provided by Štrumbelj and Kononenko’s (2014) work. They use marginal Shapley values to develop a sensitivity analysis-based method to estimate individual feature contributions. Though common, this assumption of independence might lead to incorrect or counterintuitive explanations when the features are, in fact, highly correlated. Additionally, it allows these methods to represent only the direct effects of variables.
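
To make the idea concrete, the following is a minimal, hypothetical sketch of the marginal value function that underlies such methods: the contribution of each feature, including a protected one, is obtained by averaging the model’s output over a background sample while holding the coalition’s features fixed, ignoring any dependence between features. The toy model, feature names, and data are our own illustrative assumptions, not the cited authors’ methods; real tools such as QII or SHAP rely on sampling approximations rather than exact enumeration.

```python
# Minimal sketch: exact marginal Shapley values for one instance of a small,
# hypothetical scoring model, estimated with a background sample.
import itertools
import math
import numpy as np

rng = np.random.default_rng(0)

FEATURES = ["income", "credit_history", "gender"]  # hypothetical inputs


def model(X):
    """Toy linear score standing in for a trained classifier."""
    income, history, gender = X[:, 0], X[:, 1], X[:, 2]
    return 0.5 * income + 0.3 * history - 0.2 * gender  # gender enters directly


def value(S, x, background):
    """Marginal value of coalition S: features outside S are drawn from the
    background sample independently of the features fixed in S."""
    X = background.copy()
    X[:, list(S)] = x[list(S)]   # fix the coalition's features to the instance
    return model(X).mean()       # average the score over the background draws


def marginal_shapley(x, background):
    """Exact Shapley values by enumerating all coalitions (small n only)."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in itertools.combinations(others, size):
                weight = (math.factorial(size) * math.factorial(n - size - 1)
                          / math.factorial(n))
                phi[i] += weight * (value(S + (i,), x, background)
                                    - value(S, x, background))
    return phi


background = rng.normal(size=(500, 3))   # hypothetical background sample
x = np.array([-1.0, -0.5, 1.0])          # one rejected applicant (a woman)
for name, contrib in zip(FEATURES, marginal_shapley(x, background)):
    print(f"{name:>15}: {contrib:+.3f}")
# A clearly non-zero attribution to gender is the kind of evidential signal
# that observational approaches can surface.
```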

This motivates the use of conditional variable importance measures. These can represent indirect effects, and estimate importance by conditioning on a variable. Aas et al. (2021) propose conditioning strategies to compute more accurate Shapley values. Another example is Frye et al.'s (2020) so-called asymmetric Shapley values. They are called “asymmetric” because, when computing Shapley values, they restrict the possible permutations of the features to those consistent with a partial causal ordering. They then apply conditioning by observation so that their explanations respect the multivariate distribution of the data and are not rendered misleading or nonsensical by feature dependence.
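
The following sketch contrasts the marginal value function above with one obtained by conditioning by observation, using two hypothetical binary features: gender and a correlated proxy (part-time employment). The model, data, and feature names are illustrative assumptions; with binary features the conditioning can be done exactly by restricting the background to matching rows, whereas the methods cited above rely on more sophisticated estimators.

```python
# Minimal sketch of "conditioning by observation": the coalition's value is
# estimated only over background rows that agree with the instance on the
# coalition's features. All names and data are hypothetical.
import numpy as np

rng = np.random.default_rng(1)

# gender and a correlated proxy (part-time employment)
gender = rng.integers(0, 2, size=2000)
part_time = (rng.random(2000) < np.where(gender == 1, 0.7, 0.2)).astype(int)
background = np.column_stack([gender, part_time])


def model(X):
    # Toy model that only "looks at" the proxy, never at gender itself
    return 1.0 - 0.6 * X[:, 1]


def marginal_value(S, x, background):
    X = background.copy()
    X[:, list(S)] = x[list(S)]
    return model(X).mean()


def conditional_value(S, x, background):
    if not S:
        return model(background).mean()
    mask = np.all(background[:, list(S)] == x[list(S)], axis=1)
    return model(background[mask]).mean()


x = np.array([1, 1])  # a woman who works part time
print("baseline E[f(X)]              ", round(model(background).mean(), 3))
print("marginal value of {gender}    ", round(marginal_value((0,), x, background), 3))
print("conditional value of {gender} ", round(conditional_value((0,), x, background), 3))
# Fixing gender alone leaves the marginal value at the baseline, because the
# toy model never reads gender; the conditional value shifts, because gender
# carries information about the correlated part_time feature.
```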

Overall, these observational approaches make it possible to check whether a protected characteristic, such as race or gender, is unfairly associated with an outcome. This is relevant for ensuring procedural fairness, i.e. the fairness of the decision-making process (Grgić-Hlača et al., 2018). In the example of credit risk assessment, it would allow one to see whether, for example, gender or race was considered in arriving at a negative assessment for a loan. In this respect, “unfairness” would amount to direct discrimination and be illegal (Prince & Schwarcz, 2019). Procedural fairness is related to commitments to values such as accountability and transparency (Rueda et al., 2022). Additionally, it supports requirements such as compliance in finance or due process in law.

These considerations suggest that these measures could be relevant for fairness in an algorithmic context for tasks such as auditing [reference anonymised]; specifically, ethics-based auditing (EBA; Mökander et al., 2021; reference anonymised). EBA refers to “a structured process whereby an entity’s present or past behaviour is assessed for consistency with relevant principles or norms” (Mökander et al., 2021, p. 2). According to Mökander et al. (2021), EBA can “contribute to good governance by promoting procedural regularity and transparency” (p. 16). The feature importance approaches mentioned here can contribute to assessing whether a protected characteristic played a role in the decision-making process.

In this respect, while both marginal and conditional approaches are concerned with answering the question above, they might be differentially relevant for procedural fairness. Authors like [reference anonymised] suggest that the former can provide insights into model mechanics, while the latter is more informative about the underlying data-generating process. Accordingly, marginal measures could help shed light on discrimination at the level of the model. This could be relevant to EBA when reviewing source code (Mökander et al., 2021). Conditional measures could help shed light on discrimination at the system level. In this respect, a model could play an instrumental role, and conditional measures could be useful to check for procedural fairness in more complex decisional settings such as institutions for credit, justice, etc.

4.2 How Bias: Interventional Approaches for Consequential Recommendations

Interventional approaches can help investigate whether intervening to alter a protected characteristic directly affects outcomes. Beyond showing whether a feature is correlated with a discriminatory outcome, these approaches allow us to tell how (e.g. the extent to which) an intentional change in one feature changes the outcome. So-called interventional or causal Shapley values can help answer this question, as they are designed to capture causal contributions (Heskes et al., 2020). They do so by quantifying the effects of each input on a model’s output in accordance with a user-supplied causal graph. Examples include do-Shapley values (Jung et al., 2022) and Shapley flow (Wang et al., 2021).

Unlike marginal observational approaches, most of these approaches consider the relations between input features (Heskes et al., 2020). They do not assume independence between them. They tend to do so by relying on a causal representation of the model or the “world” through a causal diagram. This entails making explicit some assumptions about the features considered in a model. Such a representation can help understand the interaction of the input features within a model or a system, and to reason about potential interventions (Wang et al., 2021). When modelled as a causal driver of both re-arrests and anti-law enforcement resentment (“antisocial cognition”), for example, intensive policing could be reasoned about as a site to intervene on (e.g. to be reduced) to mitigate these outcomes.
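
As an illustration of this kind of reasoning, the sketch below encodes the policing example as a small, entirely hypothetical structural causal model and compares the observational re-arrest rate with the rates under the interventions do(policing = 0) and do(policing = 1). The graph, equations, and coefficients are illustrative assumptions rather than empirical claims.

```python
# Minimal sketch: a hypothetical structural causal model for the policing
# example (policing -> resentment -> re-arrest, and policing -> re-arrest),
# used to compare observational sampling with interventions do(policing=...).
import numpy as np

rng = np.random.default_rng(2)
N = 100_000


def rearrest_rate(policing=None):
    """Sample from the toy SCM; if `policing` is given, apply do(policing=...)."""
    if policing is None:
        policing = rng.binomial(1, 0.5, N)        # observational regime
    else:
        policing = np.full(N, policing)           # interventional regime
    resentment = rng.binomial(1, 0.2 + 0.4 * policing)    # indirect path
    p_rearrest = 0.1 + 0.3 * policing + 0.2 * resentment  # direct + indirect
    return rng.binomial(1, p_rearrest).mean()


print("P(re-arrest)                   ", round(rearrest_rate(), 3))
print("P(re-arrest | do(policing = 0))", round(rearrest_rate(policing=0), 3))
print("P(re-arrest | do(policing = 1))", round(rearrest_rate(policing=1), 3))
# The gap between the two do() rows is the total causal effect that
# interventional attributions are designed to capture, and it identifies
# intensive policing as a site to intervene on.
```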

Regarding fairness, this can potentially respect and enhance one’s agency. The ability for someone to exercise their agency is closely tied to values such as self-determination and autonomy (Christman, 2020). Interventional approaches could do so in the form of consequential recommendations. Given a negative outcome, consequential recommendations provide the minimum intervention required to obtain a better result (Karimi et al., 2020). For example, they might suggest how much a credit applicant needs to increase their credit score or income to raise their chances of receiving a loan. However, since most protected characteristics cannot be changed, these recommendations suggest interventions on so-called “intervenable” factors, such as income in credit risk assessment, employment in crime risk assessment or nutrition in health risk assessment.
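
A minimal sketch of such a consequential recommendation is given below: under a hypothetical credit rule, it searches for the smallest increase in an intervenable feature (income) that flips a rejection into an approval. The decision rule, threshold, and applicant values are assumptions made for illustration only.

```python
# Minimal sketch: a consequential recommendation as the smallest increase in
# an "intervenable" feature (income, in thousands) that flips a toy credit
# rule's decision. The rule, threshold, and applicant are hypothetical.

def approve(income_k, credit_history):
    """Toy decision rule standing in for a trained risk model."""
    return 0.5 * income_k + 0.3 * credit_history >= 40.0


def minimum_income_increase(income_k, credit_history, step=1, cap=200):
    """Smallest income increase (in k) turning a rejection into an approval."""
    for delta in range(0, cap + step, step):
        if approve(income_k + delta, credit_history):
            return delta
    return None  # no feasible recommendation within the cap


applicant = {"income_k": 40, "credit_history": 50}   # currently rejected
delta = minimum_income_increase(**applicant)
print(f"Recommendation: increase income by {delta}k to obtain the loan.")
# Note that the recommendation is silent about *why* the applicant was
# rejected; it only names the cheapest intervenable change.
```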

4.3 Why Bias: Counterfactual Approaches for Algorithmic Recourse

Given observed values for protected characteristics and outcomes, these approaches can suggest whether a hypothetical intervention to alter a subject’s protected characteristic would have changed the outcome. Examples are provided by Galhotra et al. (2021) and Karimi et al. (2020, 2021). Most notably, Galhotra et al. (2021) propose “probabilistic contrastive counterfactuals”, which not only help quantify the direct and indirect effects of a feature on outcomes, but also provide actionable recourse to individuals negatively affected by such an outcome.

These methods can allow users to check whether a protected characteristic (e.g. race or gender) was the cause of a specific discriminatory outcome, and what can be done to change that outcome. Beyond interventional approaches, counterfactual ones allow one not only to know how to act, but also to understand what brought about a discriminatory outcome. Additionally, interventional approaches base their recommendations on the consequences they can bring about. They are forward-looking. By contrast, counterfactual approaches can base recommendations on what caused a discriminatory outcome. They are backward-looking. One is concerned with improving outcomes, the other with reversing unfavourable outcomes. In that respect, their relevance to fairness is still related to agency, but more closely to “recourse”.

In law, recourse refers to actions that individuals or corporations undertake to remedy unfair or unfavourable legal outcomes (Wallin, 1992). Algorithmic recourse refers to “the systematic process of reversing unfavourable decisions by algorithms and bureaucracies across a range of counterfactual scenarios” (Venkatasubramanian & Alfano, 2020, p. 284). For example, a rejected loan applicant who is a woman can argue for recourse if there exists a positive counterfactual instance to hers: an applicant who is similar or “close” to her in every other feature but gender, and who was granted a loan. Karimi et al. (2021) consider that algorithmic recourse is met when a candidate is both provided with an explanation of why the loan was rejected and offered recommendation(s) on how to obtain the loan in the future. Algorithmic recourse, they claim, is achieved when one “can understand and accordingly act to alleviate an unfavourable situation” (Karimi et al., 2021, p. 2). Given that these explanations are formed by looking at an opposite outcome in a unit that is the same or similar but for a protected feature, they are often referred to as contrastive explanations (Galhotra et al., 2021; Karimi et al., 2021). These formulate explanations in terms of why this outcome rather than another (“the opposite”) occurred.
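
A minimal sketch of this contrastive check is given below: among hypothetical applicants who were granted a loan, it finds the one closest to a rejected applicant and lists the features on which the two differ. The data, feature names, and distance measure are illustrative assumptions; actual recourse systems would query the model itself rather than rely on observed instances alone.

```python
# Minimal sketch of a contrastive check for recourse: among applicants who
# were granted a loan, find the one closest to a rejected applicant and list
# the features on which they differ. Data and feature names are hypothetical.
import numpy as np

FEATURES = ["income_k", "credit_history", "gender"]  # gender: 1 = woman

applicants = np.array([
    [42.0, 55.0, 1.0],   # rejected applicant under scrutiny
    [42.0, 55.0, 0.0],   # granted
    [80.0, 90.0, 0.0],   # granted
    [30.0, 20.0, 1.0],   # rejected
])
granted = np.array([False, True, True, False])

x = applicants[0]
candidates = applicants[granted]
# Normalise features before measuring distance so no single scale dominates
scale = applicants.std(axis=0) + 1e-9
distances = np.linalg.norm((candidates - x) / scale, axis=1)
nearest = candidates[distances.argmin()]

for name, own, other in zip(FEATURES, x, nearest):
    if own != other:
        print(f"differs on {name}: {own} vs {other}")
# If the nearest granted applicant differs (almost) only on gender, the
# rejected applicant has a prima facie contrastive case for recourse.
```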

5 Questioning XAI Approaches

What does it mean, however, for a protected feature such as gender or race to “be associated with”, “alter” or “cause” a discriminatory outcome? This section takes inspiration from Kohler-Hausmann’s (2019) constructivist theory of discrimination to question the assumptions these XAI methods make about protected features.

5.1 From Procedural Explanations to Evidential Observations

From a statistical standpoint, the main objection to using observational approaches for procedural fairness arises from their assumption of feature independence. Marginal Shapley values ignore the fact that a change in one input feature may cause a change in another. If protected characteristics present spurious correlations with the discriminatory outcome via another, possibly unobserved, feature, this will produce misleading explanations (Heskes et al., 2020; Nabi & Shpitser, 2018). Conditional Shapley values recognize the dependence between features, and how it can influence the contribution of the feature under consideration. However, they do not usually rely on a causal representation (e.g. a causal graph) of the relationship between characteristics. Similarly to some interventional approaches, they might not distinguish intermediate outcomes from covariates (Greiner, 2008) and produce unreliable explanations.

From a critical standpoint, we should ask what it means for a protected characteristic such as race or gender to be independent of other input features. We claim that the problem with the independence assumption is that it treats protected features as discrete units, existing in isolation rather than in relation to a host of other variables or features.

The independence assumption comes with important conceptual repercussions. In reality, one cannot separate being a woman or being a person of colour from one’s socioeconomic circumstances, at least in societies where gender or racial inequalities are present (Hu, 2019). Even when this dependence is acknowledged by conditional approaches, i.e., by “controlling for” or “conditioning on” gender, it should be understood in relation to the entire system of other features within which it is embedded. To attribute a role to gender in a credit decision without considering how variables such as income, marital status, and education relate to each other is to misunderstand or even deny the very role that gender plays in credit settings. One is not classified as less likely to pay back a loan just because they are a woman, but because of how that “comes with”, and thus influences, a host of other input features, e.g., lower income because of the gender pay gap.

These approaches are thus more appropriate for observing whether gender played a role than for explaining which role gender played vis-à-vis other features. While this aligns with the aims of procedural fairness, these considerations should serve as a caveat to avoid using these approaches beyond their means. It is essential not to treat the contributions they estimate as full explanations, but rather as evidence of a potential link with gender, which needs to be further tested with, for example, causal inference methods or by engaging with the individuals or communities affected. We thus suggest referring to explanations produced through these approaches as “evidential observations”, explanations that may trigger further investigation into the conditions that brought about a discriminatory outcome.

5.2 From Consequential to Constructive Recommendations

One objection that can be brought to consequential recommendations obtained through interventional approaches is precisely that they do not rely on causes, but on whatever brings about the best consequences. As a result, the actions they offer can either be inconsistent or have little to do with discrimination. For example, recommendations tailored towards making rejected applicants more likely to be accepted for a loan might require them to change their race. As this is impossible (and wrong), other input features on which an individual can more plausibly intervene are often used instead. In this case, recommendations can provide a relatively sensible suggestion, such as changing one’s job. However, if one’s rejection was, in fact, a result of (racial) discrimination, this does little to resolve the original injustice.

We ought to ask, what is problematic about “intervening on” a protected characteristic such as race or gender? This request, we argue, entails an assumption about responsibility. Namely, it ascribes to the person with the protected feature the responsibility to change their situation. However, protected features are traits of the unit. Not only can one not practically intervene on or change their race, but also, normatively, one should not be held responsible for it. This is specifically relevant considering our previous claim: that interventional approaches can be useful for fairness with regard to enhancing one’s agency. When developing consequential recommendations, it is crucial that the responsibility for changing an outcome rests with the people who designed and deployed the algorithm (e.g., developers or providers), not with the end users subjected to its results.

As such, any consequential recommendation that focuses on the effect of protected characteristics on a discriminatory outcome should not simply rely on another feature that can be “intervened on”. It should instead provide a reason for the developers to reflect on how their model represents these protected characteristics. Given that interventional and causal Shapley values usually rely on a causal representation of the model or the “world” through a causal graph, these approaches can promote fairness when developers use them to understand how protected features such as race or gender are represented in their model alongside a set of other features (Hu, 2019). This is especially relevant given that causal graphs encode direct and indirect associations. Thus, they allow one to reason about positive and negative contributions to skewed performance.

As Kohler-Hausmann (2019) suggests, protected features do not have causal effects so much as structural properties: they are embedded within structures, whether social systems or algorithmic models, influencing and constructing their meaning and role. As such, these approaches could be used to provide constructive recommendations to developers on how to change the way gender or race are represented in the model vis-à-vis other input features. For example, this could be done by realizing that some variables seemingly unrelated to race, e.g. zip codes, are proxies for it in their model, and by considering how this relation could be reconfigured. Rather than bringing about desired consequences for and from users, these approaches can be better framed as providing constructive recommendations to developers on behalf of users.

5.3 From Contrastive to Constitutive Explanations

Regarding counterfactual approaches, one could contend that, even though theoretically sound, they are often too demanding to realize in practice (Verma et al., 2020). Computing counterfactuals generally requires not just a causal graph but also knowledge of the structural equations that govern the relationships between nodes. As we noted previously, some XAI approaches that rely on this intuition overcome these limits by framing the search for the counterfactual as an optimization problem. The counterfactual is found by characterizing a notion of distance that allows us to identify the nearest hypothetical point, which is classified differently from the one considered (Wachter et al., 2017).
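
The sketch below illustrates this optimization view of counterfactuals in the spirit of Wachter et al. (2017): it minimizes a prediction loss plus a distance penalty around a toy differentiable model. The model, weights, and regularization strength are illustrative assumptions, and a realistic implementation would additionally constrain protected or immutable features.

```python
# Minimal sketch of the optimisation view of counterfactual explanations
# (in the spirit of Wachter et al., 2017): find a nearby point x' whose
# prediction crosses the decision threshold. The model and data are
# hypothetical, and protected features are not yet constrained here.
import numpy as np
from scipy.optimize import minimize

w = np.array([0.6, 0.4, -0.3])   # toy weights: income, history, gender
b = -0.2


def predict(x):
    return 1 / (1 + np.exp(-(w @ x + b)))   # probability of approval


def counterfactual(x, target=0.5, lam=0.1):
    """Minimise prediction loss plus a distance penalty around x."""
    def objective(x_prime):
        return (predict(x_prime) - target) ** 2 + lam * np.sum((x_prime - x) ** 2)
    return minimize(objective, x0=x.copy()).x


x = np.array([-0.5, -0.2, 1.0])          # a rejected applicant
x_cf = counterfactual(x)
print("original prediction       ", round(predict(x), 3))
print("counterfactual point      ", np.round(x_cf, 3))
print("counterfactual prediction ", round(predict(x_cf), 3))
# The coordinates of x_cf that moved the most indicate which features the
# explanation treats as "responsible" for the unfavourable outcome.
```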

While this practical solution reduces the epistemic demands of counterfactuals, we should ask what it means for gender or race to ‘cause’ a discriminatory outcome. The solution provided above relies on the assumption that the two “similar” or “nearest” units are the same but for the protected features. For a protected feature to be causal, as we suggested before, would mean that it is the only factor by which these two similar units differ and that, given that these lead to two different outcomes, the protected feature must be the cause. However, as Kohler-Hausmann (2019) suggests, a counterfactual unit is not the same but for the protected feature. It is a different unit precisely because of the protected feature. The epistemic assumption that contrastive explanations make is that gender or race are “causes” rather than that they constitute or characterize different units or “worlds”.

In this respect, one should talk about constitutive rather than counterfactual explanations. To say that a given system has a causal effect because of how it is constituted is to suggest that if one changed parts of that system, it would have a different causal effect. However, by the logic above, it would also be a different system. In this sense, a constitutive explanation attempts to explain different outcomes by pointing not to the “cause” but to the parts that constitute these different units. With relevance to fairness, explaining different outcomes in terms of these parts and their organization can help shed light on the conditions by which similar individuals are treated differently. For example, given two similar individuals with different credit outcomes, a constitutive explanation would entail naming the features by which these individuals are considered similar precisely as what makes their different treatment unfair. Going back to the aim of this article, it would help approach bias ex ante by focusing on the conditions that constitute it, rather than only ex post by focusing on fixing the consequences.

6 Recommendations for AI Policy and Governance

This article argues that the Fairness, Accountability, and Transparency (FAccT) literature tends to focus on bias as a problem that requires ex post solutions, e.g. fairness metrics, rather than also addressing the underlying social and technical conditions that (re)produce it. It proposes a complementary strategy that uses genealogy as a constructive, epistemic critique to explain algorithmic bias in terms of the conditions for its possibility. In this respect, the article has focused on XAI feature attribution approaches (Shapley values) and counterfactuals as potential tools to shed light on these conditions.

Given the considerations above, we conclude with three recommendations that can be useful for XAI practitioners (researchers, developers, and providers of XAI tools) when developing or deploying XAI approaches, and for AI policymakers when regulating AI with fairness in mind.

(1) The relevance of explainability for fairness should be explicitly articulated and integrated in AI development and regulation.

Section 2 reports an unclear, and sometimes counterproductive, relationship between XAI approaches and bias. This is mirrored by an increasing number of libraries, such as IBM360 and the WhatIf tools, that offer explainability and fairness tools without providing specific guidance about which tools can best tackle which aspect of bias. IBM360, for example, provides two entirely separate libraries for fairness and explainability: AI Fairness360 and AI Explainability360. This leaves open the possibility that these XAI approaches may be mistakenly or deceitfully employed in their application to fairness problems; for instance, that an XAI method relevant for procedural fairness is used beyond its capacity to detect whether a protected feature played a role in the outcome, to investigate which role it played. As we claimed in this article, when developing and deploying XAI approaches, XAI practitioners should think about them in terms of the fairness-relevant questions they can answer and the fairness-relevant solutions they help identify; for example, as presented in this paper, whether they are useful for procedural fairness or for enhancing or protecting one’s agency. Additionally, they should provide clear instructions about how their approaches can be used in concert with others and list their limitations and strengths concerning the fairness-relevant purposes they can serve. It is also crucial that AI regulatory initiatives introduce measures that recognize and promote the coherence and complementarity of the properties of XAI methods for fairness. Without such recognition, we might not only miss out on the opportunities they offer but also heighten the risks entailed by their misuse. An example can be found in how one of the latest amendments to the AI Act weakened the fairness-relevant potential of explainability by shifting the focus to ensuring oversight and traceability rather than empowering end users’ rights to explanation (Nannini et al., 2023).

(2) The responsibility to act on discriminatory outcomes should not lie exclusively with discriminated users.

XAI approaches should not only be designed for discriminated users seeking advice or recourse after being subjected to discrimination. Research suggests that many users might not even know that they are interacting with an algorithm, let alone that they have been discriminated against [reference anonymised]. It is crucial that XAI practitioners develop methods that not only help recognize instances of discrimination, but also provide constructive explanations on how to address their negative impact. These explanations could suggest interventions that AI providers can undertake to prevent discrimination or to intervene on it once it occurs (Karimi et al., 2021). This would not only help protect consumers from discrimination but also help AI providers prevent future liability claims under upcoming legislative proposals (Hacker, 2022). AI regulatory initiatives increasingly rely on ensuring compliance through risk management, audits and certification (Roberts et al., 2023). Additionally, it has been suggested that external validation of models by trusted third parties can ensure the reproducibility of results and surface biases (Haibe-Kains et al., 2020). Interventional XAI approaches could be used to provide feedback to AI providers in the form of constructive recommendations from third-party audits [reference anonymised] to help them adhere to AI regulations.

(3) Explanations of discriminatory outcomes should name conditions, rather than just the main cause(s) of discrimination.

Hu (2019) suggests that causal graphs can be useful to explicitly state the assumptions one makes about a problem and to examine the social biases at play. Similarly, causal graphs could help represent the features that constitute a discriminatory outcome, and how they relate to each other. This approach would provide explanations that do not merely identify the primary factor responsible for discrimination, but enhance our comprehension of how it relates to the wider set of conditions that contribute to an unjustifiably different outcome, and of how these conditions relate to one another. As mentioned, this can help shed light on the conditions by which similar individuals are treated differently. While research proposes the potential use of some fairness metrics to create a prima facie case for discrimination (Wachter et al., 2021), XAI approaches could go further and help provide a richer understanding of systemic discrimination. This is especially relevant in light of new regulatory initiatives such as the California Racial Justice Act (2020), which allows people charged with (or convicted of) a crime to use statistical evidence to raise issues of racial bias and discrimination. As the law relies on a counterfactual intuition to prove the existence of racial disparities, counterfactual approaches could provide a richer picture of what constitutes these disparities. As suggested in Sect. 5.3, this could be done by naming the features by which two ethnically different yet otherwise similar individuals receive different outcomes as precisely what makes their differential treatment one firmly rooted in systemic racism.

7 Conclusion

By shifting the focus to the conditions for rather than the consequences of discriminatory outcomes, this article hopes to emphasize the importance of understanding and preventing algorithmic discrimination. The genealogical approach proposed here, both in its constructive and critical components, can help tailor the application of XAI approaches not only to “see” discrimination but also to “govern” and “understand” its workings (Ananny & Crawford, 2018). At the same time, it can help us recognize these approaches’ limitations in representing and addressing algorithmic discrimination. Pragmatically, we have provided a set of policy recommendations by which both these constructive and critical components can be integrated into AI development and regulation.

Significantly, we also recognize that XAI approaches can only go so far in matters of discrimination. They should be envisioned as part of a more comprehensive strategy. Thus, in this article, we have evaluated their potential to support, rather than supplant, approaches to addressing algorithmic discrimination. Thinking with a genealogical approach in mind, future research could explore how these XAI approaches could be used in concert with qualitative and contextual efforts to (re)construct how historical disparities emerge and are reproduced in AI systems. For example, their ability to surface hints of the conditions that enable discrimination could be corroborated through qualitative data and the participation of the communities involved to build a more comprehensive and accurate picture.