1 Introduction

Machine learning is increasingly used in a wide range of decision-making scenarios that have serious implications for individuals and society, including financial lending [10, 35], hiring [8, 27], online advertising [26, 40], pretrial and immigration detention [5, 42], child maltreatment screening [13, 46], health care [18, 31], and social services [1, 22]. Whilst this has the potential to overcome undesirable aspects of human decision-making, there is concern that biases in the data and model inaccuracies can lead to decisions that treat historically discriminated groups unfavourably. The research community has therefore started to investigate how to ensure that learned models do not take decisions that are unfair with respect to sensitive attributes (e.g. race or gender).

This effort has led to the emergence of a rich set of fairness definitions [12, 15, 20, 23, 37] providing researchers and practitioners with criteria to evaluate existing systems or to design new ones. Many such definitions have been found to be mathematically incompatible [7, 12, 14, 15, 29], and this has been viewed as representing an unavoidable trade-off establishing fundamental limits on fair machine learning, or as an indication that certain definitions do not map on to social or legal understandings of fairness [16].

Most fairness definitions focus on the relationship between the model output and the sensitive attribute. However, deciding which relationship is appropriate for the model under consideration requires careful consideration of the patterns of unfairness underlying the training data. Therefore, the choice of a fairness definition always needs to take into account the dataset used to train the model. In this manuscript, we use the framework of causal Bayesian networks to draw attention to this point, by visually describing unfairness in a dataset as the presence of an unfair causal effect of the sensitive attribute in the data-generation mechanism. We then use this viewpoint to raise concerns about the fairness debate surrounding the COMPAS pretrial risk assessment tool. Finally, we show that causal Bayesian networks offer a powerful tool for representing, reasoning about, and dealing with complex unfairness scenarios.

2 A Graphical View of (Un)fairness

Consider a dataset \(\varDelta =\{a^n,x^n,y^n\}_{n=1}^N\), corresponding to N individuals, where \(a^n\) indicates a sensitive attribute, and \(x^n\) a set of observations that can be used (together with \(a^n\)) to form a prediction \(\hat{y}^n\) of outcome \(y^n\). We assume a binary setting \(a^n,y^n,\hat{y}^n\in \{0,1\}\) (unless otherwise specified), and indicate with \(A,\mathcal{X}\), Y, and \(\hat{Y}\) the (set of) random variables corresponding to \(a^n,x^n,y^n\), and \(\hat{y}^n\).

In this section we show at a high-level that a correct use of fairness definitions concerned with statistical properties of \(\hat{Y}\) with respect to A requires an understanding of the patterns of unfairness underlying \(\varDelta \), and therefore of the relationships among A, \(\mathcal{X}\) and Y. More specifically we show that:

  (i) Using the framework of causal Bayesian networks (CBNs), unfairness in \(\varDelta \) can be viewed as the presence of an unfair causal path from A to \(\mathcal{X}\) or Y.

  (ii) In order to determine which properties \(\hat{Y}\) should possess to be fair, it is necessary to question and understand unfairness in \(\varDelta \).

figure a

Assume a dataset \(\varDelta =\{a^n,x^n=\{q^n,d^n\},y^n\}_{n=1}^N\) corresponding to a college admission scenario in which applicants are admitted based on qualifications Q, choice of department D, and gender A; and in which female applicants apply more often to certain departments. This scenario can be represented by the CBN on the left (see Appendix A for an overview of BNs, and Sect. 3 for a detailed treatment of CBNs). The causal path \(A\rightarrow Y\) represents direct influence of gender A on admission Y, capturing the fact that two individuals with the same qualifications and applying to the same department can be treated differently depending on their gender. The indirect causal path \(A\rightarrow D \rightarrow Y\) represents influence of A on Y through D, capturing the fact that female applicants more often apply to certain departments. Whilst the direct influence \(A\rightarrow Y\) is certainly an unfair one, the paths \(A\rightarrow D\) and \(D\rightarrow Y\), and therefore \(A\rightarrow D \rightarrow Y\), could either be considered as fair or as unfair. For example, rejecting women more often due to department choice could be considered fair with respect to college responsibility. However, this could be considered unfair with respect to societal responsibility if the departmental differences were a result of systemic historical or cultural factors (e.g. if female applicants apply to specific departments at lower rates because of overt or covert societal discouragement). Finally, if the college were to lower the admission rates for departments chosen more often by women, then the path \(D \rightarrow Y\) would be unfair.

Deciding whether a path is fair or unfair requires careful ethical and sociological considerations and/or might not be possible from a dataset alone. Nevertheless, this example illustrates that we can view unfairness in a dataset as the presence of an unfair causal path from the sensitive attribute A to \(\mathcal{X}\) or Y.

Different labelings of the paths as fair or unfair require \(\hat{Y}\) to have different characteristics in order to be fair. In the case in which the causal paths from A to Y are all unfair (e.g. if \(A\rightarrow D \rightarrow Y\) is considered unfair), a \(\hat{Y}\) that is statistically independent of A (denoted with \(\hat{Y}\perp A\)) would not contain any of the unfair influence of A on Y. In such a case, \(\hat{Y}\) is said to satisfy demographic parity.

Demographic Parity (DP). \(\hat{Y}\) satisfies demographic parity if \(\hat{Y}\perp A\), i.e. \(p(\hat{Y}=1|A=0)=p(\hat{Y}=1|A=1)\), where e.g. \(p(\hat{Y}=1|A=0)\) can be estimated as

$$\begin{aligned} p(\hat{Y}=1|A=0) \approx \frac{1}{N_0} \sum _{n=1}^{N} \mathbb {1}_{\hat{y}^n = 1, a^n=0}, \end{aligned}$$

with \(\mathbb {1}_{\hat{y}^n = 1, a^n=0}=1\) if \(\hat{y}^n = 1\) and \(a^n=0\) (and zero otherwise), and where \(N_0\) is the number of individuals with \(a^n=0\). Notice that many classifiers, rather than a binary prediction \(\hat{y}^n\in \{0,1\}\), output a degree of belief that the individual belongs to class 1, \(r^n\), also called score. This could correspond to the probability of class 1, \(r^n=p(y^n = 1 | a^n, x^{n})\), as in the case of logistic regression. To obtain the prediction \(\hat{y}^n\in \{0,1\}\) from \(r^n\), it is common to use a threshold \(\theta \), i.e. \(\hat{y}^n=\mathbb {1}_{r^n>\theta }\). In this case, we can rewrite the estimate for \(p(\hat{Y}=1|A=0)\) as

$$\begin{aligned} p(\hat{Y}=1|A=0) \approx \frac{1}{N_0}\sum _{n=1}^{N}\mathbb {1}_{r^n>\theta ,a^n=0}. \end{aligned}$$

Notice that \(R\perp A\), where R denotes the random variable corresponding to \(r^n\), implies \(\hat{Y}\perp A\) for all values of \(\theta \).
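
As a concrete illustration, the following Python sketch estimates the per-group positive rates above from scores and sensitive attributes; the synthetic data, threshold, and variable names are assumptions made purely for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example data (illustrative assumptions): sensitive attributes a^n and scores r^n.
a = rng.integers(0, 2, size=10_000)      # a^n in {0, 1}
r = rng.uniform(size=10_000)             # scores r^n in [0, 1]

def positive_rate(r, a, group, theta):
    """Estimate p(Y_hat = 1 | A = group) as (1/N_group) * sum_n 1_{r^n > theta, a^n = group}."""
    mask = a == group
    return np.mean(r[mask] > theta)

# Demographic parity approximately holds at threshold theta if the two rates match.
theta = 0.5
print(positive_rate(r, a, 0, theta), positive_rate(r, a, 1, theta))
```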

In the case in which the causal paths from A to Y are all fair (e.g. if \(A\rightarrow Y\) is absent and \(A\rightarrow D\rightarrow Y\) is considered fair), a \(\hat{Y}\) such that \(\hat{Y}\perp A\,|\,Y\) or \(Y\perp A\,|\,\hat{Y}\) would be allowed to contain such a fair influence, but the (dis)agreement between Y and \(\hat{Y}\) would not be allowed to depend on A. In these cases, \(\hat{Y}\) is said to satisfy equal false positive/false negative rates and calibration respectively.

Equal False Positive and Negative Rates (EFPRs/EFNRs). \(\hat{Y}\) satisfies EFPRs and EFNRs if \(\hat{Y}\perp A\,|\,Y\), i.e. (EFPRs) \(p(\hat{Y}=1|Y=0,A=0)=p(\hat{Y}=1|Y=0,A=1)\) and (EFNRs) \(p(\hat{Y}=0|Y=1,A=0)=p(\hat{Y}=0|Y=1,A=1)\).

Calibration. \(\hat{Y}\) satisfies calibration if \(Y\perp A\,|\,\hat{Y}\). In the case of score output R, this condition is often instead called predictive parity at threshold \(\theta \), \(p(Y=1|R>\theta ,A=0)=p(Y=1|R>\theta ,A=1)\), and calibration defined as requiring \(Y\perp A\,|\,R\).
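
For concreteness, the following Python sketch shows how EFPRs/EFNRs and predictive parity at a threshold could be checked empirically; the synthetic data and function names are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
a = rng.integers(0, 2, size=N)           # sensitive attribute a^n (synthetic)
y = rng.integers(0, 2, size=N)           # outcome y^n (synthetic)
r = rng.uniform(size=N)                  # score r^n
y_hat = (r > 0.5).astype(int)            # prediction obtained by thresholding the score

def group_rates(y, y_hat, a, group):
    """Empirical FPR = p(Y_hat=1 | Y=0, A=group) and FNR = p(Y_hat=0 | Y=1, A=group)."""
    m = a == group
    fpr = np.mean(y_hat[m & (y == 0)] == 1)
    fnr = np.mean(y_hat[m & (y == 1)] == 0)
    return fpr, fnr

def predictive_parity(y, r, a, group, theta):
    """Empirical p(Y=1 | R > theta, A=group)."""
    m = (a == group) & (r > theta)
    return np.mean(y[m] == 1)

print(group_rates(y, y_hat, a, 0), group_rates(y, y_hat, a, 1))                # EFPRs/EFNRs
print(predictive_parity(y, r, a, 0, 0.5), predictive_parity(y, r, a, 1, 0.5))  # predictive parity
```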

In the case in which at least one causal path from A to Y is unfair (e.g. if \(A\rightarrow Y\) is present), EFPRs/EFNRs and calibration are inappropriate criteria, as they would not require the unfair influence of A on Y to be absent from \(\hat{Y}\) (e.g. a perfect model (\(\hat{Y} = Y\)) would automatically satisfy EFPRs/EFNRs and calibration, but would contain the unfair influence). This observation is particularly relevant to the recent debate surrounding the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) pretrial risk assessment tool. We revisit this debate in the next section.

2.1 The COMPAS Debate

Over the past few years, numerous state and local governments around the United States have sought to reform their pretrial court systems with the aim of reducing unprecedented levels of incarceration, and specifically the population of low-income defendants and racial minorities in America’s prisons and jails [2, 24, 30]. As part of this effort, quantitative tools for determining a person’s likelihood for reoffending or failure to appear, called risk assessment instruments (RAIs), were introduced to replace previous systems driven largely by opaque discretionary decisions and money bail [6, 25]. However, the expansion of pretrial RAIs has unearthed new concerns of racial discrimination which would nullify the purported benefits of these systems and adversely impact defendants’ civil liberties.

An intense ongoing debate, in which the research community has also been heavily involved, was triggered by an exposé from investigative journalists at ProPublica [5] on the COMPAS pretrial RAI developed by Equivant (formerly Northpointe) and deployed in Broward County in Florida. The COMPAS general recidivism risk scale (GRRS) and violent recidivism risk scale (VRRS), the focus of ProPublica’s investigation, sought to leverage machine learning techniques to improve the predictive accuracy of recidivism compared to older RAIs such as the Level of Service Inventory-Revised [3], which were primarily based on theories and techniques from a sub-field of psychology known as the psychology of criminal conduct [4, 9].

Fig. 1.
figure 1

Number of black and white defendants in each of two aggregate risk categories [14]. The overall recidivism rate for black defendants is higher than for white defendants (52% vs. 39%), i.e. \(Y\not\perp A\). Within each risk category, the proportion of defendants who reoffend is approximately the same regardless of race, i.e. \(Y\perp A\,|\,\hat{Y}\). Black defendants are more likely to be classified as medium or high risk (58% vs. 33%), i.e. \(\hat{Y}\not\perp A\). Among individuals who did not reoffend, black defendants are more likely to be classified as medium or high risk than white defendants (44.9% vs. 23.5%). Among individuals who did reoffend, white defendants are more likely to be classified as low risk than black defendants (47.7% vs. 28%), i.e. \(\hat{Y}\not\perp A\,|\,Y\).

ProPublica’s criticism of COMPAS centered on two concerns. First, the authors argued that the distribution of the risk score \(R\in \{1,\ldots ,10\}\) exhibited discriminatory patterns, as black defendants displayed a fairly uniform distribution across each value, while white defendants exhibited a right skewed distribution, suggesting that the COMPAS recidivism risk scores disproportionately rated white defendants as lower risk than black defendants. Second, the authors claimed that the GRRS and VRRS did not satisfy EFPRs and EFNRs, as \(\text {FPRs}\,=\,44.9\%\) and \(\text {FNRs}\,=\,28.0\%\) for black defendants, whilst \(\text {FPRs}\,=\,23.5\%\) and \(\text {FNRs}\,=\,47.7\%\) for white defendants (see Fig. 1). This evidence led ProPublica to conclude that COMPAS had a disparate impact on black defendants, leading to public outcry over potential biases in RAIs and machine learning writ large.

In response, Equivant published a technical report [19] refuting the claims of bias made by ProPublica and concluded that COMPAS is sufficiently calibrated, in the sense that it satisfies predictive parity at key thresholds. Subsequent analyses [12, 15, 29] confirmed Equivant’s claims of calibration, but also demonstrated the incompatibility of EFPRs/EFNRs and calibration due to differences in base rates across groups (\(p(Y=1|A=0)\ne p(Y=1|A=1)\)) (see Appendix B). Moreover, the studies suggested that attempting to satisfy these competing forms of fairness forces unavoidable trade-offs between criminal justice reformers’ purported goals of racial equity and public safety.

As explained in Sect. 2, \(\hat{Y}\perp A\) is an appropriate fairness criterion when the influence from A is considered unfair, whilst EFPRs/EFNRs and calibration, by requiring the rate of (dis)agreement between Y and \(\hat{Y}\) to be the same for black and white defendants (and therefore by not being concerned with the dependence of Y on A), are appropriate when the influence from A is considered fair. Therefore, if the dependence of Y on A includes influence of A on Y through an unfair causal path, both EFPRs/EFNRs and calibration would be inadequate, and the fact that they cannot be satisfied at the same time irrelevant.

Fig. 2.
figure 2

Possible CBN underlying the dataset used for COMPAS.

As previous research has shown [28, 34, 43], modern policing tactics center around targeting a small number of neighborhoods—often disproportionately populated by non-white and low income residents—with recurring patrols and stops. This uneven distribution of police attention, as well as other factors such as funding for pretrial services [30, 45], means that differences in base rates between racial groups are not reflective of ground truth rates. We can rephrase these findings as indicating the presence of a direct path \(A\rightarrow Y\) (through unobserved neighborhood) in the CBN representing the data-generation mechanism (Fig. 2). Such tactics also imply an influence of A on Y through the set of variables \({\mathcal F}\) containing number of prior arrests. In addition, the influence of A on Y through \(A\rightarrow Y\) and \(A \rightarrow {\mathcal F} \rightarrow Y\) could be more prominent or contain more unfairness due to racial discrimination.

These observations indicate that EFPRs/EFNRs and calibration are inappropriate criteria for this case, and more generally that the current fairness debate surrounding COMPAS gives insufficient consideration to the patterns of unfairness underlying the data. Our analysis formalizes the concerns raised by social scientists and legal scholars on mismeasurement and unrepresentative data in the US criminal justice system. Multiple studies [21, 33, 36, 45] have argued that the core premise of RAIs, to assess the likelihood a defendant reoffends, is impossible to measure and that the empirical proxy used (e.g. arrest or conviction) introduces embedded biases and norms which render existing fairness tests unreliable.

This section used the CBN framework to describe at a high-level different patterns of unfairness that can underlie a dataset and to point out issues with current deployment of fairness definitions. In the remainder of the manuscript, we use this framework more extensively to further advance our analysis on fairness. Before doing that, we give some background on CBNs [17, 38, 39, 41, 44], assuming that all variables except A are continuous.

3 Causal Bayesian Networks

A Bayesian network is a directed acyclic graph where nodes and edges represent random variables and statistical dependencies. Each node \(X_i\) in the graph is associated with the conditional distribution \(p(X_i|\text {pa}(X_i))\), where \(\text {pa}(X_i)\) is the set of parents of \(X_i\). The joint distribution of all nodes, \(p(X_1, \ldots , X_I)\), is given by the product of all conditional distributions, i.e. \(p(X_1,\ldots ,X_I)=\prod _{i=1}^Ip(X_i|\text {pa}(X_i))\) (see Appendix A for more details on Bayesian networks).
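
As a small illustration of this factorization, the following Python sketch builds the joint distribution of a toy three-node binary network \(C\rightarrow A\), \(C\rightarrow Y\), \(A\rightarrow Y\) from its conditional distributions; the structure and the probability tables are assumptions chosen only for this example.

```python
import numpy as np

# Toy binary Bayesian network C -> A, C -> Y, A -> Y (illustrative probability tables).
p_c = np.array([0.7, 0.3])                       # p(C)
p_a_given_c = np.array([[0.9, 0.1],              # p(A | C=0)
                        [0.4, 0.6]])             # p(A | C=1)
p_y_given_ac = np.array([[[0.8, 0.2],            # p(Y | A=0, C=0)
                          [0.5, 0.5]],           # p(Y | A=0, C=1)
                         [[0.6, 0.4],            # p(Y | A=1, C=0)
                          [0.1, 0.9]]])          # p(Y | A=1, C=1)

# Joint distribution as the product of the conditional distributions:
# p(C, A, Y) = p(C) p(A | C) p(Y | A, C).
joint = np.zeros((2, 2, 2))                      # indexed [c, a, y]
for c in range(2):
    for a in range(2):
        for y in range(2):
            joint[c, a, y] = p_c[c] * p_a_given_c[c, a] * p_y_given_ac[a, c, y]

print(joint.sum())                               # a valid joint distribution: sums to 1
```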

When equipped with causal semantics, namely when representing the data-generation mechanism, Bayesian networks can be used to visually express causal relationships. More specifically, CBNs enable us to give a graphical definition of causes and causal effects: if there exists a directed path from A to Y, then A is a potential cause of Y. Directed paths are also called causal paths.

Fig. 3.
figure 3

(a): CBN with a confounder C for the effect of A on Y. (b): Modified CBN resulting from intervening on A.

The causal effect of A on Y can be seen as the information traveling from A to Y through causal paths, or as the conditional distribution of Y given A restricted to causal paths. This implies that, to compute the causal effect, we need to disregard the information that travels along non-causal paths, which occurs if such paths are open. Since paths with an arrow emerging from A are either causal or closed (blocked) by a collider, the problematic paths are only those with an arrow pointing into A, called back-door paths, which are open if they do not contain a collider.

An example of an open back-door path is given by \(A\leftarrow C \rightarrow Y\) in the CBN \(\mathcal{G}\) of Fig. 3(a): the variable C is said to be a confounder for the effect of A on Y, as it confounds the causal effect with non-causal information. To understand this, assume that A represents hours of exercise in a week, Y cardiac health, and C age: observing cardiac health conditioning on exercise level from p(Y|A) does not enable us to understand the effect of exercise on cardiac health, since p(Y|A) includes the dependence between A and Y induced by age.

Each parent-child relationship in a CBN represents an autonomous mechanism, and therefore it is conceivable to change one such a relationship without changing the others. This enables us to express the causal effect of \(A=a\) on Y as the conditional distribution \(p_{\rightarrow A=a}(Y|A=a)\) on the modified CBN \(\mathcal{G}_{\rightarrow A=a}\) of Fig. 3(b), resulting from replacing p(A|C) with a Dirac delta distribution \(\delta _{A=a}\) (thereby removing the link from C to A) and leaving the remaining conditional distributions p(Y|AC) and p(C) unaltered – this process is called intervention on A. The distribution \(p_{\rightarrow A=a}(Y|A=a)\) can be estimated as \(p_{\rightarrow A=a}(Y|A=a) = \int _C p_{\rightarrow A=a}(Y|A=a,C)p_{\rightarrow A=a}(C|A=a) = \int _C p(Y|A=a,C)p(C)\). This is a special case of the following back-door adjustment formula.

Back-Door Adjustment. If a set of variables \(\mathcal{C}\) satisfies the back-door criterion relative to \(\{A, Y\}\), the causal effect of A on Y is given by \(p_{\rightarrow A}(Y|A)=\int _\mathcal{C} p(Y|A,\mathcal{C})p(\mathcal{C})\). \(\mathcal{C}\) satisfies the back-door criterion if (a) no node in \(\mathcal{C}\) is a descendant of A and (b) \(\mathcal{C}\) blocks every back-door path from A to Y.

The equality \(p_{\rightarrow A=a}(Y|A=a,\mathcal{C}) = p(Y|A=a,\mathcal{C})\) follows from the fact that \(\mathcal{G}_{A \rightarrow }\), obtained by removing from \(\mathcal{G}\) all links emerging from A, retains all (and only) the back-door paths from A to Y. As \(\mathcal{C}\) blocks all such paths, \(Y\perp A\,|\,\mathcal{C}\) in \(\mathcal{G}_{A \rightarrow }\). This means that there is no non-causal information traveling from A to Y when conditioning on \(\mathcal{C}\), and therefore conditioning on A coincides with intervening on A.
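
The following Python sketch illustrates the back-door adjustment on a toy binary version of the CBN of Fig. 3(a); the probability tables are illustrative assumptions, and the point is only that the observational quantity \(p(Y=1|A=a)\), which includes the open back-door path \(A\leftarrow C \rightarrow Y\), generally differs from the interventional quantity \(p_{\rightarrow A=a}(Y=1|A=a)\) obtained by adjusting for C.

```python
import numpy as np

# Toy binary CBN as in Fig. 3(a): C -> A, C -> Y, A -> Y, with C a confounder.
# The probability tables below are illustrative assumptions.
p_c = np.array([0.7, 0.3])                            # p(C)
p_a_given_c = np.array([[0.9, 0.1], [0.3, 0.7]])      # p(A | C), rows indexed by C
p_y1_given_ac = np.array([[0.2, 0.5],                 # p(Y=1 | A=0, C=0/1)
                          [0.4, 0.8]])                # p(Y=1 | A=1, C=0/1)

def p_y1_given_a(a):
    """Observational p(Y=1 | A=a), which mixes causal and back-door information."""
    num = sum(p_c[c] * p_a_given_c[c, a] * p_y1_given_ac[a, c] for c in range(2))
    den = sum(p_c[c] * p_a_given_c[c, a] for c in range(2))
    return num / den

def p_y1_do_a(a):
    """Interventional p_{->A=a}(Y=1 | A=a) via the back-door adjustment over C."""
    return sum(p_y1_given_ac[a, c] * p_c[c] for c in range(2))

for a in (0, 1):
    print(a, p_y1_given_a(a), p_y1_do_a(a))           # the two quantities generally differ
```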

Fig. 4.
figure 4

(a): CBN in which conditioning on C closes the paths \(A\leftarrow C\leftarrow X \rightarrow Y\) and \(A\leftarrow C\rightarrow Y\) but opens the path \(A\leftarrow E\rightarrow C\leftarrow X \rightarrow Y\). (b): CBN with one direct and one indirect causal path from A to Y.

Conditioning on C to block an open back-door path may open a closed path on which C is a collider. For example, in the CBN of Fig. 4(a), conditioning on C closes the paths \(A\leftarrow C\leftarrow X \rightarrow Y\) and \(A\leftarrow C\rightarrow Y\), but opens the path \(A\leftarrow E\rightarrow C\leftarrow X \rightarrow Y\) (additional conditioning on X would close \(A\leftarrow E\rightarrow C\leftarrow X \rightarrow Y\)).

The back-door criterion can also be derived from the rules of do-calculus [38, 39], which indicate whether and how \(p_{\rightarrow A}(Y|A)\) can be estimated using observations from \(\mathcal{G}\): for many graph structures with unobserved confounders the only way to compute causal effects is by collecting observations directly from \(\mathcal{G}_{\rightarrow A}\) – in this case the effect is said to be non-identifiable.

Potential Outcome Viewpoint. Let \(Y_{A=a}\) be the random variable with distribution \(p(Y_{A=a}) = p_{\rightarrow A=a}(Y|A=a)\). \(Y_{A=a}\) is called potential outcome and, when not ambiguous, we will refer to it with the shorthand \(Y_a\). The relation between \(Y_{a}\) and all the variables in \(\mathcal{G}\) other than Y can be expressed by the graph obtained by removing from \(\mathcal{G}\) all the links emerging from A, and by replacing Y with \(Y_{a}\). If \(Y_{a}\) is independent of A in this graph, then \(p(Y_a)=p(Y_a|A=a)=p(Y|A=a)\). If \(Y_{a}\) is independent of A in this graph when conditioning on \(\mathcal{C}\), then

$$\begin{aligned} p(Y_{a}) = \int _{\mathcal{C}} p(Y_{a}|\mathcal{C}) p(\mathcal{C}) = \int _{\mathcal{C}} p(Y_{a}|A=a,\mathcal{C}) p(\mathcal{C}) = \int _{\mathcal{C}} p(Y|A=a,\mathcal{C}) p(\mathcal{C}), \end{aligned}$$

i.e. we retrieve the back-door adjustment formula.

In the remainder of the section we show that, by performing different interventions on A along different causal paths, it is possible to isolate the contribution of the causal effect of A on Y along a group of paths.

Direct and Indirect Effect

Consider the CBN of Fig. 4(b), containing the direct path \(A\rightarrow Y\) and one indirect causal path through the variable M. Let \(Y_{a}(M_{\bar{a}})\) be the random variable with distribution equal to the conditional distribution of Y given A restricted to causal paths, with \(A=a\) along \(A\rightarrow Y\) and \(A=\bar{a}\) along \(A\rightarrow M\rightarrow Y\). The average direct effect (ADE) of \(A=a\) with respect to \(A=\bar{a}\), defined as

$$\begin{aligned}&\text {ADE}_{\bar{a} a } =\langle Y_{a}(M_{\bar{a}}) \rangle _{p(Y_{a}(M_{\bar{a}}))} - \langle Y_{\bar{a}} \rangle _{p(Y_{\bar{a}})}, \end{aligned}$$

where e.g. \(\langle Y_{a} \rangle _{p(Y_{a})}=\int _{Y_{a}} Y_{a}p(Y_{a})\), measures the difference in flow of causal information from A to Y between the case in which \(A=a\) along \(A\rightarrow Y\) and \(A=\bar{a}\) along \(A\rightarrow M\rightarrow Y\) and the case in which \(A=\bar{a}\) along both paths.

Analogously, the average indirect effect (AIE) of \(A=a\) with respect to \(A=\bar{a}\), is defined as \(\text {AIE}_{\bar{a} a } =\langle Y_{\bar{a}}(M_a) \rangle _{p(Y_{\bar{a}}(M_a))} - \langle Y_{\bar{a}} \rangle _{p(Y_{\bar{a}})}\).

The difference \(\text {ADE}_{\bar{a} a } - \text {AIE}_{a \bar{a} }\) gives the average total effect (ATE) \(\text {ATE}_{\bar{a} a} = \langle Y_{a} \rangle _{p(Y_{a})} - \langle Y_{\bar{a}} \rangle _{p(Y_{\bar{a}})}\).

Fig. 5.
figure 5

Top: CBN with the direct path from A to Y and the indirect paths passing through M highlighted in red. Bottom: CBN corresponding to (1). (Color figure online)

Path-Specific Effect

To estimate the effect along a specific group of causal paths, we can generalize the formulas for the ADE and AIE by replacing the variable in the first term with the one resulting from performing the intervention \(A=a\) along the group of interest and \(A=\bar{a}\) along the remaining causal paths. For example, consider the CBN of Fig. 5 (top) and assume that we are interested in isolating the effect of A on Y along the direct path \(A\rightarrow Y\) and the paths passing through M, \(A\rightarrow M \rightarrow \ldots \rightarrow Y\), namely along the red links. The path-specific effect (PSE) of \(A=a\) with respect to \(A=\bar{a}\) for this group of paths is defined as

$$\begin{aligned} \text {PSE}_{\bar{a} a} = \langle Y_a(M_a, L_{\bar{a}}(M_a)) \rangle -\langle Y_{\bar{a}} \rangle , \end{aligned}$$

where \(p(Y_a(M_a, L_{\bar{a}} (M_a)))\) is given by

$$\begin{aligned} \int _{C,M,L} p(Y|A=a,C,M,L)p(L|A=\bar{a},C,M)p(M|A=a,C)p(C). \end{aligned}$$

In the simple case in which the CBN corresponds to a linear model, e.g.

$$\begin{aligned}&A\sim \text {Bern}(\pi ), C = \epsilon _c,\nonumber \\&M=\theta ^m+\theta ^m_{a}A+\theta ^m_{c}C+\epsilon _m,L=\theta ^l+\theta ^l_{a}A+\theta ^l_{c}C+\theta ^l_{m}M+\epsilon _l,\nonumber \\&Y=\theta ^y+\theta ^y_{a}A+\theta ^y_{c}C+\theta ^y_{m}M+\theta ^y_{l}L+\epsilon _y, \end{aligned}$$
(1)

where \(\epsilon _c\), \(\epsilon _m\), \(\epsilon _l\) and \(\epsilon _y\) are unobserved independent zero-mean Gaussian variables, we can compute \(\langle Y_{\bar{a}} \rangle \) by expressing Y as a function of \(A=\bar{a}\) and the Gaussian variables, through recursive substitution of C, M and L, i.e. 

$$\begin{aligned} Y_{\bar{a}}&=\theta ^y+\theta ^y_{a}\bar{a}+\theta ^y_{c}\epsilon _c+\theta ^y_{m}(\theta ^m+\theta ^m_{a}\bar{a}+\theta ^m_{c}\epsilon _c+\epsilon _m)\\&+\theta ^y_{l}(\theta ^l+\theta ^l_{a}\bar{a}+\theta ^l_{c}\epsilon _c+\theta ^l_{m}(\theta ^m+\theta ^m_{a}\bar{a}+\theta ^m_{c}\epsilon _c+\epsilon _m)+\epsilon _l)+\epsilon _y, \end{aligned}$$

and then take the mean, obtaining \(\langle Y_{\bar{a}} \rangle =\theta ^y+\theta ^y_{a}\bar{a}+\theta ^y_{m}(\theta ^m+\theta ^m_{a}\bar{a})+\theta ^y_{l}(\theta ^l+\theta ^l_{a}\bar{a}+\theta ^l_{m}(\theta ^m+\theta ^m_{a}\bar{a}))\). Analogously

$$\begin{aligned} \langle Y_a(M_a, L_{\bar{a}}(M_a)) \rangle&=\theta ^y+\theta ^y_{a}a+\theta ^y_m(\theta ^m+\theta ^m_{a}a)+\theta ^y_l(\theta ^l+\theta ^l_a\bar{a}+ \theta ^l_m(\theta ^m+\theta ^m_{a}a)). \end{aligned}$$

For \(a=1\) and \(\bar{a} = 0\), this gives

$$\begin{aligned} \text {PSE}_{\bar{a} a}=\theta ^y_{a}(a-\bar{a})+\theta ^y_{m}\theta ^m_{a}(a-\bar{a})+\theta ^y_{l}\theta ^l_{m}\theta ^m_{a}(a-\bar{a})=\theta ^y_{a}+\theta ^y_{m}\theta ^m_{a}+\theta ^y_{l}\theta ^l_{m}\theta ^m_{a}. \end{aligned}$$

The same conclusion could have been obtained by looking at the graph annotated with path coefficients (Fig. 5 (bottom)). The PSE is obtained by summing over the three causal paths of interest (\(A\rightarrow Y\), \(A\rightarrow M \rightarrow Y\), and \(A\rightarrow M \rightarrow L \rightarrow Y\)) the product of all coefficients in each path.
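
To make this concrete, the following Python sketch samples from the linear model (1) with illustrative parameter values (assumptions made for this example) and checks that a Monte-Carlo estimate of \(\text {PSE}_{\bar{a} a}\) agrees with the path-coefficient expression above.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000

# Illustrative parameter values (assumptions) for the linear model (1).
th_m, th_m_a, th_m_c = 0.5, 1.2, 0.3
th_l, th_l_a, th_l_c, th_l_m = -0.2, 0.7, 0.4, 0.9
th_y, th_y_a, th_y_c, th_y_m, th_y_l = 0.1, 0.8, 0.2, 0.6, 0.5

eps_c, eps_m = rng.normal(size=N), rng.normal(size=N)
eps_l, eps_y = rng.normal(size=N), rng.normal(size=N)
C = eps_c
a, a_bar = 1.0, 0.0

def M(av):
    return th_m + th_m_a * av + th_m_c * C + eps_m

def L(av, m):
    return th_l + th_l_a * av + th_l_c * C + th_l_m * m + eps_l

def Y(av, m, l):
    return th_y + th_y_a * av + th_y_c * C + th_y_m * m + th_y_l * l + eps_y

# Nested counterfactual Y_a(M_a, L_abar(M_a)): A = a along A -> Y and the paths through M,
# A = abar along A -> L.
M_a = M(a)
Y_nested = Y(a, M_a, L(a_bar, M_a))

# Baseline Y_abar: A = abar along every causal path.
M_ab = M(a_bar)
Y_base = Y(a_bar, M_ab, L(a_bar, M_ab))

pse_mc = (Y_nested - Y_base).mean()
pse_path_coefficients = th_y_a + th_y_m * th_m_a + th_y_l * th_l_m * th_m_a
print(pse_mc, pse_path_coefficients)   # should agree up to Monte-Carlo error
```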

Notice that \(\text {AIE}_{\bar{a} a}\), given by

$$\begin{aligned} \text {AIE}_{\bar{a} a} = \langle Y_{\bar{a}}(M_a) \rangle - \langle Y_{\bar{a}} \rangle = \theta ^y_{m}\theta ^m_{a}(a-\bar{a})+\theta ^y_{l}\theta ^l_{m}\theta ^m_{a}(a-\bar{a}), \end{aligned}$$
(2)

coincides with \(\text {AIE}^a_{\bar{a} a}\), given by

$$\begin{aligned} \text {AIE}^a_{\bar{a} a} = \langle Y_{a} \rangle - \langle Y_{a}(M_{\bar{a}}) \rangle = \theta ^y_{m}\theta ^m_{a}(a-\bar{a})+\theta ^y_{l}\theta ^l_{m}\theta ^m_{a}(a-\bar{a}). \end{aligned}$$
(3)

Effect of Treatment on Treated. Consider the conditional distribution \(p(Y_{a}|A=\bar{a})\). This distribution measures the information travelling from A to Y along all open paths, when A is set to a along causal paths and to \(\bar{a}\) along non-causal paths. The effect of treatment on treated (ETT) of \(A=a\) with respect to \(A=\bar{a}\) is defined as \(\text {ETT}_{\bar{a} a} = \langle Y_{a} \rangle _{p(Y_{a}|A=\bar{a})}- \langle Y_{\bar{a}} \rangle _{p(Y_{\bar{a}}|A=\bar{a})} = \langle Y_{a} \rangle _{p(Y_{a}|A=\bar{a})} - \langle Y \rangle _{p(Y|A=\bar{a})}\). Like the PSE, the ETT measures the difference in flow of information from A to Y when A takes different values along different paths. However, the PSE considers only causal paths and different values for A along different causal paths, whilst the ETT considers all open paths and different values for A along causal and non-causal paths respectively. Similarly to \(\text {ATE}_{\bar{a} a}\), \(\text {ETT}_{\bar{a} a}\) for the CBN of Fig. 4(b) can be expressed as

$$\begin{aligned} \text {ETT}_{\bar{a} a}&=\underbrace{\langle Y_{a}(M_{\bar{a}}) \rangle - \langle Y_{\bar{a}} \rangle }_{\text {ADE}_{\bar{a} a|\bar{a}}} -(\underbrace{\langle Y_{a}(M_{\bar{a}}) \rangle - \langle Y_{a} \rangle }_{\text {AIE}_{a \bar{a}|\bar{a}}}). \end{aligned}$$

Notice that, if we define difference in flow of non-causal (along the open back-door paths) information from A to Y when \(A=a\) with respect to when \(A=\bar{a}\) as \(\text {NCI}_{\bar{a} a} = \langle Y_{\bar{a}} \rangle _{p(Y_{\bar{a}}|A=a)} - \langle Y \rangle _{p(Y|A=\bar{a})}\), we obtain

$$\begin{aligned} \langle Y \rangle _{p(Y|A=a)} - \langle Y \rangle _{p(Y|A=\bar{a})}&= \langle Y_{\bar{a}} \rangle _{p(Y_{\bar{a}}|A=a)} - \langle Y \rangle _{p(Y|A=\bar{a})}\\&- (\langle Y_{\bar{a}} \rangle _{p(Y_{\bar{a}}|A=a)} - \langle Y \rangle _{p(Y|A=a)})\\&= \text {NCI}_{\bar{a} a} - \text {ETT}_{a \bar{a}} = \text {NCI}_{\bar{a} a} - \text {ADE}_{a \bar{a}| a} + \text {AIE}_{\bar{a} a|a}. \end{aligned}$$

4 Fairness Considerations Using CBNs

Equipped with the background on CBNs from Sect. 3, in this section we further investigate unfairness in a dataset \(\varDelta =\{a^n, x^n, y^n\}_{n=1}^N\), discuss issues that might arise when building a decision system from it, and show how to measure and deal with unfairness in complex scenarios, revisiting and extending material from [11, 32, 47].

4.1 Back-Door Paths from A to Y

In Sect. 2 we have introduced a graphical interpretation of unfairness in a dataset \(\varDelta \) as the presence of an unfair causal path from A to \(\mathcal{X}\) or Y. More specifically, we have shown through a college admission example that unfairness can be due to an unfair link emerging (a) from A or (b) from a subsequent variable in a causal path from A to Y (e.g.  \(D\rightarrow Y\) in the example). Our discussion did not mention paths from A to Y with an arrow pointing into A, namely back-door paths. This is because such paths are not problematic.

figure b

To understand this, consider the hiring scenario described by the CBN on the left, where A represents religious belief and E the educational background of the applicant, which influences religious participation (\(E\rightarrow A\)). Whilst \(Y\not\perp A\) due to the open back-door path from A to Y, the hiring decision Y is based only on E.

4.2 Opening Closed Unfair Paths from A to Y

In Sect. 2, we have seen that, in order to reason about fairness of \(\hat{Y}\), it is necessary to question and understand unfairness in \(\varDelta \). In this section, we warn that another crucial element needs to be considered in the fairness discussion around \(\hat{Y}\), namely

  (i) The subset of variables used to form \(\hat{Y}\) could project into \(\hat{Y}\) unfair patterns in \(\mathcal{X}\) that do not concern Y.

This could happen, for example, if a closed unfair path from A to Y is opened when conditioning on the variables used to form \(\hat{Y}\).

Fig. 6.
figure 6

CBN underlying a music degree scenario.

As an example, assume the CBN in Fig. 6 representing the data-generation mechanism underlying a music degree scenario, where A corresponds to gender, M to music aptitude (unobserved, i.e. \(M\notin \varDelta \)), X to the score obtained from an ability test taken at the beginning of the degree, and Y to the score obtained from an ability test taken at the end of the degree. Individuals with higher music aptitude M are more likely to obtain higher initial and final scores (\(M\rightarrow X\), \(M\rightarrow Y\)). Due to discrimination occurring at the initial testing, women are assigned a lower initial score than men for the same aptitude level (\(A \rightarrow X\)). The only path from A to Y, \(A\rightarrow X \leftarrow M \rightarrow Y\), is closed as X is a collider on this path. Therefore the unfair influence of A on X does not reach Y (\(Y\perp A\)). Nevertheless, as \(X\not\perp A\), a prediction \(\hat{Y}\) based on the initial score X only would contain the unfair influence of A on X. For example, assume the following linear model: \(Y=\gamma M, X =\alpha A + \beta M\), with \(\langle A^2 \rangle _{p(A)}=1\) and \(\langle M^2 \rangle _{p(M)}=1\). A linear predictor of the form \(\hat{Y} = \theta _X X\) minimizing \(\langle (Y-\hat{Y})^2 \rangle _{p(A)p(M)}\) would have parameters \(\theta _X=\gamma \beta /(\alpha ^2+\beta ^2)\), giving \(\hat{Y} = \gamma \beta (\alpha A + \beta M)/(\alpha ^2+\beta ^2)\), i.e. \(\hat{Y}\not\perp A\). Therefore, this predictor would be using the sensitive attribute to form a decision, although implicitly rather than explicitly. Instead, a predictor explicitly using the sensitive attribute, \(\hat{Y} = \theta _X X + \theta _A A\), would have parameters

$$\begin{aligned} \left( \begin{array}{c} \theta _X \\ \theta _A \\ \end{array} \right)&=\left( \begin{array}{cc} \alpha ^2+\beta ^2 &{} \alpha \\ \alpha &{} 1 \\ \end{array} \right) ^{-1} \left( \begin{array}{c} \gamma \beta \\ 0\\ \end{array} \right) =\left( \begin{array}{c} \gamma /\beta \\ -\alpha \gamma /\beta \\ \end{array} \right) , \end{aligned}$$

i.e. \(\hat{Y} = \gamma M\). Therefore, this predictor would be fair. From the CBN we can see that the explicit use of A can be of help in retrieving M. Indeed, since \(M\not\perp A\,|\,X\), using A in addition to X can give information about M. In general (e.g. in a non-linear setting) it is not guaranteed that using A would ensure \(\hat{Y}\perp A\). Nevertheless, this example shows how explicit use of the sensitive attribute in a model can ensure fairness rather than lead to unfairness.
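
The following Python sketch reproduces this example numerically; the coefficient values, and the coding of A as \(\pm 1\) so that \(\langle A^2 \rangle =1\), are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

# Illustrative coefficients (assumptions); A coded as +/-1 so that <A^2> = 1,
# M standard normal so that <M^2> = 1 and <AM> = 0.
alpha, beta, gamma = 1.0, 0.8, 1.5
A = rng.choice([-1.0, 1.0], size=N)      # sensitive attribute
M = rng.normal(size=N)                   # unobserved music aptitude
X = alpha * A + beta * M                 # initial (discriminatory) score
Y = gamma * M                            # final score

# Predictor using X only: it picks up the unfair influence of A on X.
theta_x = np.linalg.lstsq(X[:, None], Y, rcond=None)[0][0]
print(theta_x, gamma * beta / (alpha**2 + beta**2))       # matches the text
print(np.corrcoef(theta_x * X, A)[0, 1])                  # non-zero: prediction depends on A

# Predictor using X and A explicitly: it recovers gamma * M and is independent of A.
theta = np.linalg.lstsq(np.column_stack([X, A]), Y, rcond=None)[0]
print(theta, [gamma / beta, -alpha * gamma / beta])       # matches the text
print(np.corrcoef(theta[0] * X + theta[1] * A, A)[0, 1])  # ~0: prediction independent of A
```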

This observation is relevant to one of the simplest fairness definitions, motivated by legal requirements, called fairness through unawareness, which states that \(\hat{Y}\) is fair as long as it does not make explicit use of the sensitive attribute A. Whilst this fairness criterion is often indicated as problematic because some of the variables used to form \(\hat{Y}\) could be a proxy for A (such as neighborhood for race), the example above shows a more subtle issue with it.

4.3 Path-Specific Population-Level Unfairness

In this section, we show that the path-specific effect introduced in Sect. 3 can be used to quantify unfairness in \(\varDelta \) in complex scenarios.

Consider the college admission example discussed in Sect. 2 (Fig. 7). In the case in which the path \(A\rightarrow D\), and therefore \(A\rightarrow D\rightarrow Y\), is considered unfair, unfairness over the whole population can be quantified with \(\langle Y \rangle _{p(Y|a)}-\langle Y \rangle _{p(Y|{\bar{a}})}\) (coinciding with \(\text {ATE}_{\bar{a} a} = \langle Y_{a} \rangle _{p(Y_{a})}-\langle Y_{\bar{a}} \rangle _{p(Y_{\bar{a}})}\)) where, for example, \(A=a\) and \(A=\bar{a}\) indicate female and male applicants respectively.

Fig. 7.
figure 7

CBN underlying a college admission scenario.

In the more complex case in which the path \(A \rightarrow D\rightarrow Y\) is considered fair, unfairness can instead be quantified with the path-specific effect along the direct path \(A\rightarrow Y\), \(\text {PSE}_{\bar{a} a}\), given by

$$\begin{aligned} \langle Y_{a}(D_{\bar{a}}) \rangle _{p(Y_{a}(D_{\bar{a}}))}-\langle Y_{\bar{a}} \rangle _{p(Y_{\bar{a}})} . \end{aligned}$$

Notice that computing \(p(Y_{a}(D_{\bar{a}}))\) requires knowledge of the CBN. If the CBN structure is not known or estimating its conditional distributions is challenging, the resulting estimate could be imprecise.

Path-Specific Individual-Level Unfairness

In the college admission example of Fig. 7 in which the path \(A \rightarrow D\rightarrow Y\) is considered fair, rather than measuring unfairness over the whole population, we might want to know e.g. whether a rejected female applicant \(\{a^n=a=1, q^n, d^n, y^n=0\}\) was treated unfairly. We can answer this question by estimating whether the applicant would have been admitted had she been male (\(A=\bar{a}=0\)) along the direct path \(A\rightarrow Y\) from \(p(Y_{\bar{a}}(D_a)|A=a, Q=q^n, D=d^n)\) (notice that the outcome in the actual world, \(y^n\), corresponds to \(p(Y_{a}(D_a)|A=a, Q=q^n, D=d^n)=\mathbb {1}_{Y_{a}(D_a)=y^n}\)).

To understand how this can be achieved, consider the following linear model associated to a CBN with the same structure as the one in Fig. 7

$$\begin{aligned}&A\sim \text {Bern}(\pi ), Q=\theta ^q+\epsilon _q, D=\theta ^d+\theta ^d_{a}A+\epsilon _d, Y=\theta ^y+\theta ^y_{a}A+\theta ^y_{q}Q+\theta ^y_{d}D+\epsilon _y. \end{aligned}$$
figure c

The relationships between A, Q, D, Y and \(Y_{\bar{a}}(D_a)\) in this model can be inferred from the twin Bayesian network [38] on the left resulting from the intervention \(A=a\) along \(A\rightarrow D\) and \(A=\bar{a}\) along \(A\rightarrow Y\): in addition to A, Q, D, Y, the network contains the variables \(Q^*\), \(D_a\) and \(Y_{\bar{a}}(D_a)\) corresponding to the counterfactual world in which \(A=\bar{a}\) along \(A\rightarrow Y\). The two groups of variables are connected through \(\epsilon _d, \epsilon _q, \epsilon _y\), indicating that the factual and counterfactual worlds share the same unobserved randomness. From this network, we can deduce that \(Y_{\bar{a}}(D_a)\perp \{A, Q, D\}\,|\,\{\epsilon _q, \epsilon _d\}\), and therefore that we can express \(p(Y_{\bar{a}}(D_a)|A=a, Q=q^n, D=d^n)\) as

$$\begin{aligned} p(Y_{\bar{a}}(D_a)|A=a, Q=q^n, D=d^n) = \int _{\epsilon _q, \epsilon _d} p(Y_{\bar{a}}(D_a)|\epsilon _q, \epsilon _d)\, p(\epsilon _q, \epsilon _d|A=a, Q=q^n, D=d^n). \end{aligned}$$
(4)

As \(\epsilon ^n_q=q^n-\theta ^q\), \(\epsilon ^n_d = d^n-\theta ^d-\theta ^d_a\), we obtain \(\langle Y_{\bar{a}}(D_a) \rangle _{p(Y_{\bar{a}}(D_a)|A=a, Q=q^n, D=d^n)}=\theta ^y+\theta ^y_{q}q^n+\theta ^y_{d}d^n\).

Equation (4) suggests that, in more complex scenarios (e.g. in which the variables are non-linearly related), we can obtain a Monte-Carlo estimate of \(p(Y_{\bar{a}}(D_a)|a,q^n, d^n)\) by sampling \(\epsilon _q\) and \(\epsilon _d\) from \(p(\epsilon _q, \epsilon _d|a, q^n, d^n)\).
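
A minimal Python sketch of this abduction-based procedure for the linear model above is given below; the parameter values and the observed applicant \((a^n, q^n, d^n)\) are illustrative assumptions. In this linear-Gaussian case the posterior over \(\epsilon _q\) and \(\epsilon _d\) is a point mass, so the Monte-Carlo step reduces to resampling \(\epsilon _y\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions) for the linear model of this section.
th_q = 0.0
th_d, th_d_a = 0.0, -1.0
th_y, th_y_a, th_y_q, th_y_d = 0.0, -0.5, 1.0, 0.8

# A rejected female applicant (a^n = a = 1) with observed qualifications and department.
a, a_bar = 1.0, 0.0
q_n, d_n = 0.3, -0.9

# Abduction: recover the noise terms consistent with the observations.
eps_q = q_n - th_q
eps_d = d_n - th_d - th_d_a * a

# Counterfactual Y_abar(D_a): A = a along A -> D (so D stays at d^n),
# A = abar along the direct path A -> Y; eps_y is resampled.
S = 100_000
eps_y = rng.normal(size=S)
d_cf = th_d + th_d_a * a + eps_d                  # equals d_n
y_cf = th_y + th_y_a * a_bar + th_y_q * (th_q + eps_q) + th_y_d * d_cf + eps_y

print(y_cf.mean())                                # Monte-Carlo estimate
print(th_y + th_y_q * q_n + th_y_d * d_n)         # closed form from the text
```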

In [11], we used this approach to introduce a prediction system such that the two distributions \(p(\hat{Y}_{\bar{a}}(D_a)|A=a, Q=q^n, D=d^n)\) and \(p(\hat{Y}_{a}(D_a)|A=a, Q=q^n, D=d^n)\) coincide – we called this property path-specific counterfactual fairness.

5 Conclusions

We used causal Bayesian networks to provide a graphical interpretation of unfairness in a dataset as the presence of an unfair causal effect of a sensitive attribute. We used this viewpoint to revisit the recent debate surrounding the COMPAS pretrial risk assessment tool and, more generally, to point out that fairness evaluation of a model requires careful consideration of the patterns of unfairness underlying the training data. We then showed that causal Bayesian networks provide us with a powerful tool to measure unfairness in a dataset and to design fair models in complex unfairness scenarios.

Our discussion did not cover the difficulties in making reasonable assumptions about the structure of the causal Bayesian network underlying a dataset, nor those in estimating the associated conditional distributions or other quantities of interest. These are obstacles that need to be carefully considered to avoid improper usage of this framework.