1 Introduction

In 2019, Obermeyer et al. published a concerning article illustrating how a widely used health risk-prediction tool, applied to roughly 200 million individuals in the U.S. per year, exhibited significant racial bias [1]. They found that the bias arose because the algorithm equated a patient’s predicted health-care cost with how sick they were likely to be. However, due to systemic racism, including distrust of the health-care system and direct racial discrimination, Black individuals usually face more barriers to receiving proper health care and therefore generally spend less on health care per year [2]. Obermeyer et al. found that only 17.7% of the patients the algorithm assigned to receive extra care were Black; had the bias in the system been corrected for, that percentage would have risen drastically to 46.5% [3].

Numerous examples like the one above depict machine learning algorithms as being biased against a particular marginalization class such as race or gender. For instance, women are less likely to be shown ads for highly paid positions like CEO or CFO [4], and facial recognition systems are more likely to be wrong when given a picture of a Black woman [5]. Finding deficiencies such as these has spurred research into creating machine learning models that achieve fair outcomes.

Defining, implementing, and enforcing fairness in machine learning is, above all else, a sociotechnicalFootnote 1 challenge. Machine learning systems may behave unfairly, or in a biased manner, for a multitude of reasons: e.g., due to social biases reflected in the datasets used in training, because of societal biases that are reflected (explicitly or implicitly) in the design process, and/or as a result of interaction with particular stakeholders or clients during run-time [7]. Without viewing fair machine learning metrics through the lenses of philosophy, sociology, and law, choosing and implementing a metric remains a purely technical exercise that does not consider the societal impacts that could arise after deployment. In that case, we are left blindly hoping that the chosen metric aligns with our desired societal principles. To address this problem, and to help practitioners choose appropriate fair machine learning metrics in an informed, societally aware manner, we develop the following field guide, which examines popular binary statistics-based fair machine learning metrics through the lenses of philosophy, sociology, and the law. We note that this field guide does not cover bias mitigation strategies and we refer interested readers to [8,9,10] for more in-depth discussions on this topic. Additionally, we do not extensively cover individual fair machine learning metrics, as the majority of them involve causal and/or counterfactual reasoning (which would require a lengthy introduction); we instead refer interested readers to [9, 11,12,13].

Many works have been published over the last few years on fair machine learning, including a handful of survey papers and textbooks that are closely aligned with this field guide [9, 10, 13]. While these surveys do a good job of explaining the mathematical and algorithmic aspects of mitigating bias and achieving fairness, they often leave out critical discussion of the philosophical and legal groundings that are needed to make a sociotechnical system rather than just a technical one. Additionally, while works exist that align philosophical [14,15,16,17] and legal [18,19,20,21] notions with proposed fair machine learning metrics, they often do not go in depth on the technical, algorithmic, and mathematical foundations that are also needed to make a sociotechnical system. Our work resolves this issue by producing a survey that covers both of these realms, allowing practitioners to understand not only how specific fair machine learning metrics function, but also their social science groundings.

The rest of the field guide is as follows. We begin by introducing three key categories of statistics-based fair machine learning metrics in Sect. 2. In Sect. 3, we introduce important philosophical perspectives, such as Rawls’ Equality of Opportunity (EOP), that serve as a foundation for many of the proposed fair machine learning metrics. Next, in Sect. 4, we depict popular legal ideals that have a strong connection to fairness in machine learning. Section 5 contains our analysis and discussion of popular statistics-based fair machine learning metrics as well as their classification in relation to the social science topics discussed. In Sect. 6, we give several critiques from philosophy, sociology, and law of the fair machine learning field as a whole. Finally, in Sect. 7, we detail our major conclusions.

2 Independence, separation, and sufficiency

Many of the proposed fair machine learning metrics have groundings in statistics [22]. For example, statistical parity depends on the measurement of raw positive classification rates; equalized odds depends on false-positive and false-negative rates; and predictive parity depends on positive predictive values. The use of statistical measures is attractive because they are relatively simple to measure, and definitions built on them can usually be achieved without making any assumptions about the underlying data distributions. Many of the common statistical measures used in fair machine learning metrics are listed in Table 1. We note that these statistical measures are not unique to fair machine learning metrics; rather, they are general measures from the field of statistics itself.

Table 1 Definitions for common statistical measures

It is important to note that a grounding in statistics does not provide individual level fairness, or even sub-group fairness for a marginalized class [23]. Instead, it provides meaningful guarantees to the “average” member of a marginalizedFootnote 2 group. To fully implement fairness on the individual level, techniques such as causal and counterfactual inference are commonly used and we refer interested readers to [12, 13] for an introduction to this field.
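
To make these measures concrete, the following sketch (an illustrative example added here; the toy labels, predictions, and the helper name group_rates are our own and not from the surveyed works) computes the per-group confusion-matrix rates that the fairness metrics in Sect. 5 are built from.

```python
import numpy as np

def group_rates(y_true, y_pred, s, group):
    """Confusion-matrix-based rates for the subgroup where the
    marginalization attribute s equals `group`."""
    mask = (s == group)
    y, yh = y_true[mask], y_pred[mask]
    tp = np.sum((yh == 1) & (y == 1))
    fp = np.sum((yh == 1) & (y == 0))
    fn = np.sum((yh == 0) & (y == 1))
    tn = np.sum((yh == 0) & (y == 0))
    return {
        "TPR": tp / (tp + fn),      # true-positive rate (sensitivity)
        "FPR": fp / (fp + tn),      # false-positive rate
        "PPV": tp / (tp + fp),      # positive predictive value
        "ACC": (tp + tn) / len(y),  # overall accuracy
    }

# Toy data: s = 1 marks the marginalized group, s = 0 the non-marginalized group.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])
s      = np.array([0, 0, 0, 0, 1, 1, 1, 1])

print(group_rates(y_true, y_pred, s, group=0))
print(group_rates(y_true, y_pred, s, group=1))
```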

Additionally, many statistical measures directly oppose one another. For instance, when base rates differ across groups and the classifier is not perfect, it is impossible to simultaneously equalize false-positive rates, false-negative rates, and positive predictive values across marginalized and non-marginalized groups. As a direct consequence, many fair machine learning metrics cannot be satisfied in tandem. This fact was firmly cemented in the work of Barocas et al. [13]. In this work, they propose three representative fairness criteria—independence, separation, and sufficiency—that serve as a classification boundary for the statistics-based fair machine learning metrics that have been published. The authors capitalize on the fact that most proposed fairness criteria are simply properties of the joint distribution of a marginalization attribute \({S}\) (e.g., race or gender), a target variable \({Y}\), and the classification (or in some cases probability score) \(\hat{Y}\), which allowed them to create three distinct categories by forming conditional independence statements between the three random variables.

2.1 Independence

The first formal category, independence, only requires that the marginalization attribute, \({S}\) (\(S = 0\) non-marginalized, \(S = 1\) marginalized), is statistically independent of the classification outcome \(\hat{Y}\), \(\hat{Y} \perp S\). For the binary classification case, the authors of [13] produce two different formulations

$$\begin{aligned}&\mathrm{{Exact:}} P[\hat{Y} = 1 \mid S = 0] = P[\hat{Y} = 1 \mid S = 1]\\&\mathrm{{Relaxed:}} \frac{P[\hat{Y}=1 \mid S = 0]}{P[\hat{Y} = 1 \mid S = 1]} \ge 1 - \epsilon . \end{aligned}$$

When considering the event \(\hat{Y}=1\) to be the positive outcome, this condition requires the acceptance rates to be the same across all groups. The relaxed version requires that the ratio between the acceptance rates of the two groups be at least a threshold determined by a predefined slack term \(\epsilon\). In many cases, \(\epsilon = 0.2\) to align with the four-fifths rule in disparate impact law (see Sect. 4.1). We note that the relaxed formulation is essentially the exact formulation, but with the emphasis placed on the ratio between the two groups’ acceptance rates rather than their difference.
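
As a sketch of how the exact and relaxed formulations might be estimated from a sample of predictions (the toy data and helper names below are our own), the relaxed check takes the smaller of the two acceptance-rate ratios, which mirrors how the four-fifths rule is usually applied regardless of which group happens to be favored.

```python
import numpy as np

def acceptance_rate(y_pred, s, group):
    """P[Y_hat = 1 | S = group], estimated from the sample."""
    return y_pred[s == group].mean()

def independence_gap(y_pred, s):
    """Exact formulation: difference in acceptance rates (0 means independence holds)."""
    return acceptance_rate(y_pred, s, 0) - acceptance_rate(y_pred, s, 1)

def independence_ratio(y_pred, s):
    """Relaxed formulation: ratio of acceptance rates, taken in the
    direction that yields the smaller value."""
    r0, r1 = acceptance_rate(y_pred, s, 0), acceptance_rate(y_pred, s, 1)
    return min(r0 / r1, r1 / r0)

y_pred = np.array([1, 1, 0, 1, 0, 1, 0, 0])   # toy predictions
s      = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # 0 = non-marginalized, 1 = marginalized

print(independence_gap(y_pred, s))               # 0.5
print(independence_ratio(y_pred, s) >= 1 - 0.2)  # False: fails the epsilon = 0.2 check
```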

Barocas et al. also note that while independence aligns well with how humans reason about fairness, several drawbacks exist for fair machine learning metrics that fall into this category (e.g., statistical parity, treatment parity, conditional statistical parity, and overall accuracy equality) [13]. Specifically, the metrics in this category ignore any correlation between the marginalization attribute and the target variable \({Y}\), which rules out a perfectly accurate predictor whenever such correlation exists. Additionally, independence enables laziness: it allows situations where qualified people are carefully selected for one group (e.g., the non-marginalized group), while random people are selected for the other (the marginalized group). Furthermore, the independence category allows false negatives to be traded for false positives, meaning that neither error type is treated as more important than the other, which is inappropriate in many circumstances [13].

2.2 Separation

The second category Barocas et al. propose is separation, which captures the idea that in many scenarios the marginalization attribute may be correlated with the target variable [13]. Specifically, the random variables satisfy separation if \(\hat{Y} \perp S \mid Y\) (\(\hat{Y}\) is conditionally independent of \({S}\) given \({Y}\)). In the binary classification case, this is equivalent to requiring that all groups achieve the same true- and false-positive rates

$$\begin{aligned}&TP\;:\;P[\hat{Y}=1 \mid Y = 1 \cap S = 0] = P[\hat{Y}=1 \mid Y = 1 \cap S = 1] \\&FP\;:\;P[\hat{Y}=1 \mid Y = 0 \cap S = 0] = P[\hat{Y}=1 \mid Y = 0 \cap S = 1]. \end{aligned}$$

Additionally, this requirement can be relaxed to only require the same true-positive rates or the same false-positive rates. Fair machine learning metrics that fall under separation include: false-positive error rate balance, false-negative error rate balance, equalized odds, treatment equality, balance for the positive class, and balance for the negative class.
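
The two separation conditions can be checked empirically in the same style (again with made-up labels and predictions); zero gaps in both the true- and false-positive rates correspond to equalized odds, while requiring only one of the two gaps to vanish gives the relaxed versions mentioned above.

```python
import numpy as np

def positive_rate(y_pred, y_true, s, group, y_value):
    """P[Y_hat = 1 | Y = y_value, S = group], estimated from the sample."""
    mask = (s == group) & (y_true == y_value)
    return y_pred[mask].mean()

def separation_gaps(y_pred, y_true, s):
    tpr_gap = positive_rate(y_pred, y_true, s, 0, 1) - positive_rate(y_pred, y_true, s, 1, 1)
    fpr_gap = positive_rate(y_pred, y_true, s, 0, 0) - positive_rate(y_pred, y_true, s, 1, 0)
    return tpr_gap, fpr_gap   # both zero <=> separation (equalized odds) holds

y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1, 0, 0])
s      = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(separation_gaps(y_pred, y_true, s))   # (-0.5, 0.5): separation is violated
```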

2.3 Sufficiency

The final category, sufficiency, makes use of the idea that for the purpose of predicting \({Y}\), the value of \({S}\) does not need to be used if \(\hat{Y}\) is given, since \({S}\) is subsumed by the classification \(\hat{Y}\) [13]. For example, in the case of college admissions, if a person’s GPA or SAT score already subsumes the information their race would carry about the outcome, then the admission committee does not need to actively look at race when making the decision. More concretely, the random variables satisfy sufficiency if \(Y \perp S \mid \hat{Y}\) (\({Y}\) is conditionally independent of \({S}\) given \(\hat{Y}\)). In the binary classification case, this is the same as requiring parity of positive or negative predictive values across all groups for \(\hat{y} \in \hat{Y}= \{0, 1\}\)

$$\begin{aligned} P[Y = 1 \mid \hat{Y} = \hat{y} \cap S = 0] = P[Y = 1 \mid \hat{Y} = \hat{y} \cap S = 1]. \end{aligned}$$

The authors of [13] note that it is common to assume that \(\hat{Y}\) satisfies sufficiency if the marginalization attribute \({S}\) and the target variable \({Y}\) are clearly understood from the problem context. Some examples of fair machine learning metrics that satisfy sufficiency include: predictive parity, conditional use accuracy, test fairness, and well calibration.
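
As an illustration (toy data and helper names of our choosing), sufficiency can be audited by comparing predictive values across groups; note that parity of \(P[Y=1 \mid \hat{Y}=0]\) across groups is equivalent to parity of negative predictive values.

```python
import numpy as np

def predictive_value(y_true, y_pred, s, group, yhat_value):
    """P[Y = 1 | Y_hat = yhat_value, S = group], estimated from the sample."""
    mask = (s == group) & (y_pred == yhat_value)
    return y_true[mask].mean()

def sufficiency_gaps(y_true, y_pred, s):
    ppv_gap = predictive_value(y_true, y_pred, s, 0, 1) - predictive_value(y_true, y_pred, s, 1, 1)
    npv_gap = predictive_value(y_true, y_pred, s, 0, 0) - predictive_value(y_true, y_pred, s, 1, 0)
    return ppv_gap, npv_gap   # both zero <=> sufficiency holds

y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 1, 1, 0, 0])
s      = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(sufficiency_gaps(y_true, y_pred, s))   # (0.0, 0.0): sufficiency holds on this toy data
```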

3 Philosophical perspectives

This section is devoted to explaining the philosophical underpinnings of statistics-based fair machine learning metrics. Many of the statistics-based fair machine learning metrics in the fair machine learning literature correspond to the notions of distributive justice from social science literature [24]. Here, we introduce the philosophical ideal of equality of opportunity (EOP) and its three main frames: formal EOP, substantive EOP, and luck-egalitarian EOP. We note that other definitions and ideals of egalitarianism exist, and are relevant to the discussion on fair machine learning [25,26,27], but we limit ourselves to the discussion below as it directly aligns with our classification of popular fair machine learning metrics in Sect. 5. We direct interested readers to these mentioned works, along with corresponding surveys [28, 29], for additional reading. Additionally, it is important to note that in Sect. 5, we only classify the statistics-based fair machine learning metrics into Rawls’ and luck-egalitarian EOP and not formal EOP. This can be justified, since several issues with formal EOP prevent it from being a good basis for a fairness measure. We introduce the concept of formal EOP only to serve as a grounding for the discussion of Rawls’ and luck-egalitarian EOP, since they would be harder to understand without it.

In [14], Khan et al. propose grounding current (and future) proposed fair machine learning metrics in the moral framework of equality of opportunity (EOP) [28]. EOP is a political ideal that is opposed to assigned-at-birth (caste) hierarchy, but not to hierarchy itself. In a caste hierarchy, a child normally acquires the social status of their parents. Social mobility may be possible, but the process to rise through the hierarchy is open to only specific individuals depending on their initial social status. In contrast to a caste hierarchy, EOP demands that the social hierarchy is determined by a form of equal competition among all members of the society. From a philosophical perspective, EOP is a principle that dictates how desirable positions, or opportunities, should be distributed among members of a society. As a moral framework, EOP allows machine learning practitioners to see fairness notions’ motivations, strengths, and shortcomings in an organized and comparative fashion. Additionally, it presents moral questions that machine learning practitioners must answer to construct a fairness system that satisfies their desired values [14]. Furthermore, it allows practitioners to understand and appreciate why there may be disagreement when it comes to choosing a specific fair machine learning metric, as different people will have different moral beliefs about what fairness and equality mean. The different conceptions of EOP (formal, substantive, and luck-egalitarian EOP) all interpret the idea of competing on equal terms in different ways (Fig. 1).

Fig. 1
figure 1

Definitions and warnings for popular equality of opportunity modes

3.1 Formal EOP

Formal EOP emphasizes that any desirable position (in a society, or more concretely, a job opening) is available and open to everyone. The distribution of these desirable positions follows according to individuals’ relevant qualifications, and in this setting, the most qualified candidate always wins. Formal EOP takes a step in the direction of making decisions based on relevant criteria rather than making them in a blatantly discriminatory fashion [14]. In the fair machine learning setting, formal EOP has often been implemented as fairness through blindness or fairness through unawareness [30]. In other words, formal EOP-based metrics strip away any irrelevant marginalization attributes, such as race or gender, before training is performed.Footnote 3

However, while formal EOP has the benefit of awarding positions based on the actual qualifications of an individual, and of excluding irrelevant marginalization information, it makes no attempt to correct for arbitrary privileges. This includes unequal access to opportunities that can lead to disparities between individuals’ qualifications. An example of this situation can be seen in the task of predicting prospective students’ academic performance for use in college admission decisions [31]. Individuals belonging to marginalized or non-conforming groups, such as Black and/or LGBTQIA+Footnote 4 students, and/or students with disabilities, are disproportionately impacted by the challenges of poverty, racism, bullying, and discrimination. An accurate predictor of a student’s success therefore may not correspond to a fair decision-making procedure, as the impacts of these challenges create a tax on the “cognitive bandwidth” of non-majority students, which, in turn, affects their academic performance [32, 33].

The issue of not accounting for arbitrary privileges can be broken down into two main problems: the Before and After problems. In the Before problem, arbitrary and morally irrelevant privileges weigh heavily on the outcomes of formally fair competitions, as people with more privilege are often in a better position to build relevant qualifications. This can be seen in the problem, described above, of predicting students’ performance for admissions. The After problem is an effect of formally fair, but not arbitrary-privilege-aware, competition: the winner of a competition (e.g., getting hired or admitted to a top-tier college) is then in a position to gain even more success and qualifications. It introduces a compounding snowball effect in which winners win faster, but losers also lose faster [14]. Overall, formal EOP compounds both privilege and disprivilege, which is referred to as discrimination laundering [34].

3.2 Substantive (Rawls’) EOP

Substantive EOP addresses the discrimination laundering problem by requiring all individuals to have the same opportunity to gain qualifications. It aims to give everyone a fair chance at success in a competition, for example, by making all extracurricular activities and high-school opportunities equally available to all students regardless of wealth or social status. Substantive EOP is often equated with Rawls’ fair EOP, which states that all individuals, regardless of how rich or poor they are born, should have the same opportunities to develop their talents, so that people with the same talents and motivation have the same opportunities [35]. In fair machine learning, substantive EOP is often implemented through metrics such as statistical parity and equalized odds, which assume that talent and motivation are equally distributed among sub-populations.

However, the assumption that these talents are equally distributed often does not hold in practice. By the time a machine learning system is being used to make a decision, it is normally too late to provide individuals with the opportunity to develop qualifications. Through this lens, fair machine learning has re-written Rawls’ EOP to say that a competition must only measure a candidate on the basis of their talent, while ignoring qualifications that reflect the candidates’ unequal developmental opportunities prior to the point of the competition [14].

3.3 Luck-egalitarian EOP

Furthering the ideals of substantive EOP, luck-egalitarian EOP requires that a person’s outcome be affected only by their choices, not their circumstances [36]. For example, a student with rich parents who did not try hard in their studies should not have an advantage over a student with poor parents who did work hard in being admitted to a university. Overall, luck-egalitarian EOP is an attractive idea, and several metrics fall into this category (e.g., predictive parity, conditional use accuracy, test fairness, and well calibration), but separating choice from circumstance is non-trivial and, in practice, quite difficult to implement.

A few solutions for this separation have been proposed. Economist and political scientist John Roemer proposed that, instead of trying to separate an individual’s qualifications into the effects of circumstance and choice, we should control for certain matters of circumstance (such as race, gender, and disability) that impact a person’s access to opportunities to develop qualifications [29]. While this solution solves the separation problem, another issue of sub-group comparison emerges. We can compare apples to apples, and oranges to oranges, but we are now unable to compare apples to oranges [14]. Unfortunately, the EOP frameworks offer no solution to this problem of overall ranking.

Another problem in luck-egalitarian EOP is separating effort from circumstance when the two interact [14]. It may be the case that a wealthy student works hard at their studies, i.e., the circumstance of being wealthy interacts with the effort of the student. This entanglement is nearly impossible to separate. Fortunately, however, this separation is only required when the circumstance gives access to a broad range of advantages. For instance, if a student’s family wealth allows them to gain an advantage over all other students in almost every competition (not just university admission, but also job hiring or access to other opportunities), then there is a fairness problem, because there is an indication that the arbitrary privilege or circumstance, and not the relevant skill, is being used as the basis for the decision. On the other hand, if the student only has the advantage in the admissions process, then it could be due to their effort rather than their circumstance, and we may or may not have a matter of unfairness in which effort needs to be separated from circumstance.

4 Legal perspectives

The definition of “fairness” in the eyes of the law changes from country to country, and what is considered legally fair in one country may contradict or oppose what is considered fair in another. For this reason, we specify that this article is framed from the viewpoint of Western law, specifically U.S. anti-discrimination law. We chose this frame of reference since most fair machine learning literature focuses on terminology and legal aspects proposed in this body of law [18].

4.1 Disparate impact

Disparate impact occurs when members of a marginalized class are negatively affected more than others when using a formally neutral policy or rule [37]. In other words, it is unintentional or indirect discrimination. This statutory standard was first formalized in the 1971 landmark U.S. Supreme Court case of Griggs vs. Duke Power Co., where Duke Power Co. required employees to have a high-school diploma to be considered for a promotion [38]. The Supreme Court noted that Title VII of the 1964 Civil Rights Act was violated in this situation as Duke Power Co.’s diploma requirement, which had little relation to the overall job performance, prevented a disproportionate number of Black employees from being promoted, and thus, deemed it to have an illegal (unjustified) disparate impact. In fair machine learning literature, disparate impact is often framed as occurring when the outcomes of classification differ across different subgroups, even if this outcome disparity is not intentional [39].

However, disparate impact in and of itself is not illegal. In U.S. anti-discrimination law, indirect discrimination in employment is not unlawful if it can be justified by a “legitimate aim” or a genuine occupational requirement and/or business necessity [40]. For example, in the Griggs vs. Duke Power Co. case, if the high-school diploma requirement had been shown to be necessary for overall job success, then the resulting disparities would have been legal [23].

The most common measure of disparate impact is the four-fifths rule proposed in the Uniform Guidelines on Employee Selection Procedures in the Code of Federal Regulation [41]. This rule states that if the selection rate for a certain group is less than 80% of the group with the highest selection rate, then there is a disparate impact (or adverse impact) on that group. For a concrete example, in the Griggs vs. Duke Power Co. case, the selection rate of Black individuals for promotion was 6%, while the selection rate of White individuals was 58%. By taking the ratio of the two selection rates, we get \(\frac{6}{58} = 10.3\%\) which is well below the 80% threshold. In Sect. 5.1.1, we discuss a fair machine learning metric that aligns well with the disparate impact four-fifths rule.
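
The computation behind the four-fifths rule is a single ratio; the short sketch below (the helper function is our own) reproduces the Griggs vs. Duke Power Co. figures cited above.

```python
def four_fifths_check(selection_rate_group, selection_rate_highest, threshold=0.8):
    """Four-fifths (80%) rule: flag adverse impact when a group's selection rate
    falls below `threshold` times the selection rate of the highest group."""
    ratio = selection_rate_group / selection_rate_highest
    return ratio, ratio < threshold

# Selection rates cited in the text for Griggs vs. Duke Power Co.
ratio, adverse_impact = four_fifths_check(0.06, 0.58)
print(f"ratio = {ratio:.1%}, adverse impact: {adverse_impact}")   # ratio = 10.3%, adverse impact: True
```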

Fig. 2
figure 2

Graphical depiction of disparate impact vs. disparate treatment. On the disparate impact (left) side, while both people were given boxes to stand on to see over the fence, the person on the left is disproportionately impacted by being shorter. For disparate treatment (right), the person on the left is directly being discriminated against by not being given a box to stand on

4.2 Disparate treatment

In direct contrast with disparate impact, disparate treatment occurs when an individual is intentionally treated differently based on their membership in a marginalized class (see Fig. 2). The Uniform Guidelines on Employee Selection Procedures state that: “disparate treatment occurs where members of a race, gender, or ethnic group have been denied the same employment, promotion, membership, or other employment opportunities as have been available to other employees or applicants” [42]. The key legal question behind disparate treatment is whether the alleged discriminator’s actions were motivated by discriminatory intent [41]. In fair machine learning literature, a metric is often described as adhering to disparate treatment law if it does not use any marginalization attributes in the decision-making process [43].

In the algorithmic decision-making context, this suggests that any explicit use of marginalization attributes, either in constructing algorithmic predictions or in setting thresholds, is strictly prohibited [21]. Thus, the current standard practice in algorithmic decision-making is to prevent disparate treatment, i.e., excluding protected attributes from inputs.Footnote 5 This notion of fairness is known as equal treatment as it specifies that equal individuals should be treated equally irrespective of their demographic membership [23].

4.3 Anti-classification and anti-subordination

Two of the main principles that motivated the creation of U.S. anti-discrimination legislation are anti-classification and anti-subordination [18, 46]. The anti-classification principle notes that the U.S. government may not classify people (either overtly or surreptitiously) by use of a marginalization attribute such as race or gender. Anti-classification is often seen in fair machine learning as fairness through unawareness [11], although many note that the exclusion of the marginalization attributes can lead to discriminatory solutions [23]. Additionally, anti-classification can be achieved if the output of the model is independent of the marginalization attribute, even if that attribute is used in the classification. In other words, any fairness metric that falls into the independence category (see Sect. 2.1) aligns with anti-classification as well, since it maintains \(\hat{Y} \perp S\).

It is less straightforward to see that any fairness metric in the sufficiency category satisfies anti-classification, as well. This is because sufficiency is closely aligned with the notion of calibration [13]. A fairness metric is considered calibrated if it requires that the actual outcomes are independent of the marginalization attributes after controlling for some estimated risk score (\(Y \perp S \mid R\)). Calibration can be formally notated as

$$\begin{aligned} P[Y = 1 \mid R = r \cap S = s] = r, \end{aligned}$$

where \({R}\) is the score or probability of being classified to a certain class. This formulation is almost identical to the one for sufficiency, as shown in Sect. 2.3. The only difference is in the use of a score value \({R}\) or the predicted value \(\hat{Y}\). However, in the binary case, it turns out that the score value and the predicted value can be considered the same. Since there are only two possible outputs (e.g., 0 and 1), the score can easily be interpreted as the output label by comparing \({r}\) with \(1-r\) and choosing the label corresponding to the higher value.
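
In code, this argument amounts to nothing more than thresholding the score at 0.5 (a small illustrative helper of our own):

```python
def score_to_label(r):
    """Binary case: pick the label whose implied probability is larger.
    Choosing 1 when r > 1 - r is the same as thresholding r at 0.5."""
    return 1 if r > 1 - r else 0

print([score_to_label(r) for r in (0.1, 0.49, 0.51, 0.9)])   # [0, 0, 1, 1]
```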

While this discussion shows that sufficiency and calibration are equivalent, it does not show that the outcome of the model is independent of the marginalization attribute. However, it turns out that calibrated models rarely require access to the marginalization attribute during testing [47], since in most cases, we can simply assume that \(\hat{Y}\) satisfies sufficiency (and therefore calibration) when the problem is clearly defined (Sect. 2.3). This assumption reduces the original sufficiency equation to

$$\begin{aligned} P[Y = 1 \mid \hat{Y} = \hat{y} \cap S = 0] & = P[Y = 1 \mid \hat{Y} = \hat{y} \cap S = 1]\\ & = P[Y=1 \mid \hat{Y} = \hat{y}] \end{aligned}$$

which now makes no use of the marginalization attribute \({S}\). Several of the fair machine learning metrics discussed in Sect. 5 align with anti-classification, namely: statistical parity, treatment parity, conditional statistical parity, conditional use accuracy, predictive parity, overall accuracy equality, test fairness, and well calibration.

Anti-subordination, on the other hand, is the idea that anti-discrimination laws should serve a grander purpose and should actively aim to tackle societal hierarchies between different groups. In other words, it argues that it is inappropriate for any group to be “subordinated” in society, and that empowering marginalized groups, even at the expense of the non-marginalized group, should be the main focus of anti-discrimination law [48]. Affirmative action, a set of procedures designed to eliminate unlawful discrimination in competitions such as hiring, is the most noticeable case of anti-subordination legislation. Anti-subordination is a less popular idea in the fair machine learning literature, although there has been some discussion of using debiasing techniques to achieve the effect of anti-subordination [18, 39].

Although it is a less popular idea, it turns out that all of the fair machine learning metrics we discuss in this work could potentially be seen to align with anti-subordination. This is due to the fact that they all aim to eliminate stratification based on a marginalization attribute while disregarding possible base-rate differences among the protected groups [47]. For a concrete example, consider statistical parity (Sect. 5.1.1) in relation to college admissions. While it may be the case that there are different base acceptance rates between the marginalization groups, statistical parity classifies the same fraction of each group as being admitted. Another (more direct) example is equalized odds (Sect. 5.2.3), since it can be seen as an implementation of affirmative action [49].

4.4 Procedural fairness

In addition to the concepts of anti-classification and anti-subordination, procedural fairness has been increasingly advocated for in the legal field. Procedural fairness (often also called procedural justice) is the perceived fairness of a court proceeding, from the surroundings to the treatment that people receive; it refers to the fairness of the decision-making process that leads to an outcome. This contrasts with distributive fairness (i.e., disparate impact or disparate treatment), which refers to making sure that the outcomes of a process (or classification task) are fair [19]. Research has shown that higher perceptions of procedural fairness lead to improved acceptance of court decisions, a more positive attitude towards the courts and the justice system, and greater compliance with court orders [50]. Following this shift in thought in the legal field, many fair machine learning researchers are beginning to recognize that most of the proposed fair machine learning metrics focus solely on distributive fairness [51,52,53] rather than procedural fairness. Over the last few years, however, commentary on utilizing procedural fairness as a grounding for fair machine learning metrics has been published [19, 51]. Applied to machine learning, procedural fairness would look at the entire ecosystem in which a fair machine learning metric exists. This would require analyzing not just the model itself, but also the datasets used in training, who is creating the algorithm, and the final population that the model will ultimately be used on.

We note that in our classification of the fair machine learning metrics performed in Sect. 5, we do not use procedural fairness as a classification boundary. While each of the discussed metrics could be incorporated into a procedural fairness pipeline, none of the metrics themselves constitute procedural fairness. We included this section only as a point of information of the types of legal ideals that exist in relation to fair machine learning.

5 Statistics-based fair machine learning metrics

In this section, we organize several popular statistics-based fair machine learning metrics into categories along several axes: which attributes of the machine learning system they use (e.g., the predicted outcomes, the predicted and actual outcomes, or the predicted probabilities and actual outcomes), which formal statistical criterion (independence, separation, or sufficiency) they align with as proposed in [13], which legal notion they can be tied to, and which philosophical ideal serves as their foundation [e.g., substantive (Rawls’) EOP or luck-egalitarian EOP] using the classification procedure explained in [15, 54]. Figure 3 shows our classification of the metrics along the statistical lines of true positive, true negative, false positive, and false negative, depending on which of these statistics each fairness metric uses, and Table 2 summarizes our main classification conclusions. Additionally, at the end of this section, we devote space to discussing individual fairness and the (apparent) differences between individual- and group-fair machine learning metrics. We additionally note that this section uses the following variables: \(S=1\) is the marginalized or minority group, \(S = 0\) is the non-marginalized or majority group, \(\hat{Y}\) is the predicted value or class (i.e., label), and \({Y}\) is the true or actual label/class.

Fig. 3
figure 3

Classification of each fair machine learning metric along the statistical boundaries of true positive, false positive, true negative, and false negative, as determined by which of the four measures the fair machine learning metric uses. For instance, since equalized odds uses the rates of predicting the positive class among both actual positives and actual negatives (i.e., both TP and FP values), it is on the row where the predicted value is positive. Similarly, since test fairness says that all groups should have equal probability of belonging to the positive class, it is classified in the top left square related to the actual label of positive. Additionally, a diagonal line means that the metric in the box at the end of the line uses the statistics that the line runs through. For example, treatment equality uses both false-positive and false-negative values. A list of all the acronyms can be found in Table 2

Table 2 Definitions and classifications for popular fair machine learning classification metrics

To provide a background for the following proofs, we state a relaxed, binary, version of the definitions for Rawls’ EOP and luck-egalitarian EOP for supervised learning proposed by Heidari et al. [15]:

Definition 1

(Rawls’ and luck-egalitarian EOP for supervised learning) Predictive model h satisfies Rawlsian/luck-egalitarian EOP if for all \(s\in S=\{0,1\}\) and all \(y, \hat{y} \in Y, \hat{Y} = \{0,1\}\)

$$\begin{aligned}&\text {Rawls': } F^h(U \le u\mid S = 0 \cap Y = y) = F^h(U \le u \mid S = 1 \cap Y = y) \\&\text {LE: } F^h(U \le u\mid S = 0 \cap \hat{Y} = \hat{y}) = F^h(U \le u \mid S = 1 \cap \hat{Y} = \hat{y}), \end{aligned}$$

where \(F^h(U \le u)\) specifies the distribution of utility \({U}\) (e.g., the utility of winning a social competition, such as being admitted to a university) across individuals under the predictive model \({h}\). Here, \({U}\) is the difference between the individual's actual utility \({A}\) and their effort-based utility \({D}\), \(U = A - D\). In this relaxed binary case, the utility reduces to the difference between the individual's predicted and actual class.
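
As a sketch of how this relaxed, binary definition could be checked on data (the toy labels and the helper name below are our own), one can compare the empirical distribution of the utility \(U = \hat{Y} - Y\) across groups within each conditioning cell. We show only the Rawls’ condition, which conditions on the actual label \({Y}\); the luck-egalitarian check is analogous, with the conditioning done on the predicted label \(\hat{Y}\).

```python
import numpy as np

def utility_distribution(u, s, cond, cond_value, group):
    """Empirical distribution of the utility U within one conditioning cell."""
    mask = (s == group) & (cond == cond_value)
    values, counts = np.unique(u[mask], return_counts=True)
    return dict(zip(values.tolist(), (counts / counts.sum()).tolist()))

# Toy data; U = Y_hat - Y as in the relaxed binary setting of Definition 1.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 0, 1])
s      = np.array([0, 0, 0, 0, 1, 1, 1, 1])
u      = y_pred - y_true

# Rawls' EOP: the utility distributions should match across groups for each actual label y.
# Matching them in every cell is the same as comparing the distribution of Y_hat given Y
# across groups, i.e., the separation criterion of Sect. 2.2.
for y in (0, 1):
    print(y, utility_distribution(u, s, y_true, y, 0),
             utility_distribution(u, s, y_true, y, 1))
```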

5.1 Predicted outcomes

The predicted outcomes family of fair machine learning metrics contains the simplest, and most intuitive, notions of fairness. More explicitly, this class of metrics focuses only on the predicted outcomes across different demographic distributions of the data, and a model satisfies such a definition only if both the marginalized and non-marginalized groups have an equal probability of being assigned to the positive predicted class [55]. Several metrics fall into the predicted outcome category, such as statistical parity and conditional statistical parity. Additionally, each metric in this group satisfies Rawls’ definition of EOP, the statistical constraint of independence, and the legal notions of anti-classification and anti-subordination. We also note that in the process of certifying and removing bias using fair machine learning metrics from this category, it is common to use the actual labels in place of the predicted labels (see Definition 1.1 of [56]). However, for simplicity, we present the fair machine learning metrics of this section using the predicted outcome only.

5.1.1 Statistical parity

Statistical parity is also often called demographic parity, statistical fairness, equal acceptance rate, or benchmarking. As the name implies, it requires that individuals in the marginalized and non-marginalized groups have an equal probability of being assigned to the positive class [11, 30]. Notationally, statistical parity can be written as

$$\begin{aligned} P[\hat{Y} = 1 \mid S = 0] = P [\hat{Y} = 1 \mid S = 1]. \end{aligned}$$

In [15], Heidari et al. map statistics-based fair machine learning metrics to equality of opportunity (EOP) models from political philosophy. As discussed in Sect. 3, in EOP, an individual’s outcome is affected by their circumstance \({c}\) (all ‘irrelevant’ factors like race, gender, status, etc. that a person should not be held accountable for) and effort \({e}\) (those items which a person can morally be held accountable for). For any \({c}\) and \({e}\), a policy \(\phi\) can be used to create a distribution of utility \({U}\) (e.g., the acceptance to a school, getting hired for a position, etc.) among the people with circumstance \({c}\) and effort \({e}\).

To map the notions of circumstance and effort to the proposed statistics-based fair machine learning notions, they treat the predictive model \({h}\) as the policy \(\phi\) and assume that a person’s features can be divided into those that are ‘irrelevant’ and those that they can be held accountable for (i.e., are effort-based). Additionally, they let the person’s irrelevant features be seen as their individual circumstance (\(\mathbf {z}\) for \({c}\)), their effort-based utility as their effort (\({d}\) for \({e}\)), and their utility be the difference between their actual utility \({a}\) and effort-based utility \({d}\). The difference between \({a}\) and \({d}\) is much like the difference between \({y}\) and \(\hat{y}\) in the machine learning literature. In other words, \({d}\) is the utility a person should receive based on their accountable factors (i.e., the salary a person should receive based on their experience/education/etc.), while \({a}\) is the utility a person actually receives (i.e., the actual salary they are paid). Here, we recall the proof for statistical parity as Rawls’ EOP as presented in [15]:

Proposition 1

(Statistical parity as Rawls’ EOP [15]) Consider the binary classification task where \(Y, \hat{Y}=\{0,1\}\). Suppose \(U = A - D\), \(A = \hat{Y}\), and \(D = Y = 1\) (i.e., the effort-based utility of all individuals is assumed to be the same). Then, the conditions of Rawls’ EOP are equivalent to statistical parity when \(\hat{Y} = 1\).

Proof

Recall that Rawls’ EOP requires that for \(s\in S = \{0,1\}\), \(y \in Y= \{0,1\}\), and \(u = a - d\in \{-1, 0\}\)

$$\begin{aligned} P[U\le u \mid S = 0 \cap Y = y] = P [U \le u \mid S = 1 \cap Y = y]. \end{aligned}$$

Replacing \({u}\) with \((A - D)\), \({D}\) with 1, and \({A}\) with \(\hat{Y}\), the above is equivalent to

$$\begin{aligned}&P[A - D \le u \mid S = 0 \cap Y = 1] = P[A - D \le u \mid S = 1 \cap Y = 1] \\&P[\hat{Y} - 1 \le u \mid S = 0 \cap Y = 1] = P[\hat{Y} - 1 \le u \mid S = 1 \cap Y = 1] \\&P[\hat{Y} \le u + 1\mid S = 0] = P[\hat{Y} \le u + 1 \mid S = 1] \\&P[\hat{Y} = \hat{y}\mid S = 0] = P[\hat{Y} = \hat{y} \mid S = 1] \end{aligned}$$

since \(u = \hat{y} - y\) and \(y = 1\) together give \(\hat{y} = u + 1\). This is equal to the definition of statistical parity when \(\hat{Y} = 1\); therefore, the conditions of Rawls’ EOP are equivalent to statistical parity. \(\square\)

Treatment parity Instead of measuring the difference between the assignment rates, treatment parityFootnote 6 looks at the ratio between the assignment rates. It is not so much a derivative of statistical parity as a different way of looking at it. The distinction between the forms of statistical parity and treatment parity was made to better connect with the legal term of disparate impact, as the treatment parity form was explicitly designed to be the mathematical counterpart to that legal notion [21, 56]. Mathematically, it is defined as

$$\begin{aligned} \frac{P[\hat{Y} = 1 \mid S = 0]}{P[\hat{Y} = 1 \mid S = 1]} \ge 1 - \epsilon , \end{aligned}$$

where \(\epsilon\) is the allowed slack of the metric and is usually set to 0.2 to achieve the \(80\%\) rule of disparate impact law. This equation says that the proportion of positive predictions for the marginalized and non-marginalized groups must be similar, with their ratio at least the threshold \(1 - \epsilon\). Since it is essentially the same as statistical parity, treatment parity also aligns with Rawls’ EOP, independence, anti-classification, and anti-subordination.

5.1.2 Conditional statistical parity

Conditional statistical parity is an extension of statistical parity which allows a certain set of legitimate attributes to be factored into the outcome [20]. Factors are considered “legitimate” if they can be justified by ethics, by the law, or by a combination of both. This notion of fairness was first defined by Kamiran et al. [57] who wanted to quantify explainable and illegal discrimination in automated decision-making where one or more attributes could contribute to the explanation. Conditional statistical parity is satisfied if both marginalized and non-marginalized groups have an equal probability of being assigned to the positive predicted class when there is a set of legitimate factors that are being controlled for. Notationally, it can be written as

$$\begin{aligned} P[\hat{Y} = 1 \mid L_1 = a \cap L_2 = b \cap S = 0] = P[\hat{Y} = 1 \mid L_1 = a \cap L_2 = b \cap S = 1], \end{aligned}$$

where \(L_1, L_2\) are legitimate features that are being conditioned on. For example, if the task were to predict whether a certain person makes over $50,000 a year,Footnote 7 then \(L_1\) could represent work status and \(L_2\) could be the individual’s relationship status. Another, simplified, way to write this is

$$\begin{aligned} P[\hat{Y} = 1 \mid L = \ell \cap S = 1] = P[\hat{Y} = 1 \mid L = \ell \cap S = 0], \end{aligned}$$

where \(\ell\) is an assignment of the set of legitimate features \({L}\) being conditioned on.
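
A sketch of how conditional statistical parity could be audited on data (the toy data and the single “legitimate” stratum variable below are hypothetical): the acceptance-rate gap is computed within each stratum of the legitimate features rather than over the whole sample.

```python
import numpy as np

def conditional_parity_gaps(y_pred, s, legit):
    """Gap in P[Y_hat = 1 | L = l, S = s] across groups, per legitimate stratum l."""
    gaps = {}
    for l in np.unique(legit):
        mask = (legit == l)
        rate0 = y_pred[mask & (s == 0)].mean()
        rate1 = y_pred[mask & (s == 1)].mean()
        gaps[int(l)] = rate0 - rate1
    return gaps

# Toy data; `legit` could encode, e.g., a (hypothetical) work-status category.
y_pred = np.array([1, 0, 1, 1, 1, 0, 0, 1])
s      = np.array([0, 0, 1, 1, 0, 0, 1, 1])
legit  = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Zero gaps within every stratum <=> conditional statistical parity.
print(conditional_parity_gaps(y_pred, s, legit))   # {0: -0.5, 1: 0.0}
```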

Fig. 4
figure 4

Two different examples of Simpson’s paradox. In the image on the left, while the blue and purple groups each show a negative correlation on their own, when taken as a whole, there is a positive correlation. The opposite can be seen in the second image, where the groups are positively correlated when taken alone, but when aggregated, a negative correlation is produced

Furthermore, conditional statistical parity helps to overcome Simpson’s paradox as it incorporates extra conditioning information beyond the original class label. Simpson’s paradox says that if a correlation occurs in several different groups, it may disappear, or even reverse, when the groups are aggregated [59]. This event can be seen in Fig. 4. Put mathematically, Simpson’s paradox can be written as

$$\begin{aligned}&P[A \mid B \cap C]< P [A \mid B^c \cap C] \text { and } P[A \mid B \cap C^c] < P [A \mid B^c \cap C^c]\\&\text {but} \\&P[A\mid B] > P[A\mid B^c], \end{aligned}$$

where the superscript \({c}\) denotes the complement of the event. An analysis that does not consider all of the relevant statistics might suggest that unfairness and discrimination are at play, when in reality, the situation may be morally and legally acceptable if all of the information were known. As with the above two metrics, conditional statistical parity aligns with Rawls’ EOP, independence, anti-classification, and anti-subordination. Proofs that both treatment parity and conditional statistical parity belong to Rawls’ EOP can be found in Appendix A.1.
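
Returning to Simpson’s paradox, the hypothetical admission counts below (chosen purely to illustrate the reversal; they are not drawn from any real dataset) show the effect numerically: one group has the higher admission rate within every department, yet the lower rate in the aggregate.

```python
# Hypothetical (applicants, admits) counts per department and group.
data = {
    "dept_A": {"group_X": (100, 80), "group_Y": (20, 18)},
    "dept_B": {"group_X": (20, 2),   "group_Y": (100, 20)},
}

def rate(applicants, admits):
    return admits / applicants

# Within each department, group_Y has the higher admission rate ...
for dept, groups in data.items():
    print(dept, {g: round(rate(*counts), 2) for g, counts in groups.items()})

# ... yet in the aggregate, group_X has the higher admission rate (Simpson's paradox).
totals = {g: [0, 0] for g in ("group_X", "group_Y")}
for groups in data.values():
    for g, (apps, adm) in groups.items():
        totals[g][0] += apps
        totals[g][1] += adm
print({g: round(rate(*t), 2) for g, t in totals.items()})
```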

5.2 Predicted and actual outcomes

The predicted and actual outcome class of metrics uses both the model’s predictions and the true labels of the data. This class of fair machine learning metrics includes: predictive parity, false-positive error rate balance, false-negative error rate balance, equalized odds, conditional use accuracy, overall accuracy equality, and treatment equality.

5.2.1 Conditional use accuracy

Conditional use accuracy, also termed predictive value parity, requires that positive and negative predictive values are similar across different groups [32]. Statistically, it aligns exactly with the requirement for sufficiency and therefore also aligns with anti-classification and anti-subordination [13]. Mathematically, it can be written as follows:

$$\begin{aligned} P[Y = y \mid \hat{Y} = y \cap S = 0] = P[Y = y \mid \hat{Y} = y \cap S = 1] \;\; \text { for } \;\; y\in \{0,1\}. \end{aligned}$$

In [15], they provide a proof that conditional use accuracy falls into the luck-egalitarian EOP criterion and we recall their work below:

Proposition 2

(Conditional use accuracy as luck-egalitarian EOP [15]) Consider the binary classification task where \(y \in Y = \{0,1\}\). Suppose that \(U = A - D\), \(A = Y\), and \(D = \hat{Y}\) (i.e., the effort-based utility of an individual under model \({h}\) is assumed to be the same as their predicted label). Then, the conditions of luck-egalitarian EOP are equivalent to those of conditional use accuracy (otherwise known as predictive value parity).

Proof

Recall that luck-egalitarian EOP requires that for \(s\in S = \{0,1\}\), \(\hat{y}\in \hat{Y}=\{0,1\}\), and \(u \in \{-1, 1\}\)

$$\begin{aligned} P[U \le u \mid \hat{Y}=\hat{y} \cap S = 0 ] = P[U \le u \mid \hat{Y}=\hat{y} \cap S = 1 ]. \end{aligned}$$

Replacing \({U}\) with \(A - D\), \({D}\) with \(\hat{Y}\), and \({A}\) with \({Y}\), we obtain the following:

$$\begin{aligned}&P[A - D \le u \mid \hat{Y} = \hat{y} \cap S = 0 ] = P[A - D \le u \mid \hat{Y}=\hat{y} \cap S = 1 ] \\&P[Y - \hat{Y} \le u \mid \hat{Y} = \hat{y} \cap S = 0 ] = P[Y - \hat{Y} \le u \mid \hat{Y}=\hat{y} \cap S = 1 ] \\&P[Y \le u + \hat{y}\mid \hat{Y} = \hat{y} \cap S = 0 ] = P[Y \le u + \hat{y}\mid \hat{Y}=\hat{y} \cap S = 1 ] \\&P[Y = y \mid \hat{Y} = \hat{y} \cap S = 0 ] = P[Y = y \mid \hat{Y}=\hat{y} \cap S = 1 ], \end{aligned}$$

since \(u = a - d = y - \hat{y}\) produces the result that \(y = u + \hat{y}\). The last line is then equal to the statement for conditional use accuracy when \(y = \hat{y} \in \{0,1\}\). \(\square\)

5.2.2 Predictive parity

Predictive parity, otherwise known by the name outcome test, is a fair machine learning metric that requires the positive predictive values to be similar across both marginalized and non-marginalized groups [60]. Mathematically, it can be seen as

$$\begin{aligned} P[Y=y \mid \hat{Y} = 1\cap S = 0] = P[Y=y\mid \hat{Y} = 1 \cap S = 1] \;\; \text { for } \;\; y\in \{0,1\}, \end{aligned}$$

since if a classifier has equal positive predictive values for both groups, it will also have equal false discovery rates. Since predictive parity is simply conditional use accuracy when \(\hat{Y} = 1\), it falls into the same philosophical category as conditional use accuracy, which is luck-egalitarian EOP. Furthermore, [13] states that predictive parity aligns with sufficiency. Since it aligns with sufficiency, it also aligns with anti-classification and anti-subordination as discussed in Sect. 4.

5.2.3 Equalized odds

The fair machine learning metric of equalized odds is also known as conditional procedure accuracy equality and disparate mistreatment. It requires that true- and false-positive rates are similar across different groups [49]

$$\begin{aligned} P[\hat{Y} = 1 \mid Y = y \cap S = 0] = P[\hat{Y} = 1 \mid Y = y \cap S = 1] \;\; \text { for } \;\; y\in \{0,1\}. \end{aligned}$$

Equalized odds aligns with Rawls’ EOP and [15] provides a proof for this classification which we recall in Appendix A.2, since it is similar to the proof for statistical parity as Rawls’ EOP. Additionally, it aligns with separation and anti-subordination [13].

5.2.4 False-positive error rate balance

False-positive error rate balance, otherwise known as predictive equality, requires that false-positive rates are similar across different groups [60]. It can be seen mathematically as

$$\begin{aligned} P[\hat{Y}=\hat{y} \mid Y = 0 \cap S = 0] = P[\hat{Y} = \hat{y} \mid Y = 0 \cap S = 1] \;\; \text { for } \;\; \hat{y}\in \{0,1\}. \end{aligned}$$

We note that if a classifier has equal false-positive rates for both groups, it will also have equal true-negative rates, hence why \(\hat{y} \in \{0,1\}\). This fairness metric can be seen as a relaxed version of equalized odds that only requires equal false-positive rates, and therefore, it aligns with all the categories that equalized odds does, specifically: Rawls’ EOP, separation, and anti-subordination.

5.2.5 False-negative error rate balance

False-negative error rate balance, also called equal opportunity, is the direct opposite of the above fair machine learning metric of false-positive error rate balance in that it requires false-negative rates to be similar across different groups [60]. This metric can be written as

$$\begin{aligned} P[\hat{Y}=\hat{y} \mid Y = 1 \cap S = 0] = P[\hat{Y} = \hat{y} \mid Y = 1 \cap S = 1] \;\; \text { for } \;\; \hat{y}\in \{0,1\}, \end{aligned}$$

and we note that a classifier that has equal false-negative rates across the two groups will also have equal true-positive rates. This fair machine learning metric can also be seen as a relaxed version of equalized odds that only requires equal false-negative error rates, and, therefore, aligns with all the same categories.

5.2.6 Overall accuracy equality

As the name implies, overall accuracy equality requires similar prediction accuracy across different groups. In this case, we are assuming that obtaining a true negative is as desirable as obtaining a true positive [32]. According to [13], it matches with the statistical measure of independence, meaning that it also aligns with anti-classification and anti-subordination. Mathematically, it can be written as

$$\begin{aligned} P[\hat{Y}=y \mid Y = y \cap S = 0] = P[\hat{Y} = y\mid Y = y \cap S = 1] \;\; \text { for } \;\; y \in \{0,1\}. \end{aligned}$$

Overall accuracy equality is the third fair machine learning metric that Heidari et al. prove belongs to the Rawls’ EOP category of fair machine learning metrics [15] and we recall their proof in Appendix A.2.

5.2.7 Treatment equality

Treatment equality analyzes fairness by looking at how many errors were obtained rather than through the lens of accuracy. It requires an equal ratio of false-negative and false-positive values for all groups [32]. Furthermore, it agrees exactly with the statistical measure of separation [13], and the legal notion of anti-subordination

$$\begin{aligned} \frac{\mathrm{{FN}}_{S = 0}}{\mathrm{{FP}}_{S = 0}} = \frac{\mathrm{{FN}}_{S = 1}}{\mathrm{{FP}}_{S = 1}}. \end{aligned}$$

Treatment equality can be viewed as taking the ratio of the quantities used in false-negative and false-positive error rate balance. Since both of those metrics fall into the Rawls’ EOP category, treatment equality does as well.
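
A short sketch (toy counts of our own) of the quantity that treatment equality compares across groups:

```python
import numpy as np

def fn_fp_ratio(y_true, y_pred, s, group):
    """FN / FP count ratio for one group; treatment equality asks these to match."""
    mask = (s == group)
    fn = np.sum((y_pred[mask] == 0) & (y_true[mask] == 1))
    fp = np.sum((y_pred[mask] == 1) & (y_true[mask] == 0))
    return fn / fp

y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([0, 1, 1, 0, 0, 1, 1, 0])
s      = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Equal ratios across groups <=> treatment equality holds on this toy data.
print(fn_fp_ratio(y_true, y_pred, s, 0), fn_fp_ratio(y_true, y_pred, s, 1))   # 1.0 1.0
```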

5.3 Predicted probabilities and actual outcomes

The predicted probability and actual outcome category of fair machine learning metrics is similar to the above category of metrics that use the predicted and actual outcomes. However, instead of using the predictions themselves, this category uses the probability of being predicted to a certain class. This category of metrics includes: test fairness, well calibration, balance for the positive class, and balance for the negative class. The first two metrics fall in line with the statistical measure of sufficiency and legal notions of both anti-classification and anti-subordination, while the last two align with separation and anti-subordination.

5.3.1 Test fairness

Test fairness, which falls under the luck-egalitarian EOP category (see proof in Appendix A.3.1), is satisfied if, for any predicted probability score \(p \in \mathcal {P}\), subjects in both the marginalized and non-marginalized groups have equal probability of actually belonging to the positive class. Test fairness has also been referenced by the terms calibration, equal calibration, and matching conditional frequencies [60]. Mathematically, it can be written as follows:

$$\begin{aligned} P[Y = 1 \mid \mathcal {P} = p \cap S = 0] = P[Y = 1 \mid \mathcal {P} = p \cap S = 1]. \end{aligned}$$

5.3.2 Well calibration

Well calibration is very similar to the metric of test fairness, but it additionally requires that, for any predicted probability score \(p \in \mathcal {P}\), not only should the marginalized and non-marginalized groups have equal probability of belonging to the positive class, but this probability should be p [61]

$$\begin{aligned} P[Y=1\mid \mathcal {P}=p \cap S = 0] = P[Y=1\mid \mathcal {P}=p \cap S = 1] = p. \end{aligned}$$

Since well calibration is an extension of test fairness, it also falls under the classifications of luck-egalitarian EOP, sufficiency, anti-classification, and anti-subordination.
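
Both test fairness and well calibration can be inspected by binning the predicted scores and comparing observed positive rates per bin, as in the following sketch (the bin edges, toy scores, and helper name are our own choices):

```python
import numpy as np

def calibration_by_group(y_true, scores, s, bins=np.linspace(0, 1, 6)):
    """Observed positive rate per predicted-score bin, separately for each group.
    Test fairness asks the two groups' curves to match; well calibration
    additionally asks each bin's observed rate to equal the predicted score."""
    out = {}
    for group in (0, 1):
        mask = s == group
        idx = np.digitize(scores[mask], bins) - 1
        out[group] = [y_true[mask][idx == b].mean() if np.any(idx == b) else None
                      for b in range(len(bins) - 1)]
    return out

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.1, 0.3, 0.7, 0.9, 0.2, 0.8, 0.3, 0.7])
s      = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(calibration_by_group(y_true, scores, s))
```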

5.3.3 Balance for the positive class

As the name suggests, the balance for the positive class metric requires that individuals who experience a positive outcome, regardless of which group they belong to, should have an equal mean predicted probability of being in the positive class [61]. It can be seen as being similar to the metric of equal opportunity, which says that a classifier should give equivalent treatment to all groups

$$\begin{aligned} \mathbb {E}[\mathcal {P} \mid Y = 1 \cap S = 0] =\mathbb {E}[\mathcal {P} \mid Y = 1 \cap S = 1]. \end{aligned}$$

Like false-positive error rate balance, the balance for the positive class metric can be seen as a derivative of the equalized odds metric when \(Y = 1\). Additionally, instead of taking into account the predicted label \(\hat{y} \in \hat{Y}\), it concerns itself with the predicted probability \(\mathcal {P}\). Since equalized odds falls into Rawls’ EOP category of metrics, the balance for the positive class metric does as well. Similarly, balance for the positive class also aligns with separation and anti-subordination.

5.3.4 Balance for the negative class

The metric of balance for the negative class is the counterpart of the balance for the positive class metric. Instead of requiring a balanced mean predicted probability for the actual positive class, it requires a balanced mean predicted probability for the actual negative class [61]. It is similar to the measure of false-positive error rate balance

$$\begin{aligned} \mathbb {E}[\mathcal {P} \mid Y = 0 \cap S = 0] =\mathbb {E}[\mathcal {P} \mid Y = 0 \cap S = 1]. \end{aligned}$$

As with balance for the positive class, the balance for the negative class metric is a derivative of equalized odds when \(Y = 0\), where we approximate \(\hat{y} \in \hat{Y}\) with the probability score \(\mathcal {P}\). Therefore, the balance for the negative class metric falls under all the same categorizations as equalized odds (and balance for the positive class) does.
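
Both balance conditions reduce to comparing mean predicted scores within the actual positive and negative classes, as in this sketch (toy scores and labels of our own):

```python
import numpy as np

def mean_score(scores, y_true, s, group, y_value):
    """E[P | Y = y_value, S = group], estimated as a sample mean."""
    return scores[(s == group) & (y_true == y_value)].mean()

scores = np.array([0.9, 0.4, 0.2, 0.6, 0.8, 0.5, 0.3, 0.5])
y_true = np.array([1,   1,   0,   0,   1,   1,   0,   0  ])
s      = np.array([0,   0,   0,   0,   1,   1,   1,   1  ])

# Balance for the positive class: equal mean scores among actual positives (~0.65 for both groups).
print(mean_score(scores, y_true, s, 0, 1), mean_score(scores, y_true, s, 1, 1))
# Balance for the negative class: equal mean scores among actual negatives (~0.40 for both groups).
print(mean_score(scores, y_true, s, 0, 0), mean_score(scores, y_true, s, 1, 0))
```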

5.4 Discussion

5.4.1 Impossibility results of statistics-based metrics

Although each of the statistics-based fair machine learning metrics we introduce above formalize an intuitive notion of fairness, the definitions are not, in general, mathematically compatible. In other words, some definitions of fairness cannot be enforced at the same time. These incompatibilities between the fairness definitions were first explored during public debate over a recidivismFootnote 8 tool called COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) [62]. While ProPublica proved that COMPAS does not satisfy false-positive error rate balance [63], other researchers found that it did satisfy metrics such as predictive parity and test fairness [64, 65].

The tension experienced here is due to impossibility results that govern the underlying statistics of the different fairness measures. Several research publications formalize these results: the authors of [13] explain how independence, separation, and sufficiency are mutually exclusive, while [61, 66] show that if a model satisfies balance for the negative class, balance for the positive class, and test fairness for both the marginalized and non-marginalized groups, then either the two groups must have equal base rates (implying that the actual classification is independent of group membership) or the model must be 100% accurate [62].
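
A small synthetic example (ours, not taken from the cited proofs) can make the tension tangible: if each group receives a predictor that simply outputs its own base rate, the predictor is well calibrated within each group by construction, yet balance for the positive class fails whenever the base rates differ.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical groups with unequal base rates (0.5 vs. 0.2).
n = 100_000
s = rng.integers(0, 2, size=n)
base_rate = np.where(s == 0, 0.5, 0.2)
y = rng.binomial(1, base_rate)

# A predictor that outputs each group's base rate is well calibrated by
# construction: within each group, P[Y = 1 | P = p] = p.
scores = base_rate

# ...yet balance for the positive class fails whenever base rates differ:
pos_means = [scores[(y == 1) & (s == g)].mean() for g in (0, 1)]
print(pos_means)  # [0.5, 0.2]: unequal, so the balance metric is violated
```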

5.4.2 Individual fairness

Up until this point, the metrics we have discussed all focus on the notion of group fairness. In other words, these metrics ensure some kind of statistical parity for members of different groups rather than for any specific individual [67]. Another set of fair machine learning metrics, called individual fairness, considers fairness as it relates to a specific individual. Individual fairness ensures that people who are similar in the eyes of the classification task are treated similarly (i.e., obtain the same prediction) [67]. In this section, we recount the first (and most famous) notion of individual fairness: fairness through awareness. We note that other individual fair machine learning metrics exist, such as [68,69,70,71], and we direct interested readers to these publications, as well as a survey of them [67], for more detail.

Fairness through awareness Fairness through awareness, most commonly called individual fairness, was first proposed by Dwork et al. [11]. The motivation for creating fairness through awareness was that enforcing statistical parity between different groups could still result in unfair outcomes at the individual level. To solve this issue, Dwork et al. proposed to use a distance metric that measures how similar one individual is to another. Two individuals are considered alike if their combinations of task-relevant attributes are close to each other under this metric, and the overall process is deemed fair if two alike individuals receive the same outcome from the model [67]. This process is similar to the legal practice of situation testing, an experimental method that aims to establish discrimination on the spot [72]. Situation testing takes pairs of individuals who are similar, but do not necessarily belong to the same group, and puts them in the same situation. If the individual who is part of the marginalized group is treated differently than the individual in the non-marginalized group, then there is a viable case of discrimination that can be brought to court. Several research works [73, 74] studied the use of kNN and causal Bayesian networks to facilitate the similarity measurements in situation testing-based discrimination detection. Additionally, fairness through awareness aligns with Aristotle’s conception of “justice as consistency” [67, 75].
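
Dwork et al. formalize this intuition as a Lipschitz-style condition: the difference in outcomes for two individuals should be bounded by their task-relevant distance. The sketch below is our own crude simplification under an assumed Euclidean distance and hypothetical data; as the text above and below stresses, the genuinely hard part is choosing the task-relevant distance metric, which this toy check simply takes as given.

```python
import numpy as np

def awareness_violations(X, scores, similarity_threshold, outcome_tolerance):
    """Flag pairs deemed similar under Euclidean distance on task-relevant
    features whose predicted scores nevertheless differ by more than a tolerance.

    A crude stand-in for Dwork et al.'s Lipschitz condition
    d_outcome(f(x_i), f(x_j)) <= d_task(x_i, x_j); defining d_task is the
    policy-laden step this sketch assumes away.
    """
    X, scores = np.asarray(X, dtype=float), np.asarray(scores, dtype=float)
    violations = []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            task_dist = np.linalg.norm(X[i] - X[j])
            score_gap = abs(scores[i] - scores[j])
            if task_dist <= similarity_threshold and score_gap > outcome_tolerance:
                violations.append((i, j, task_dist, score_gap))
    return violations

# Hypothetical usage: two near-identical applicants with very different scores.
X = np.array([[3.8, 720], [3.8, 715], [2.1, 500]])   # e.g., GPA and test score
scores = np.array([0.9, 0.3, 0.2])
print(awareness_violations(X, scores, similarity_threshold=10.0, outcome_tolerance=0.2))
```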

A drawback of this metric is that it does not allow for comparison of all individuals, since it only compares similar individuals. In the hiring example, applicants who have similar background experiences can be compared to each other, but they cannot be compared to those who have different prior work experience. This makes it impossible to construct a total ranking of all candidates. Additionally, fairness through awareness can be difficult to implement as it requires explicitly defining what similarity means in a given context, and what is considered similar in one case may not be considered similar in another. Furthermore, fairness through awareness requires that the people who set the policy define the distance metric, which is not a simple task [67].

Group fairness vs. individual fairness Many technical research papers assume that both group and individual fairness are important, although conflicting, measures [70, 71]. However, Binns argues that this conflict is based on a misconception, and when we look at the philosophical underpinnings of group and individual fairness, they are not actually trying to achieve different things [67]. While the group-fair and individual-fair machine learning metrics may conflict on a technical level,Footnote 9 Binns argues that not only are the two not in conflict, “but are just different ways of reflecting the same set of moral and political concerns” [67].

As mentioned above, individual fairness relates to Aristotle’s conception of justice as consistency, since similar individuals should receive similar outcomes. Intuitively, consistency is not a problem in (supervised) machine learning-based decision-making, as the outcomes are largely deterministic and the model should produce the same output label/class for similar inputs [67]. In this sense, whether a fair machine learning metric assumes a group or an individual basis, it should satisfy consistency. Group-fair machine learning metrics, on the other hand, are mostly grounded in the egalitarian concepts of EOP. However, EOP can also be used to ground individual-fair machine learning metrics, since, when specifying such metrics, the designers have to decide which features to consider (e.g., test scores) and which to ignore (e.g., race) when judging the similarity between individuals. Those choices reflect assumptions that correspond to egalitarian principles [67]. This shows that consistency (individual fairness) and egalitarianism (group fairness) do not conflict at the level of principle. Binns goes on to reiterate that “the appearance of conflict between the two is an artifact of the failure to fully articulate assumptions behind them, and the reasons for applying them in a particular context” [67].

6 Philosophical, sociological, and legal criticism of the fair machine learning field

As we have noted throughout the paper, there have been several attempts to define fairness quantitatively. Some argue that the rapid growth of this new field has led to widely inconsistent motivations, terminology, and notation, presenting a serious challenge for cataloging and comparing definitions [62]. Throughout this article, we have tried to remedy the lack of effort spent on aligning quantitative definitions with philosophical notions. After all, what weight does a quantitative definition hold when it is not grounded in humanistic values?

We continue that discussion in this section. Despite much headway in the last several years, our work in fair machine learning is far from over. In fact, the field has taken several missteps that we need to remedy before we can truly call our methods “fair”. These issues include rigid categorization strategies, improper terminology, damaging assumptions and abstractions, misalignment with legal ideals, and issues of diversity and power dynamics.

6.1 Rigid “Box-Like” categorization

In most of the publications we discussed, fairness is enforced on rigidly structured groups or categories. For instance, many papers (including all discussed in this work) consider the binary categories of male or female and White or Black as the main axes along which to determine whether an algorithm is fair. These ontologicalFootnote 10 assumptions, though helpful in simplifying the problem at hand, are often misplaced.

The problem with narrowing concepts like gender and race down to simple binary groups is that there is no precise definition of what a “group” or “category” is in the first place. Despite not having a common interpretation, it is widely accepted in the social sciences that groups and categories are social constructs, not rigid boxes into which a person can be placed. Constructionist ontology is the understanding that socially salient categories such as gender and race are not embodied by sharing a physical trait or genealogical features, but are in fact constituted by a web of social relations and meanings [77]. Social construction does not mean that these groups are not real, but that these categories of race and gender are brought into existence and shaped into what we know them to be by historical events, social forces, political power, and/or colonial conquest [78]. When we treat these social constructs as rigidly defined attributes, rather than structural, institutional, and relational circumstances, it minimizes the structural aspects of algorithmic (un)fairness [79]. The very concept of fairness itself can only be understood when framed in the viewpoint of the specific social group being considered.

Specific to racial categorization, Sebastian Benthall and critical race scholar Bruce D. Haynes discuss how “racial classification is embedded in state institutions, and reinforced in civil society in ways that are relevant to the design of machine learning systems” [80]. Race is widely acknowledged in the social sciences to be a social construction tied to a specific context and point in history, rather than to a certain phenotypical property. Hanna et al. explain that the meaning of a “race” at any given point in time is tied to a specific racial project, that is, an explanation and interpretation of racial identities coupled with efforts to organize and distribute resources along particular racial lines [79]. They express that it would be more accurate to describe race as having relational qualities, with dimensions that are symbolic or based on phenotype, but that are also contingent on specific social and historical contexts [79].

When defining what a “group” or “category” is in a specific fair machine learning setting, the fair machine learning community needs to understand the multidimensional aspects of concepts such as race and gender, and to seriously consider the impact that our conceptualization and operationalization of historically marginalized groups have on these groups today. “To oversimplify is to do violence, or even more, to re-inscribe violence on communities that already experience structural violence” [79]. The simplifications we make erase the social, economic, and political complexities of racial, gender, and sexuality categories. Counterfactual-based methodologies, in particular, tend to treat groups as interchangeable, obscuring the unique oppression encountered by each group [77]. Overall, we cannot do meaningful work in fair machine learning without first understanding and specifying the social ontology of the human groupings that we are concerned will be the basis for unfairness [77].

An avenue of fair machine learning research that would be a good start for addressing this issue is the work on intersectional fairness [81, 82]. Intersectionality describes the ways in which inequalities based on marginalization attributes like gender, race, ethnicity, LGBTQIA+ status, and/or disability “intersect” to create unique effects of discrimination. An example of intersectional fairness can be seen in Fig. 5. Discrimination does not operate in a vacuum, and oftentimes discrimination based on one marginalization attribute reinforces discrimination based on another. For example, if we try to solve the pay gap between men and women and do not include other dimensions like race, socio-economic status, or immigration status, then it is very likely that our solution will actually reinforce inequalities among women [83]. While intersectional fairness would be a good start, as it allows for a finer-grained classification of an individual, it does not fix the categorization issue itself, as that will require collaboration between the fair machine learning community and social scientists.

Fig. 5 Fictional example of acceptance rates to a university’s computer science department based on gender, race, and ACT score. For gender: 0 = male and 1 = female. For ACT score: 0 = poor, 1 = okay, 2 = average, and 3 = excellent. For race: 0 = White, 1 = Black, and 2 = Other. In the top image, the red dots represent students that were not admitted and the blue dots show the students that were admitted. In this fictional example, White men were accepted with worse ACT scores than other applicants, such as Black men or White women. Additionally, no Black women were accepted. If we only took gender or race into account, we would not be able to see this trend, and our correction could reinforce inequalities within the specific marginalization class itself.
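
To illustrate the kind of audit Fig. 5 gestures toward, the short sketch below (with entirely hypothetical data and column names, using the same 0/1 encodings as the figure) computes acceptance rates for every gender-by-race subgroup; neither single-axis view reveals that the intersectional subgroup of Black women has an acceptance rate of zero in this toy sample.

```python
import pandas as pd

# Hypothetical admissions records mirroring the encoding used in Fig. 5
# (only race codes 0 and 1 are used here for brevity).
df = pd.DataFrame({
    "gender":   [0, 0, 0, 0, 1, 1, 1, 1],
    "race":     [0, 0, 1, 1, 0, 0, 1, 1],
    "admitted": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Single-axis views show gaps, but not where they are concentrated...
print(df.groupby("gender")["admitted"].mean())
print(df.groupby("race")["admitted"].mean())

# ...while the intersectional view exposes that, in this toy sample,
# the subgroup gender=1, race=1 (Black women in Fig. 5's encoding)
# is never admitted.
print(df.groupby(["gender", "race"])["admitted"].mean())
```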

6.2 Unintentionally adverse terminology

It is natural to take words such as “bias” and “protected groups” at face value when reading a fair machine learning publication, especially when we, as technically minded researchers, would rather spend our time understanding the functionality of an algorithm than the semantics of a particular word. However, “placation is an absolution” [84] and “Language shapes our thoughts” [85]. Throughout this work (and many of the works mentioned within), the term algorithmic bias is used liberally and without much thought. However, the word “bias” actively removes responsibility from the algorithm or dataset creator by obscuring the social structures and byproducts of oppressive institutions that contribute to the output of the algorithm [84]. It makes the effect of bias (i.e., an unfair model) out to be purely accidental.

So why use “bias” then? Mainly because the word oppression is strong and polarizing [84, 86].Footnote 11 Algorithmic oppression as a theoretical concept acknowledges that there are systems of oppression that cannot simply be reformed, and that not every societal problem has (or should have) a technological solution. Algorithmic oppression analyzes the ways that technology has violent impacts on marginalized peoples’ lives, and in doing so, it does not water down the impact to “discrimination” or “implicit bias”, because doing so fundamentally invalidates the struggles and hardships that oppressed people endure [84].

In addition to preferring “oppression” over “bias”, Hampton also comments on the term “protected groups”. They note that calling marginalized groups such as Black people, LGBTQIA+ people, or women “protected groups” is a “meaningless gesture, although well intentioned” [84]. This is because, in reality, these groups are not protected but oppressed and disparaged, and calling them “protected groups” does nothing to change their circumstances.

We echo the sentiments of Hampton [84]. This section is more of a critique of our language than a request to overhaul the terminology of an already confusing field. Let it serve as a reminder that our choice of words has very real consequences beyond simply explaining the techniques of our methods.

6.3 Damaging assumptions and abstractions

6.3.1 Assumptions

When designing a fair machine learning model, many elements are generally assumed rather than explicitly specified. These assumptions include the societal objective the fair model is hoped to fulfill, the set of individuals subjected to classification by the model, and the decision space available to the decision-makers who will act on the model’s final predictions [62]. Such assumptions can have undesirable consequences when they do not hold in the actual usage of the model. Each assumption is a choice that fundamentally determines whether the model will ultimately advance fairness in society [62]. Additionally, it is rarely the case that the moral assumptions underlying fair machine learning metrics are made explicit [88].

Of particular importance is the assumption of the population who will be acted upon by the model, i.e., the individuals who will be subjected to classification. The way that a person comes to belong in a social category or grouping may reflect underlying (objectionable) social structures, e.g., the “predictive” policing that targets racial minorities for arrest [62]. A model that satisfies fairness criteria when evaluated only on the population to which the model is applied may overlook unfairness in the process by which individuals came to be subject to the model in the first place [62].

Starting with clearly articulated goals can improve both fairness and accountability. Recent criticisms of fair machine learning have rightly pointed out that quantitative notions of fairness can restrict our thinking when we aim to make adjustments to a decision-making process rather than to address the societal problems at hand. While algorithmic thinking runs such risks, quantitative approaches can also force us to make our assumptions more explicit and clarify what we are treating as background conditions. In doing so, we have the opportunity to be more deliberate and to have meaningful debate about the difficult policy issues that we might otherwise hand-wave away, such as “what is our objective?” and “how do we want to go about achieving it?” [62]. Additionally, developing fair machine learning metrics that consider and analyze the entire ecosystem in which they will operate (i.e., procedural fairness) could offer a potential fix for the risks posed by these assumptions.

6.3.2 Abstractions

Abstraction is one of the cornerstones of computing. It allows a programmer to hide all but the needed information about an object to reduce complexity and increase efficiency. However, abstraction can also lead to the erasure of critical social and historical contexts in problems where fair machine learning is necessary [79]. Almost all proposed fair machine learning metrics (and all those discussed in this work) bound the system tightly to consider only the machine learning model, its inputs, and its outputs, while completely abstracting away any social context [6]. By abstracting away the social context in which fair machine learning algorithms are deployed, we are no longer able to understand the broader conditions that determine how fair our outcome truly is.

Selbst et al. call these abstraction pitfalls traps: failure modes that occur when we fail to properly understand and account for the interactions between a technical system and the humanistic, societal world [6]. Specifically, they name five traps that arise when we fail to consider how social concepts align with technology, and we recall them below:

  1. Framing trap: failure to model the entire system over which a social criterion, such as fairness, will be enforced.

  2. Portability trap: failure to understand how re-purposing algorithmic solutions designed for one social context may be misleading, inaccurate, or otherwise do harm when applied to a different context.

  3. Formalism trap: failure to account for the full meaning of social concepts such as fairness, which can be procedural, contextual, and contestable, and cannot be resolved through mathematical formalism.

  4. Ripple effect trap: failure to understand how the insertion of technology into an existing social system changes the behaviors and embedded values of the pre-existing system.

  5. Solutionism trap: failure to recognize the possibility that the best solution to a problem may not involve technology.

Selbst et al.’s main proposed solution is to focus on the process of determining where and how to apply technical solutions, and on recognizing when applying them would cause more harm than good [6]. They point out that, to come to such conclusions, technical researchers will need to either learn new social science skills or partner with social scientists on projects. Additionally, they note that we must become more comfortable with going against the computer scientist’s intrinsic inclination toward abstraction, and be at ease with the difficult or unresolvable tensions between its usefulness and its dangers [6].

6.4 Misalignment of current fair ML metrics with the legal field

Several works critique the alignment of current fair machine learning metrics with disparate impact and disparate treatment. Xiang and Raji note that both disparate impact and disparate treatment doctrines were developed with human discriminators in mind, and that simply replacing human decision-makers with algorithmic ones is often not appropriate [18]. They state that “intent is an inherently human characteristic”, and that the common fair machine learning characterization of disparate treatment as not using marginalization class variables in an algorithm should be contested. They also note that simply demonstrating disproportionate outcomes is not enough to prove disparate impact: it is only the first step of a disparate impact case, and there is liability only if the defendant cannot justify the outcomes with non-discriminatory rationales.

Additionally, [19] notes that while most work on fair machine learning has focused on achieving a fair distribution of decision outcomes, little to no attention has been paid to the overall decision process used to generate the outcome (i.e., procedural fairness). They note that this comes at the detriment of not incorporating the human moral sense of whether it is fair to use a given feature in a decision-making scenario. To this end, they support the use of procedural fairness, since it draws on several considerations that are overlooked in distributive fairness, such as feature volitionality, feature reliability, feature privacy, and feature relevance.

However, there has been some push-back on developing procedurally fair machine learning metrics. Xiang and Raji note that the term “procedural fairness” as used in the fair machine learning literature reflects a narrow and misguided view of what procedural fairness means through a legal lens [18]. Procedural justice aims to arrive at a just outcome through an iterative process as well as through a close examination of the governing laws that guide the decision-maker to a specific decision [18, 89]. They argue that the overall goal of procedural fairness in machine learning should be re-aligned with the aim of procedural justice by analyzing the system surrounding the algorithm, as well as its use, rather than looking only at the specifics of the algorithm itself.

6.5 Power dynamics and diversity

Here, we consider three important power dynamics: who is doing the classifying, who is picking the objective function, and who gets to define what counts as science. Starting with the first—who has the power to classify—J. Khadijah Abdurahman says that “it is not just that classification systems are inaccurate or biased, it is who has the power to classify, to determine the repercussions/policies associated thereof, and their relation to historical and accumulated injustice” [90]. As mentioned above, since there is no agreed-upon definition of what a group/category is, it is ultimately up to those in power to classify people according to the task at hand. Often, this results in rigid classifications that do not align with how people would classify themselves. Additionally, because of data limitations, most often those in power employ the categories provided by the U.S. census or other taxonomies which stem from bureaucratic processes. However, it is well studied that these categories are unstable, contingent, and rooted in racial inequality [79]. When we undertake the process of classifying people, we need to understand the larger implications of that classification and how it further impacts or reinforces harmful social structures.

The second question—who chooses the final optimization function to use in a fair machine learning algorithm—seems fairly intuitive. Of course, those creating fair machine learning methods do. However, should we have this power? The choice of how to construct the objective function of an algorithm is intimately connected with the political economy question of who has ownership and control rights over data and algorithms [91]. It is important to keep in mind that our work is, overall, for the benefit of marginalized populaces. That being the case, “it is not only irresponsible to force our ideas of what communities need, but also violent” [84]. “Before seeking new design solutions, we [should] look for what is already working at the community level” and “honor and uplift traditional, indigenous, and local knowledge and practices” [92]. This may require asking oppressed groups what their communities need, and what we should keep in mind when constructing the optimization algorithm to better serve them. “We must emphasize an importance of including all communities, and the voices and ideas of marginalized people must be centered as [they] are the first and hardest hit by algorithmic oppression” [84].

The final question—who chooses what is defined as science—comes from the study of the interplay of feminism with science and technology. Ruth Hubbard, the first woman to hold a tenured professorship in biology at Harvard, advocated for allowing social groups other than White men to make scientific contributions, since “whoever gets to define what counts as a scientific problem also gets a powerful role in shaping the picture of the world that results from scientific research” [93]. For a drastic example, consider R.A. Fisher: for a long period of time he was the world’s leading statistician and practically invented large parts of the subject, yet he was also a eugenicist who thought that “those who did not take his word as God-given truth were at best stupid and at worst evil” [85].

Despite calls for diversity in science and technology, there are conflicting views on how to go about achieving it. Some say that including marginalized populaces brings outside perspectives that help create technology better suited to the people it will eventually be used on [62]. Others say that this is not the case, and that more diversity will not automatically solve algorithmic oppression [84]. Sociologist Ruha Benjamin points out that “having a more diverse team is an inadequate solution to discriminatory design practices that grow out of the interplay of racism and capitalism” as it shifts responsibility from “our technologies are harming people” to “BIPOCFootnote 12 tokens have to fix it” [84, 94]. By promoting diversity as a solution to the problem of algorithmic oppression, we “obscure the fact that there are power imbalances that are deeply embedded in societal systems and institutions” [84].

Regardless of how the diversity issue is to be solved, it is agreed that it is important to engage with marginalized communities and educate them on what fair machine learning is and how it affects them. “We will solve nothing without involving our communities, and we must take care to ensure we do not impose elitist ideas of who can and cannot do science and engineering” [84]. It is our view that the fair machine learning community should be having conversations with BIPOC communities about how we should solve the diversity issue, what they need and actually want from our community, and what we can do to help fix the problems that machine learning created in the first place.

7 Conclusion

In this field guide, we have attempted to remedy a long-standing problem in the fair machine learning field, namely, the separation of the technical aspects of algorithms from their philosophical, sociological, and legal underpinnings. By explaining the details of popular statistics-based fair machine learning metrics in both formal and social science terminology, we ultimately recenter algorithmic fairness as a sociotechnical problem, rather than simply a technical one. Additionally, our classification of the fair machine learning metrics into statistical, philosophical, and legal categories allows for a better understanding of the groundings of each metric, and can shed light on whether a chosen metric is in social alignment with a desired outcome. We hope that this field guide not only helps machine learning practitioners understand how specific metrics align with long-held humanistic values, but also that it sparks conversation and collaboration with the social sciences to construct better algorithms.

In addition to explaining the metrics themselves, we also offered a critique of the field of fair machine learning as a whole. We did this specifically by calling upon literature produced by those marginalized and underrepresented in the fair machine learning community, as they have viewpoints that are critical to understanding how our work actually impacts the social groups they belong to. When designing a fair machine learning algorithm, or any machine learning algorithm at all, we need to be mindful that our work ultimately impacts people beyond the immediate research community and our research labs. Our work should be centered on eliminating the harm caused by algorithmic oppression, not on being (unintentionally) complicit in the violence machine learning inflicts on oppressed populaces.

We conclude with the following call to action. Before releasing fair machine learning methods, we, the fair machine learning research community, should be intimately aware of their philosophical, sociological, and legal ties (as these notions ultimately determine how the final model will be implemented and used), as well as of how they will actually affect the marginalized communities they propose to protect. Only in this way can we contribute meaningful and “fair” machine learning research.