
Equalized odds is a requirement of algorithmic fairness


Abstract

Statistical criteria of fairness are formal measures of an algorithm’s performance that are meant to help us determine whether the algorithm would be fair to use in decision-making. In this paper, I introduce a new version of the criterion known as “Equalized Odds,” argue that it is a requirement of procedural fairness, and show that it is immune to a number of objections to the standard version.


Notes

  1. Angwin et al. (2016).

  2. These formulations of Equal False-Positive Rates and Equal False-Negative Rates are from Hedden (2021).
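
To make these two rates concrete, here is a minimal sketch in Python (my own illustration with hypothetical names, not code from the paper) of how false-positive and false-negative rates are computed within each group:

```python
import numpy as np

def group_error_rates(y_true, y_pred, group):
    """False-positive and false-negative rates within each group.

    y_true: 1 if the subject actually has the predicted trait, else 0.
    y_pred: the algorithm's binary classification (1 = positive).
    group:  each subject's group label.
    """
    rates = {}
    for g in np.unique(group):
        t = y_true[group == g]
        p = y_pred[group == g]
        fpr = p[t == 0].mean()      # P(classified positive | actually negative)
        fnr = 1 - p[t == 1].mean()  # P(classified negative | actually positive)
        rates[g] = (fpr, fnr)
    return rates
```

On the standard version of Equalized Odds, the algorithm is fair only if the returned (fpr, fnr) pairs coincide across all groups.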

  3. Hardt et al. (2016).

  4. See Flores et al. (2016), Huq (2019), Mayson (2019), Hellman (2020), and Long (2021) for further objections that are outside the scope of this paper.

  5. This formulation of Calibration Within Groups is from Hedden (2021, p. 214).
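
As a rough illustration (again my own sketch, with hypothetical names), Calibration Within Groups holds when, within each group, subjects assigned risk score s turn out to have the relevant trait at rate s:

```python
import numpy as np

def calibration_report(y_true, scores, group):
    """For each group g and each assigned risk score s, report the observed
    rate of the trait among members of g who received score s. Calibration
    Within Groups holds (approximately) when each observed rate matches s."""
    report = {}
    for g in np.unique(group):
        in_g = group == g
        for s in np.unique(scores[in_g]):
            report[(g, float(s))] = y_true[in_g][scores[in_g] == s].mean()
    return report
```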

  6. See e.g. Dieterich et al. (2016), Corbett-Davies et al. (2016), and Long (2021).

  7. Rawls (1999, pp. 74–75).

8. The probabilistic expectations here should be understood in an externalist way, in terms of objective chances. Hedden (2021) suggests that understanding them in terms of objective chances will only work when there is “objective chanciness involved” (see fn. 15). However, we can (and I think should) always understand the relevant probabilities in terms of objective chances. On my interpretation of Equalized Odds, whether a given decision subject is qualified is not chancy but rather determinate at the time of decision (see Sect. 3). There will, however, be objective chanciness in how the decision procedure being evaluated classifies particular decision subjects. This is true even if the decision procedure consists of a deterministic algorithm: such an algorithm must be implemented on physical hardware that might malfunction, and further chanciness is introduced by how input data is collected and processed.

  9. Perhaps poor claimants are wronged if the researchers believe the algorithm’s predictions. As Hedden points out (p. 220), defenders of moral encroachment argue that we can wrong others simply by believing certain things about them; see e.g. Moss (2018) and Basu (2019a, 2019b). However, whether anyone believes an algorithm’s predictions is not something that is intrinsic to the algorithm itself.

10. As Di Bello and O’Neil (2020) point out. They take this point to motivate a criterion of procedural justice in criminal trials that they call “equal protection,” which requires that “innocent defendants not be exposed to higher risks of mistaken conviction than other innocent defendants facing the same charges or comparably serious charges” (p. 158).

11. Intuitions in favor of Equalized Odds are especially strong when the disadvantaged group is more socially marginalized than the advantaged group. See Castro (2019) for one possible explanation of the asymmetry.

  12. Table reproduced from Eva (2022, p. 254). I have substituted percentages for fractions for convenience.

  13. Eva (2022, p. 258).

  14. Eva (2022, p. 258). Note that Eva concedes that there is an alternative way to explain why the algorithm’s predictions are unfair: one might think that it is unfair to base lending decisions on zip code because “the correlations between race, zip code and default rates are themselves the product of unjust social economic historical trends” (p. 255). However, Eva maintains that “there is something intrinsically unfair in the predictions themselves, [and] we should not need to refer to the predictive features used by the algorithm in order to diagnose that unfairness. [W]e should be able to diagnose the intrinsic unfairness of the algorithm’s predictions using statistical criteria alone” (p. 257).

  15. While the case judgments I have discussed support Equalized Odds, a full defense would require investigating why it is a requirement of fairness, which is outside the scope of this paper. Castro (2019) develops one possible argument, focusing on the special case of recidivism prediction.

16. Note that Eva’s focus is on what they call “intrinsic fairness,” which concerns the fairness of an algorithm’s predictions considered in isolation from how they are used and from other features of the surrounding social and historical context. It is open to Eva to accept that Base Rate Tracking is not a requirement of procedural fairness, but maintain that it is nonetheless a criterion of intrinsic fairness. I submit, however, that there is nothing unfair about the situation just described, which suggests that Base Rate Tracking is not a criterion of intrinsic fairness, either.

  17. Angwin et al. (2016).

  18. See Castro (2019) for an alternative reconstruction of ProPublica’s argument.

19. Dieterich et al. (2016).

20. Dieterich et al. (2016, p. 8).

  21. Note that ProPublica used being charged with a new crime as a proxy for recidivism, as is standard in the field of recidivism prediction. Given racial disparities in policing, one might worry that this practice will tend to result in decision-makers overestimating the recidivism risk posed by black defendants relative to white defendants (see Mayson 2019 and Long 2021 for discussion).

  22. Various authors have suggested that the appropriate threshold might be different for different individuals. See e.g. Castro (2019), Huq (2019), and Long (2021). I set aside the difficult question of what kinds of facts determine what threshold is appropriate, as well as the complication that other factors might be relevant to how decision subjects ought to be treated.

  23. To simplify discussion, I am assuming that the practice of pretrial detention on the basis of recidivism risk is morally justifiable.

  24. Sutton wrote the following in his autobiography: “Why did I rob banks? Because I enjoyed it. I loved it. I was more alive when I was inside a bank, robbing it, than at any other time in my life. I enjoyed everything about it so much that one or two weeks later I'd be out looking for the next job” (Sutton & Linn, 2004).

  25. Note that, if Equalized Odds is a requirement of procedural fairness, then it follows that procedural fairness is not perfectly accessible to us, as Equalized Odds is an externalist constraint. Other authors have endorsed the idea that there are externalist constraints on procedural fairness; see e.g. Gardiner (2019).

  26. Strevens (1999) notes that “probabilistic generalization is the rule in the medical sciences” (p. 244), adding that the relevant probabilities should be understood objectively. Just how we should understand objective chances of macroscopic events is a vexed issue, and beyond the scope of this paper. Various accounts of objective chance are available that are compatible with my defense of Equalized Odds; see for example List and Pivato (2015) and Glynn (2010).

  27. Cf. Smith (2014), who argues that subjective moral theories cannot explain the duty to gather evidence before acting.

  28. Corbett-Davies et al. (2016).

  29. I am grateful to an anonymous reviewer for suggesting that I address this objection to Equalized Odds more explicitly.

  30. The claim that fairness requires using the same decision-threshold across social groups is sometimes called the “single-threshold rule.” See Mayson (2019), Huq (2019), and Corbett-Davies and Goel (2018) for discussion. See also footnote 22.
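
To make the contrast explicit, here is a minimal sketch (illustrative Python, not from any of the cited papers) in which a single float implements the single-threshold rule and a dict of group-specific cutoffs implements the alternative discussed in footnote 22:

```python
import numpy as np

def classify(scores, group, threshold):
    """Convert risk scores into binary decisions.

    A single float applies one cutoff to everyone (the single-threshold
    rule); a dict mapping group labels to cutoffs applies group-specific
    thresholds instead.
    """
    if isinstance(threshold, dict):
        cutoffs = np.array([threshold[g] for g in group])
    else:
        cutoffs = threshold
    return (scores >= cutoffs).astype(int)
```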

31. As Hedden (2021, p. 225) puts it.

32. Simoiu et al. (2017, p. 1194).

  33. Hedden (2021, pp. 221–222).

  34. Hedden (2021, p. 220).

35. What if we chose an example featuring categorical qualifications instead of dispositional ones? In that case, we cannot construct a case that is structurally analogous to Hedden’s, because the relevant objective chances will all be 0% or 100%. (And even if we could, the algorithm would still not violate Equalized Odds, because it would still classify everyone correctly.)

  36. Rawls (1999, p. 202).

37. Why would evidentiary unfairness disproportionately affect members of socially marginalized groups? Because social marginalization tends to generate evidence that affected individuals lack the traits that qualify them for more favorable treatment by social institutions. Crucially, this evidence is generated even in cases where the individuals in question do qualify for favorable treatment. For example, some widely accepted risk factors for recidivism are in effect measures of an individual’s “level of legitimate economic opportunity” (to borrow a poignant phrase from COMPAS’ user manual). Using these features to estimate recidivism risk will presumably lead courts to judge low-risk individuals from marginalized groups to be higher-risk than their counterparts from more privileged backgrounds. Similar points apply, mutatis mutandis, to other features that institutions treat as a basis for allocating benefits and burdens. (Consider taking the prestige of job applicants’ undergraduate institutions into account in making hiring decisions, or taking wealth into account in making lending decisions.)

  38. Equalized Odds and Calibration Within Groups are normally incompatible when base rates of the feature of interest differ across groups. This has been demonstrated formally for the version of Equalized Odds discussed by ProPublica; see Kleinberg et al. (2016), Chouldechova (2017), and Miconi (2017). The conflict also arises for my revised version of Equalized Odds as it applies to both dispositional and categorical qualification problems. (To see that the problem arises in dispositional cases, consider that racial profiling will be necessary to ensure Calibration Within Groups is satisfied in cases where race is an independent risk factor, in the sense that it gives us evidence of objective risk that is not screened off by other available evidence.) See Long (2021) for an argument that we should accept Calibration Within Groups and reject Equalized Odds.
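
The incompatibility is easy to exhibit numerically. The following simulation (my own construction, not taken from the cited papers) builds a predictor that is calibrated within both groups by design; because the groups’ base rates differ, thresholding the scores yields unequal false-positive rates:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_group(n, share_high):
    """Assign each subject a risk score of 0.8 or 0.2; among subjects
    scored s, the trait occurs at rate s, so Calibration Within Groups
    holds by construction."""
    scores = np.where(rng.random(n) < share_high, 0.8, 0.2)
    y_true = (rng.random(n) < scores).astype(int)
    return scores, y_true

# Group A's base rate (~0.50) exceeds group B's (~0.26) because more of
# group A receives the high score.
scores_a, y_a = simulate_group(100_000, share_high=0.5)
scores_b, y_b = simulate_group(100_000, share_high=0.1)

def fpr(scores, y_true, cutoff=0.5):
    pred = scores > cutoff
    return pred[y_true == 0].mean()  # P(classified positive | actually negative)

print(round(fpr(scores_a, y_a), 3))  # ~0.200
print(round(fpr(scores_b, y_b), 3))  # ~0.027
```

Both groups are calibrated, yet the false-positive rates diverge; this is the trade-off that Kleinberg et al. (2016) and Chouldechova (2017) prove holds in general outside of degenerate cases.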

  39. See Castro (2022) and Eva (2022) for arguments that Calibration Within Groups is not a requirement of fairness.

  40. Long (2021) cites this as a reason to reject Equal False Positive Rates—one half of Equalized Odds—as a requirement of procedural fairness. Long and others raise additional objections to Equalized Odds as well (see footnote 4 for references); I think these objections can be answered, but answering them is beyond the scope of this paper.

  41. See Castro (2019) for one proposal about the normative foundations of Equalized Odds.

References

  • Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing. Accessed 23 May 2016.

  • Ayres, I. (2002). Outcome tests of racial disparities in police practices. Justice Research and Policy, 4(1–2), 131–142.

  • Basu, R. (2019a). The wrongs of racist beliefs. Philosophical Studies, 176(9), 2497–2515.

  • Basu, R. (2019b). Radical moral encroachment: The moral stakes of racist beliefs. Philosophical Issues, 29(1), 9–23.

  • Becker, G. S. (1957). The economics of discrimination. University of Chicago Press.

  • Becker, G. S. (1993). Nobel lecture: The economic way of looking at behavior. Journal of Political Economy, 101(3), 385–409.

  • Castro, C. (2019). What’s wrong with machine bias? Ergo, an Open Access Journal of Philosophy, 6(15), 405–426. https://doi.org/10.3998/ergo.12405314.0006.015

  • Castro, C. (2022). Just machines. Public Affairs Quarterly, 36(2), 163–183.

  • Chouldechova, A. (2017). Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 5(2), 153–163.

  • Corbett-Davies, S., & Goel, S. (2018). The measure and mismeasure of fairness: A critical review of fair machine learning. http://arxiv.org/abs/1808.00023

  • Corbett-Davies, S., Pierson, E., Feller, A., & Goel, S. (2016). A computer program used for bail and sentencing decisions was labeled biased against blacks. It’s actually not that clear. The Washington Post. https://www.washingtonpost.com/news/monkeycage/wp/2016/10/17/can-an-algorithm-be-racist-our-analysis-is-more-cautious-than-propublicas/. Accessed 17 Oct 2016.

  • Di Bello, M., & O’Neil, C. (2020). Profile evidence, fairness, and the risks of mistaken convictions. Ethics, 130(2), 147–178.

  • Dieterich, W., Mendoza, C., & Brennan, T. (2016). COMPAS risk scales: Accuracy equity and predictive parity. https://go.volarisgroup.com/rs/430-MBX-989/images/ProPublica_Commentary_Final_070616.pdf

  • Eva, B. (2022). Algorithmic fairness and base rate tracking. Philosophy and Public Affairs, 50(2), 239–266.

  • Flores, A. W., Bechtel, K., & Lowenkamp, C. T. (2016). False positives, false negatives, and false analyses: A rejoinder to “Machine bias: There’s software used across the country to predict future criminals and it’s biased against blacks.” Federal Probation, 80(2), 38–46.

  • Gardiner, G. (2019). The reasonable and the relevant: Legal standards of proof. Philosophy and Public Affairs, 47(3), 288–318.

  • Glynn, L. (2010). Deterministic chance. The British Journal for the Philosophy of Science, 61(1), 51–80.

  • Hardt, M., Price, E., & Srebro, N. (2016). Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems, 29, 1–9.

  • Hedden, B. (2021). On statistical criteria of algorithmic fairness. Philosophy and Public Affairs, 49(2), 209–231.

  • Hellman, D. (2020). Measuring algorithmic fairness. Virginia Law Review, 106(4), 811–866.

  • Huq, A. Z. (2019). Racial equity in algorithmic criminal justice. Duke Law Journal, 68(6), 1043–1134.

  • Kleinberg, J., Mullainathan, S., & Raghavan, M. (2016). Inherent trade-offs in the fair determination of risk scores. http://arxiv.org/abs/1609.05807

  • List, C., & Pivato, M. (2015). Emergent chance. Philosophical Review, 124(1), 119–152.

  • Long, R. (2021). Fairness in machine learning: Against false positive rate equality as a measure of fairness. Journal of Moral Philosophy, 19(1), 49–78.

  • Mayson, S. (2019). Bias in, bias out. Yale Law Journal, 128(8), 2218–2300.

  • Miconi, T. (2017). The impossibility of “fairness”: A generalized impossibility result for decisions. http://arxiv.org/abs/1707.01195

  • Moss, S. (2018). Moral encroachment. Proceedings of the Aristotelian Society, 118(2), 177–205.

  • Rawls, J. (1999). A theory of justice (revised edition). Belknap Press.

  • Simoiu, C., Corbett-Davies, S., & Goel, S. (2017). The problem of infra-marginality in outcome tests for discrimination. The Annals of Applied Statistics, 11(3), 1193–1216.

  • Smith, H. M. (2014). The subjective moral duty to inform oneself before acting. Ethics, 125(1), 11–38.

  • Strevens, M. (1999). Objective probability as a guide to the world. Philosophical Studies, 95(3), 243–275.

  • Sutton, W., & Linn, E. (2004). Where the money was: The memoirs of a bank robber. Crown.


Acknowledgements

For helpful feedback on earlier versions of this paper, I would like to thank Arden Ali, Jeff Behrends, Jennifer Carr, Ryan Doody, Lyndal Grant, Lily Hu, Gregory Keenan, Euan MacDonald, Greg Ray, Friederike Schuur, Nikita Shepard, and anonymous reviewers at the ACM Conference on Fairness, Accountability, and Transparency. I would also like to thank audiences at several presentations of this paper in 2020 and 2021, including at the International Society for Justice Research Annual Meeting, the University of Edinburgh Legal Theory Research Group Seminar Series, the University of Florida, the University of Florida South Eastern Graduate Philosophy Conference, the Harvard University Cyberethics Forum, the Jain Family Institute, and the Rocky Mountain Ethics Congress. Special thanks to Jenna Donohue, Jay Hodges, Milo Phillips-Brown, Duncan Purves, and two anonymous reviewers at Synthese for extensive feedback on previous drafts.

Author information


Correspondence to David Gray Grant.

Ethics declarations

Conflict of interest

The author has no competing interests to declare that are relevant to the content of this article.

Additional information


Some of the research for this paper was conducted during an Assistant Professorship at the University of Texas at San Antonio.


About this article


Cite this article

Grant, D.G. Equalized odds is a requirement of algorithmic fairness. Synthese 201, 101 (2023). https://doi.org/10.1007/s11229-023-04054-0

