Egalitarian Machine Learning

Abstract

Prediction-based decisions, which are often made by utilizing the tools of machine learning, influence nearly all facets of modern life. Ethical concerns about this widespread practice have given rise to the field of fair machine learning and a number of fairness measures, mathematically precise definitions of fairness that purport to determine whether a given prediction-based decision system is fair. Following Reuben Binns (2018), we take ‘fairness’ in this context to be a placeholder for a variety of normative egalitarian considerations. We explore a few fairness measures to suss out their egalitarian roots and evaluate them, both as formalizations of egalitarian ideas and as assertions of what fairness demands of predictive systems. We pay special attention to a recent and popular fairness measure, counterfactual fairness, which holds that a prediction about an individual is fair if it is the same in the actual world and any counterfactual world where the individual belongs to a different demographic group (cf. Kusner et al. (2018)).


Notes

  1. Note, then, that a ‘fair’ machine could be fair in this sense and yet strike us as unfair in another. For instance, it might be grossly inaccurate.

  2. This is no doubt also partially due to the fact that several intuitive measures are incompatible with each other in large ranges of cases, forcing hard choices between measures. For more on this, see Kleinberg et al. (2016).
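
To make the tension concrete, here is a minimal numerical sketch (our own illustration; the general impossibility is what Kleinberg et al. prove). With unequal base rates, a predictor that equalizes positive predictive value across groups cannot also equalize true and false positive rates unless it is perfect; the group sizes and rates below are invented:

```python
# Invented illustration of the Kleinberg et al. (2016) tension:
# with unequal base rates, calibration-style predictive parity
# (equal PPV) and equalized odds (equal TPR and FPR) cannot both
# hold for an imperfect predictor.

def rates(tp, fp, fn, tn):
    """Return (PPV, TPR, FPR) for one group's confusion matrix."""
    return tp / (tp + fp), tp / (tp + fn), fp / (fp + tn)

# Group A: 100 people, 50 qualified (base rate 0.5).
ppv_a, tpr_a, fpr_a = rates(tp=40, fp=10, fn=10, tn=40)  # 0.8, 0.8, 0.2

# Group B: 100 people, only 20 qualified (base rate 0.2).
# Enumerate every possible confusion matrix for group B that
# matches group A's PPV; none also matches its TPR and FPR.
matches = []
for tp in range(21):         # qualified members predicted qualified
    for fp in range(81):     # unqualified members predicted qualified
        if tp + fp == 0 or tp / (tp + fp) != ppv_a:
            continue
        _, tpr_b, fpr_b = rates(tp, fp, fn=20 - tp, tn=80 - fp)
        if (tpr_b, fpr_b) == (tpr_a, fpr_a):
            matches.append((tp, fp))

print(matches)  # [] -- no matrix satisfies both measures at once
```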

  3. We also think that ‘fairness’ does even more than this. For instance, we think ‘fairness’ is also, at times, a placeholder for a less morally laden, statistical sense of ‘fair’ that is synonymous with ‘unbiased’, in the sense of lacking errors that systematically skew results towards certain types of outcomes over others.

  4. Heidari et al. (2019) is a notable exception. Heidari et al. argue that certain fairness measures can be given a natural interpretation as operationalizing a concern for certain egalitarian principles, and that users of fairness measures have reason to select only a measure that reflects what, in the relevant context, fairness requires. As will emerge in our discussion below, we fully agree. We take ourselves to be building on this work (and that of Binns (2018)) by further investigating how robust the justificatory connections are between particular pairs of fairness measures and normative principles.

  5. Cf. Rawls (1987, pp. 10-11) on the possibility that people with fundamental moral disagreements can reach ‘overlapping consensus’ on principles to be used in a particular practical context. Cf. also Kagan (1992) on the possibility that theories with different fundamental moral commitments can exhibit partial agreement in their first-order principles.

  6. Importantly, we will not argue for the impossibility of a context-invariant fairness measure. But here let us register our skepticism that there could be one.

  7. Although our official position about narrower senses of fairness is, as we mentioned, an agnostic one, in footnote 20 we briefly offer reason to doubt that, if there is fairness in a narrower sense, it is normatively relevant in the cases that we discuss. We thank an anonymous reviewer for pushing us to consider this distinction.

  8. Though, in some contexts, a protected attribute could very well be considered a merit. For instance, Hausman (2014) discusses a case in which race is a merit for a director searching for an actor to play the role of Martin Luther King. But, again, such cases are not typical.

  9. Cp. Arneson (2015, p. 5): ‘It should be noted that formal equality of opportunity […] puts moral constraints on market decisions […] If one operates a business and provides a product or service to the public for sale, formal equality of opportunity is violated if one refuses to sell to some class of potential customers on grounds that are whimsical […] or prejudiced’ (Sect. 1).

  10. We should note here that we are not claiming that stereotype threat actually manifests this way. In this case, as in all other hypothetical cases in this paper, we are stipulating plausible—if oversimplified—causal claims strictly for illustrative purposes.

  11. Here, it is worth emphasizing that the practical force of this conclusion depends in part on whether and to what extent something like the causal dependencies exhibited in our example manifest in the real world. At one extreme, they might represent mere conceptual possibilities and hence lack any practical upshot. This extreme, we reject. As mentioned in footnote 10, we construct our examples to mimic plausible real-world cases in which a fairness measure fails to track salient egalitarian concerns. That said, exactly when and how often examples of the sort we discuss arise in the real world is an empirical question that lies beyond the scope of this paper but whose answer—for precisely the reasons argued for here—is critically important for determining when and why any particular fairness measure should (or should not) be deployed in any particular circumstance.

  12. This case might be a little confusing. How can we have an artificially high rejection rate for one group while achieving equal true and false positive rates across groups? The key is that (a) the system is not perfectly accurate (creating, in essence, some wiggle room), and (b) qualifications are unevenly distributed across groups, a fact which, under certain conditions and in the absence of an artificially high rejection rate for the group or some other corrective (such as perfect accuracy), will reveal itself in the form of asymmetrical true positive or false positive rates. For a more fully fleshed out example of this phenomenon, see Sect. 3.3. For a real-world instance of this phenomenon, see Corbett-Davies et al. (2016).
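
To make the note’s ‘wiggle room’ concrete: with an imperfect predictor and unevenly distributed qualifications, overall rejection rates can diverge sharply across groups even while true and false positive rates are exactly equal. The following minimal sketch uses invented numbers (it is not the paper’s own example):

```python
# Hypothetical sketch: equalized odds (equal TPR and FPR across
# groups) is compatible with very different overall rejection
# rates, given an imperfect predictor and uneven base rates of
# qualification. All numbers are invented for illustration.

def acceptance_rate(base_rate, tpr, fpr):
    """Overall acceptance rate implied by a group's qualification
    base rate and the predictor's error rates."""
    return base_rate * tpr + (1 - base_rate) * fpr

TPR, FPR = 0.8, 0.2  # identical for both groups: equalized odds holds

for group, base_rate in [("majority", 0.5), ("minority", 0.2)]:
    accept = acceptance_rate(base_rate, TPR, FPR)
    print(f"{group}: accepted {accept:.2f}, rejected {1 - accept:.2f}")

# majority: accepted 0.50, rejected 0.50
# minority: accepted 0.32, rejected 0.68
```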

  13. For further discussion of the compatibility of formal equality of opportunity and statistical discrimination, see Arneson (2015), Sect. 1.3.

  14. As before, \(y\) stands for ‘the subject has the property that the predictor is trying to predict’, \(\hat{y}\) for ‘the predictor predicts that the subject has the property’, and \(a\) for ‘the subject belongs to such and such protected class(es)’.

  15. We do not have the space here to explore calibration the way we have explored fairness through unawareness and equalized odds, but we think that it—like those measures—is no panacea. To see why, we recommend Corbett-Davies and Goel (2018).

  16. Since \(p(\hat{y} \mid a \,\&\, y) = p(\hat{y} \,\&\, a \,\&\, y) \div p(a \,\&\, y)\) = (number of members of the minority group who are qualified and test qualified ÷ total number of applicants) ÷ (number of members of the minority group who are qualified ÷ total number of applicants) = (180 ÷ 450) ÷ (190 ÷ 450) = 180 ÷ 190 ≈ 0.95. (Where \(a\) =df. ‘is a member of the minority group’.)

  17. Since \(p(\hat{y} \mid {\sim}a \,\&\, y) = p(\hat{y} \,\&\, {\sim}a \,\&\, y) \div p({\sim}a \,\&\, y)\) = (number of members of the majority group who are qualified and test qualified ÷ total number of applicants) ÷ (number of members of the majority group who are qualified ÷ total number of applicants) = (45 ÷ 450) ÷ (55 ÷ 450) = 45 ÷ 55 ≈ 0.82.
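
The arithmetic in notes 16 and 17 can be checked mechanically; this short sketch simply recomputes the two conditional probabilities from the counts stipulated in the example (450 applicants in total):

```python
# Recompute the conditional probabilities from notes 16 and 17
# using exact rational arithmetic.
from fractions import Fraction

TOTAL = 450  # total number of applicants in the example

def p_test_qualified_given_qualified(qualified_and_test_qualified, qualified):
    """p(y_hat | group & y), computed as the ratio of the joint
    probability p(y_hat & group & y) to p(group & y)."""
    joint = Fraction(qualified_and_test_qualified, TOTAL)
    marginal = Fraction(qualified, TOTAL)
    return joint / marginal

minority = p_test_qualified_given_qualified(180, 190)
majority = p_test_qualified_given_qualified(45, 55)

print(float(minority))  # 0.947368... ~ .95, as in note 16
print(float(majority))  # 0.818181... ~ .82, as in note 17
```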

  18. It is here worth noting that there are instances where we think pursuing calibration at the expense of equalized odds is unfair. We think that the case that brought to light the tension between calibration and measures like equalized odds—the notorious case of COMPAS (see Angwin et al. 2016 for details)—is one of them.

  19. Note that to the extent that the mid-level egalitarian principles are not—like the fairness measures—narrowly focused on a particular aspect of fairness, there is a larger stable of potential counterexamples from which we might draw.

  20. Arguably, both Graduate School 3 and Jobs 3 raise the complication discussed at the end of Sect. 3: someone might judge that, although these situations are unfair simpliciter, the use of these prediction-based decision systems is nevertheless fair in a narrower sense (just as, e.g., judges do not act unfairly in a narrow sense when they base their decisions on the arguments of opposing counsel). Since we are using these cases simply to motivate the interest of the more robust fairness measures and egalitarian principles discussed in Sects. 3.4 and 3.5, we officially wish to remain agnostic about this issue. But we are doubtful that there is a compelling analogy here with the case of judges. Judges issue their decisions from within a longstanding social practice that there is (we presume) good independent reason to structure in the ways that the legal system is in fact structured. But the design, choice, and use of algorithms in prediction-based decision making does not, offhand, appear to be embedded in any such social practice. So, even if some decisions are made fair in a narrow sense by their conformity to certain social practices, it is doubtful whether such grounds are available in the case of prediction-based decision making.

  21. Later, in Sect. 3.5, we consider an even more demanding egalitarian principle that does not take the natural distribution of talents or abilities as a given.

  22. This case and the causal model depicting it are adapted from Kusner et al. (2018).

  23. For an accessible primer on causal models, see Pearl et al. (2016).

  24. Note: it is plausible that there are arrows running from race and sex to GPA, LSAT score, and first-year law grades, since race and sex influence these factors via racism and sexism (via, e.g., differences in class size across the schools students come from, teachers stereotyping students as more or less likely to succeed or more or less likely to have behavioral problems, and so on). But of course it is not plausible that race and sex are the only influences on these things; hence we use ‘studiousness’ as a catch-all to represent the bundle of intrinsic attributes not causally related to one’s race or sex that influence GPA, LSAT score, and first-year law grades. Note also (as previewed in footnote 10) that this is a massively oversimplified model that is being used for illustrative purposes; in reality, there are many more variables than those represented above, and the variables we have chosen might not actually interact exactly in the way pictured here.
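
For readers who prefer code to causal diagrams, here is a minimal sketch of a structural causal model with the arrow structure the note describes. Only the graph (race and sex influencing GPA, LSAT score, and first-year grades, with ‘studiousness’ as an exogenous catch-all) follows the paper; the linear mechanisms, coefficients, and noise terms are invented for illustration:

```python
# Toy structural causal model mirroring note 24's arrow structure.
# The mechanisms and coefficients are invented; only the graph
# follows the paper's description.
import random

def sample_individual(race, sex):
    # Exogenous catch-all: intrinsic attributes causally unrelated
    # to race or sex.
    studiousness = random.gauss(0.0, 1.0)
    # Hypothetical mechanisms by which race and sex depress the
    # measured outcomes (via racism and sexism).
    bias = -0.5 * race - 0.3 * sex
    gpa = 3.0 + 0.5 * studiousness + bias + random.gauss(0.0, 0.1)
    lsat = 155.0 + 5.0 * studiousness + 4.0 * bias + random.gauss(0.0, 1.0)
    first_year_grades = 3.0 + 0.4 * studiousness + bias + random.gauss(0.0, 0.1)
    return {"gpa": gpa, "lsat": lsat, "first_year_grades": first_year_grades}

# A counterfactual-fairness-style comparison: fix the exogenous
# draws (by reusing the seed) and flip the protected attribute.
random.seed(0)
actual = sample_individual(race=0, sex=0)
random.seed(0)
counterfactual = sample_individual(race=1, sex=0)
print(actual)
print(counterfactual)  # same 'person', different race: outcomes shift
```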

  25. Cp. Arneson (2015, Sect. 1); Cohen (2008, pp. 2–3); Segall (2014, pp. 6–7).

  26. Cp. Cohen (2009, pp. 17–18).

  27. Cp. Temkin (1993, p. 17); Cohen (2008, p. 7).

  28. See Brighouse et al. (forthcoming) for discussion, in the case of education, of how substantive equality of opportunity informs the law and other educational standards.

  29. For example, luck egalitarianism has been defended as a plausible approach to ethical questions in healthcare policy (Segall (2009) and Eyal (2013)); education policy (Voigt (2007) and Harel Ben-Shahar (2016)); and social policy (Dworkin (2001) and Segall (2012)).

  30. Indeed, in motivating counterfactual fairness, Kusner et al. (2018, p. 7) emphasize this affinity between the measure and the motivations for luck egalitarianism. This apparent motivation for the measure, we take it, is all the more reason to ask how tight the connection is between the measure and the considerations that seem to motivate it. Noteworthy also, as Kusner et al. (2018, p. 3) emphasize, is the counterfactual fairness measure’s commitment to facts about individuals, not groups, as the determinant of whether there is unfairness; this is a commitment shared by canonical versions of luck egalitarianism (cp. Temkin (1993, p. 92)).

  31. Cp. Arneson and Hurley (2001, p. 85).

  32. To our knowledge, the only dissent from this luck-egalitarian orthodoxy is to be found in Segall (2014, pp. 49–50). (Temkin (2011, pp. 65–66) defends a pluralist view, close enough to the orthodoxy for our purposes, on which ‘global’ considerations of fairness matter in their own right over and above ‘local’ ones.) Segall, however, does not intend his discussion to refute the orthodoxy but rather to offer an attractive alternative to it (Segall (2014, p. 43)). So, for our purposes, this intramural luck-egalitarian dispute can be set aside. The point is that there is a well-motivated luck-egalitarian principle which condemns the inequality in question. The further question of whether, all things considered, this principle is the most plausible version of luck egalitarianism (considered either as an unconditional or mid-level egalitarian principle) is a substantive first-order question of the kind we are setting aside.

  33. There are of course other cases that resemble the Insurance case in which counterfactual fairness would not come apart from luck egalitarianism—for example, cases in which it is a matter of option luck whether these business-owners were in the situation in which they are exposed to these risks of harm in the first place. We have stipulated away this possibility in the Insurance case, by placing Frank and Rita in situations to which they have no reasonable alternative option but to run their respective inherited businesses, and by stipulating that the inequality is in part the result of a kind of force majeure—namely, a meteorite strike—that could not plausibly be claimed to be an instance of option luck. These details of the case, although somewhat fanciful, illustrate vividly the kinds of cases in which this kind of mismatch could arise: one-off interactions between the algorithm and those subject to it, with a great deal at stake, against a background for which no one can reasonably be held responsible, and in which the algorithm tracks ‘local’ responsibility facts. We think this is a plausible causal possibility, but as we noted in footnote 11, we take no stand on the empirical question of how often such cases in fact arise. (We thank an anonymous reviewer for helpful comments on this point.)

  34. For discussion of these issues and defense of this ‘asymmetrical’ version of luck egalitarianism (so-called because it treats asymmetrically inequalities and equalities that arise from factors beyond people’s control), see Segall (2016, chapter 3). As before, we are not taking a stance on the first-order normative intramural debate among luck egalitarians; the present point is that this ‘asymmetrical’ version of luck egalitarianism is a well-motivated version of the view.

  35. For what it is worth, satisfying various other fairness measures also requires (to varying degrees) sacrificing accuracy. And so this question arises, and the following considerations apply, for many other measures in addition to counterfactual fairness.

References

  • Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. Machine bias: There’s software used across the country to predict future criminals and it’s biased against blacks. ProPublica. Retrieved from https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.

  • Arneson, R., and S. Hurley. 2001. Luck and equality. Proceedings of the Aristotelian Society 75: 51–90.

  • Arneson, R. 2015. Equality of opportunity. In The Stanford encyclopedia of philosophy (Summer 2015 edn), ed. Edward N. Zalta. https://plato.stanford.edu/archives/sum2015/entries/equal-opportunity/.

  • Binns, Reuben. 2018. Fairness in machine learning: Lessons from political philosophy. In Conference on fairness, accountability and transparency, 149–159. PMLR.

  • Brighouse, H., T. Geron, and M. Levinson. Forthcoming. Conceptions of educational equity. AERA Open.

  • Cohen, G. A. 2008. Rescuing justice and equality. Cambridge, MA: Harvard University Press.

  • Cohen, G. A. 2009. Why not socialism? Princeton, NJ: Princeton University Press.

  • Corbett-Davies, Sam, and Sharad Goel. 2018. The measure and mismeasure of fairness: A critical review of fair machine learning. arXiv preprint arXiv:1808.00023.

  • Corbett-Davies, Sam, Emma Pierson, Avi Feller, and Sharad Goel. 2016. A computer program used for bail and sentencing decisions was labeled biased against blacks. It’s actually not that clear. Washington Post, October 17, 2016. https://www.washingtonpost.com/news/monkey-cage/wp/2016/10/17/can-an-algorithm-be-racist-our-analysis-is-more-cautious-than-propublicas/.

  • Dworkin, R. 2001. Sovereign virtue. Cambridge, MA: Harvard University Press.

  • Eyal, N. 2013. Leveling down health. In Inequalities in health: Concepts, measures, and ethics, ed. N. Eyal et al. Oxford: Oxford University Press.

  • Grgic-Hlaca, Nina, Muhammad Bilal Zafar, Krishna P. Gummadi, and Adrian Weller. 2016. The case for process fairness in learning: Feature selection for fair decision making. In NIPS symposium on machine learning and the law 1: 2.

  • Harel Ben-Shahar, T. 2016. Equality in education: Why we must go all the way. Ethical Theory and Moral Practice 19: 83–100.

  • Hausman, D. 2014. Affirmative action: Bad arguments and some good ones. In The ethical life: Fundamental readings in ethics and moral problems, 3rd edn, ed. Russ Shafer-Landau, 476–489. New York: Oxford University Press.

  • Hedden, Brian. 2021. On statistical criteria of algorithmic fairness. Philosophy and Public Affairs 49 (2): 209–231.

  • Heidari, Hoda, Michele Loi, Krishna P. Gummadi, and Andreas Krause. 2019. A moral framework for understanding fair ML through economic models of equality of opportunity. In Proceedings of the conference on fairness, accountability, and transparency, 181–190.

  • Johnson, G. M. 2020. Algorithmic bias: On the implicit biases of social technology. Synthese 198: 9941–9961. https://doi.org/10.1007/s11229-020-02696-y.

  • Kagan, S. 1992. The structure of normative ethics. Philosophical Perspectives 6: 223–242.

  • Kleinberg, Jon, Sendhil Mullainathan, and Manish Raghavan. 2016. Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807.

  • Kusner, Matt J., Joshua Loftus, Chris Russell, and Ricardo Silva. 2018. Counterfactual fairness. Advances in Neural Information Processing Systems 30.

  • Pearl, Judea, Madelyn Glymour, and Nicholas P. Jewell. 2016. Causal inference in statistics: A primer. Chichester: Wiley.

  • Rawls, J. 1987. The idea of an overlapping consensus. Oxford Journal of Legal Studies 7 (1): 1–25.

  • Segall, S. 2009. Health, luck, and justice. Princeton, NJ: Princeton University Press.

  • Segall, S. 2012. Should the best qualified be appointed? Journal of Moral Philosophy 9 (1): 31–54.

  • Segall, S. 2014. Equality and opportunity. Oxford: Oxford University Press.

  • Segall, S. 2016. Why inequality matters: Luck egalitarianism, its meaning, and value. Oxford: Oxford University Press.

  • Sweeney, L. 2013. Discrimination in online ad delivery. Available at SSRN: https://ssrn.com/abstract=2208240. https://doi.org/10.2139/ssrn.2208240.

  • Temkin, L. 1993. Inequality. Oxford: Oxford University Press.

  • Temkin, L. 2011. Justice, equality, fairness, desert, rights, free will, responsibility, and luck. In Responsibility and distributive justice, ed. Carl Knight and Zofia Stemplowska, 51–76. Oxford: Oxford University Press.

  • Voigt, K. 2007. Individual choice and unequal participation in higher education. Theory and Research in Education 5 (1): 87–112.

Acknowledgements

We are grateful to audiences at the University of Memphis FedEx Institute of Technology, Cal Poly San Luis Obispo, the EnTIRE (Engineering Tools for Innovation and Research in Education) workshop series, California State University Long Beach, the Copenhagen Workshop on Algorithmic Fairness, the 2021 AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES), and WeRobot 2021, and to two anonymous reviewers at Res Publica. We are especially grateful to Sune Holm and Kasper Lippert-Rasmussen, who organized and invited us to the Copenhagen Workshop on Algorithmic Fairness, which first inspired us to conceive of this project.

Author information

Correspondence to Clinton Castro.


Cite this article

Castro, C., O’Brien, D. & Schwan, B. Egalitarian Machine Learning. Res Publica 29, 237–264 (2023). https://doi.org/10.1007/s11158-022-09561-4
