
Measuring discrimination in algorithmic decision making

Published in: Data Mining and Knowledge Discovery

Abstract

Society is increasingly relying on data-driven predictive models for automated decision making. Not by design, but because of the nature and noisiness of observational data, such models may systematically disadvantage people belonging to certain categories or groups instead of relying solely on individual merits. This may happen even if the computing process is fair and well-intentioned. Discrimination-aware data mining studies how to make predictive models free from discrimination when the historical data on which they are built may be biased, incomplete, or even contain past discriminatory decisions. Discrimination-aware data mining is an emerging research discipline, and there is no firm consensus yet on how to measure the performance of algorithms. The goal of this survey is to review the various discrimination measures that have been used, to analyze their performance analytically and computationally, and to highlight the implications of using one measure rather than another. We also describe measures from other disciplines that have not yet been used for measuring discrimination but could potentially be suitable for this purpose. This survey is primarily intended for researchers in data mining and machine learning as a step towards a unifying view of performance criteria when developing new algorithms for non-discriminatory predictive modeling. In addition, practitioners and policy makers could use this study when diagnosing potential discrimination by predictive models.
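To make the notion of a discrimination measure concrete, the sketch below computes the mean (statistical parity) difference, one of the simplest and most widely discussed measures in this literature: the difference in acceptance rates between the unprotected and protected groups, d = P(ŷ = + | s = 0) − P(ŷ = + | s = 1). This is an illustrative example, not an excerpt from the article; the variable names and the toy data are assumptions.

```python
def mean_difference(y_pred, group):
    """Mean (statistical parity) difference between groups.

    y_pred: iterable of binary decisions (1 = positive/accept, 0 = negative).
    group:  iterable of group indicators (0 = unprotected, 1 = protected).
    Returns P(y=1 | group=0) - P(y=1 | group=1); 0 indicates parity,
    positive values indicate the protected group is accepted less often.
    """
    unprotected = [y for y, g in zip(y_pred, group) if g == 0]
    protected = [y for y, g in zip(y_pred, group) if g == 1]
    return sum(unprotected) / len(unprotected) - sum(protected) / len(protected)


# Toy data: 3 of 4 unprotected applicants accepted vs. 1 of 4 protected.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(mean_difference(preds, groups))  # 0.75 - 0.25 = 0.5
```

A key caveat the survey discusses is that a nonzero mean difference does not by itself prove unlawful discrimination, since part of the gap may be explainable by legitimate attributes.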


Notes

  1. The code for our experiments is available at https://github.com/zliobaite/paper-fairml-survey.



Author information

Correspondence to Indrė Žliobaitė.

Additional information

Responsible editor: Johannes Fürnkranz.


About this article


Cite this article

Žliobaitė, I. Measuring discrimination in algorithmic decision making. Data Min Knowl Disc 31, 1060–1089 (2017). https://doi.org/10.1007/s10618-017-0506-1

