
Measuring discrimination in algorithmic decision making

Published in: Data Mining and Knowledge Discovery

Abstract

Society is increasingly relying on data-driven predictive models for automated decision making. Not by design, but because of the nature and noisiness of observational data, such models may systematically disadvantage people belonging to certain categories or groups instead of relying solely on individual merits. This may happen even if the computing process is fair and well-intentioned. Discrimination-aware data mining studies how to make predictive models free from discrimination when the historical data on which they are built may be biased, incomplete, or even contain past discriminatory decisions. Discrimination-aware data mining is an emerging research discipline, and there is no firm consensus yet on how to measure the performance of algorithms. The goal of this survey is to review the various discrimination measures that have been used, to analyze their performance analytically and computationally, and to highlight the implications of using one measure rather than another. We also describe measures from other disciplines that have not yet been used for measuring discrimination but could potentially be suitable for this purpose. This survey is primarily intended for researchers in data mining and machine learning as a step towards a unifying view of performance criteria when developing new algorithms for non-discriminatory predictive modeling. In addition, practitioners and policy makers could use this study when diagnosing potential discrimination by predictive models.
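To make the notion of a discrimination measure concrete, the sketch below computes the mean (statistical parity) difference, one of the simplest and most widely discussed measures in this literature: the difference in acceptance rates between the unprotected and protected groups, d = P(ŷ = + | s = 0) − P(ŷ = + | s = 1). This is an illustrative example, not an excerpt from the article; the variable names and the toy data are assumptions.

```python
def mean_difference(y_pred, group):
    """Mean (statistical parity) difference between groups.

    y_pred: iterable of binary decisions (1 = positive/accept, 0 = negative).
    group:  iterable of group indicators (0 = unprotected, 1 = protected).
    Returns P(y=1 | group=0) - P(y=1 | group=1); 0 indicates parity,
    positive values indicate the protected group is accepted less often.
    """
    unprotected = [y for y, g in zip(y_pred, group) if g == 0]
    protected = [y for y, g in zip(y_pred, group) if g == 1]
    return sum(unprotected) / len(unprotected) - sum(protected) / len(protected)


# Toy data: 3 of 4 unprotected applicants accepted vs. 1 of 4 protected.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(mean_difference(preds, groups))  # 0.75 - 0.25 = 0.5
```

A key caveat the survey discusses is that a nonzero mean difference does not by itself prove unlawful discrimination, since part of the gap may be explainable by legitimate attributes.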


Notes

  1. The code for our experiments is available at https://github.com/zliobaite/paper-fairml-survey.



Author information

Correspondence to Indrė Žliobaitė.

Additional information

Responsible editor: Johannes Fürnkranz.


About this article


Cite this article

Žliobaitė, I. Measuring discrimination in algorithmic decision making. Data Min Knowl Disc 31, 1060–1089 (2017). https://doi.org/10.1007/s10618-017-0506-1

