Data Mining and Knowledge Discovery, Volume 31, Issue 4, pp 1060–1089

Measuring discrimination in algorithmic decision making

Indrė Žliobaitė

Part of the topical collection: Academic Surveys and Tutorials


Society increasingly relies on data-driven predictive models for automated decision making. Not by design, but owing to the nature and noisiness of observational data, such models may systematically disadvantage people belonging to certain categories or groups, instead of judging them solely on individual merit. This may happen even if the computing process is fair and well-intentioned. Discrimination-aware data mining studies how to make predictive models free from discrimination when the historical data on which they are built may be biased, incomplete, or even contain past discriminatory decisions. Discrimination-aware data mining is an emerging research discipline, and there is no firm consensus yet on how to measure the performance of algorithms. The goal of this survey is to review the discrimination measures that have been used, analyze their performance analytically and computationally, and highlight the implications of choosing one measure over another. We also describe measures from other disciplines that have not yet been used for measuring discrimination but could potentially be suitable for this purpose. This survey is intended primarily for researchers in data mining and machine learning, as a step towards a unifying view of performance criteria for developing new algorithms for non-discriminatory predictive modeling. Practitioners and policy makers could also use this study when diagnosing potential discrimination by predictive models.
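As a simple illustration of the kind of measure the survey reviews, the following Python sketch computes the mean difference (also called the statistical parity difference) between the positive-decision rates of the unprotected and the protected group — one of the most commonly used group-level discrimination measures in this literature. The function name and toy data below are ours, not the survey's.

```python
# A minimal sketch of the mean difference (statistical parity difference)
# measure for a binary decision and a binary protected attribute.
# Function name and example data are illustrative, not from the survey.

def mean_difference(decisions, protected):
    """decisions: 0/1 outcomes (1 = positive decision);
    protected: 0/1 flags (1 = protected-group member).
    Returns rate(general) - rate(protected); positive values mean
    the protected group receives positive decisions less often."""
    protected_outcomes = [d for d, p in zip(decisions, protected) if p == 1]
    general_outcomes = [d for d, p in zip(decisions, protected) if p == 0]
    rate_protected = sum(protected_outcomes) / len(protected_outcomes)
    rate_general = sum(general_outcomes) / len(general_outcomes)
    return rate_general - rate_protected

# Toy data: protected group accepted at 25%, general group at 75%.
decisions = [0, 1, 0, 0, 1, 1, 1, 0]
protected = [1, 1, 1, 1, 0, 0, 0, 0]
print(mean_difference(decisions, protected))  # prints 0.5
```

A mean difference of zero corresponds to statistical parity; the survey discusses why a non-zero value alone does not necessarily imply illegal discrimination, since part of the difference may be explainable by legitimate factors.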


Keywords: Discrimination-aware data mining · Fairness-aware machine learning · Accountability · Predictive modeling · Indirect discrimination



Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. Department of Computer Science, University of Helsinki, Helsinki, Finland
  2. Department of Geosciences and Geography, University of Helsinki, Helsinki, Finland
