Knowledge and Information Systems, Volume 35, Issue 3, pp 613–644

Quantifying explainable discrimination and removing illegal discrimination in automated decision making

  • Faisal Kamiran
  • Indrė Žliobaitė
  • Toon Calders
Regular Paper


Recently, the following discrimination-aware classification problem was introduced. Historical data used for supervised learning may contain discrimination, for instance with respect to gender. Given a sensitive attribute, discrimination-aware techniques address how to train classifiers that do not discriminate with respect to that attribute, even when the historical training data do. Existing techniques for this problem aim at removing all discrimination and do not take into account that part of it may be explainable by other attributes. In a job application, for example, the education level of a candidate could be such an explanatory attribute. If the data contain many highly educated male candidates and only a few highly educated women, a difference in acceptance rates between women and men does not necessarily reflect gender discrimination, as it could be explained by the different levels of education. Even though selecting on education level would result in more men being accepted, a difference with respect to such a criterion would be considered neither undesirable nor illegal. Current state-of-the-art techniques, however, do not take such gender-neutral explanations into account; as we show in this paper, they tend to overreact and actually start reverse discriminating. We therefore introduce and analyze the refined notion of conditional non-discrimination in classifier design. We show that some of the differences in decisions across the sensitive groups can be explainable and are hence tolerable. Accordingly, we develop a methodology for quantifying the explainable discrimination and algorithmic techniques for removing the illegal discrimination when one or more attributes are considered explanatory.
Experimental evaluation on synthetic and real-world classification datasets demonstrates that the new techniques are superior to the old ones in this new context: they remove almost exclusively the undesirable discrimination while leaving the explainable differences in decisions unchanged.
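The abstract describes explainable discrimination only informally. One possible way to make the job-application example concrete is sketched below, assuming a single categorical explanatory attribute (such as education). The sketch splits the overall acceptance-rate gap d_all between two groups a and b into an explainable part, d_expl = Σ_e [P(e | a) − P(e | b)] · p*(e), and a remainder d_illegal = d_all − d_expl. The reference rate p*(e) (here the unweighted mean of the two groups' acceptance rates within stratum e), the function names and the toy data are assumptions made for illustration; they are not necessarily the paper's exact definitions.

```python
from collections import defaultdict


def acceptance_rate(labels):
    """Share of positive decisions (label 1) in a non-empty list of 0/1 labels."""
    return sum(labels) / len(labels)


def decompose_discrimination(records, group_a, group_b):
    """Split the acceptance-rate gap between group_a and group_b into an
    explainable part and a remaining (illegal) part.

    records: iterable of (group, stratum, label) tuples, where `stratum` is the
    value of the explanatory attribute and label is 1 (accepted) or 0.
    Assumption (for illustration): the reference acceptance rate p_star of a
    stratum is the unweighted mean of the per-group rates observed in it.
    Returns (d_all, d_expl, d_illegal).
    """
    labels_by_group = defaultdict(list)  # group -> labels
    labels_by_cell = defaultdict(list)   # (stratum, group) -> labels
    for group, stratum, label in records:
        labels_by_group[group].append(label)
        labels_by_cell[(stratum, group)].append(label)

    n_a, n_b = len(labels_by_group[group_a]), len(labels_by_group[group_b])
    if n_a == 0 or n_b == 0:
        raise ValueError("both groups must occur in the data")

    # Overall difference in acceptance rates between the two groups.
    d_all = (acceptance_rate(labels_by_group[group_a])
             - acceptance_rate(labels_by_group[group_b]))

    d_expl = 0.0
    for stratum in {s for (s, _) in labels_by_cell}:
        cell_a = labels_by_cell.get((stratum, group_a), [])
        cell_b = labels_by_cell.get((stratum, group_b), [])
        rates = [acceptance_rate(cell) for cell in (cell_a, cell_b) if cell]
        if not rates:
            continue  # stratum contains neither of the two groups
        p_star = sum(rates) / len(rates)
        # Gap that arises only because the groups are spread differently
        # over the strata, with both groups judged at the same rate p_star.
        d_expl += (len(cell_a) / n_a - len(cell_b) / n_b) * p_star

    return d_all, d_expl, d_all - d_expl


def make_rows(group, stratum, accepted, rejected):
    """Hypothetical helper that builds toy records for one (group, stratum) cell."""
    return [(group, stratum, 1)] * accepted + [(group, stratum, 0)] * rejected


# Toy job-application data: within each education level men and women are
# accepted at the same rate, but far more men are highly educated.
data = (make_rows("m", "high", 64, 16) + make_rows("m", "low", 4, 16)
        + make_rows("f", "high", 16, 4) + make_rows("f", "low", 16, 64))

d_all, d_expl, d_illegal = decompose_discrimination(data, "m", "f")
print(f"d_all={d_all:.2f} d_expl={d_expl:.2f} d_illegal={d_illegal:.2f}")
```

In this toy data the men's acceptance rate is 0.68 and the women's is 0.32, yet the whole gap of 0.36 is attributed to education, so d_illegal is zero up to floating-point rounding. If instead both groups had the same education profile but different within-stratum rates, d_expl would be zero and the entire gap would count as illegal. A different choice of p*(e), for example weighting the groups by their size in each stratum, would shift the boundary between the two parts; the unweighted mean is simply one symmetric option.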


Keywords: Classification · Independence · Discrimination-aware data mining



Copyright information

© Springer-Verlag London 2012

Authors and Affiliations

  • Faisal Kamiran (1)
  • Indrė Žliobaitė (2)
  • Toon Calders (3)

  1. Mathematical and Computer Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  2. Bournemouth University, Poole, UK
  3. Eindhoven University of Technology, Eindhoven, The Netherlands
