In this paper, we investigate how to modify the naive Bayes classifier so that it performs classification that is restricted to be independent of a given sensitive attribute. Such independency restrictions arise naturally when the decision process that produced the labels in the data set was biased, e.g., by gender or racial discrimination. The setting is motivated by the many cases in which laws prohibit decisions that are partly based on discrimination; naive application of machine learning techniques could then expose companies to large fines. We present three approaches for making the naive Bayes classifier discrimination-free: (i) modifying the probability of the decision being positive, (ii) training one model for every sensitive attribute value and balancing them, and (iii) adding a latent variable to the Bayesian model that represents the unbiased label and optimizing the model parameters for likelihood using expectation maximization. We present experiments for the three approaches on both artificial and real-life data.
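The abstract names approach (i) without spelling it out. The following is a minimal Python sketch of one way such an adjustment could work. It is written under our own assumptions and is not the authors' exact procedure. We assume a binary class C, a binary sensitive attribute S (with S = 1 taken as the favoured group), and categorical features. The model is a count-based naive Bayes that scores N(C, S) · ∏ P(A_i | C), and discrimination is measured as the gap in positive-prediction rates between the two groups. The loop moves a small amount of count mass within N(C, S). While the model predicts at least as many positives as the data contains, it lowers P(C = 1 | S = 1); otherwise it raises P(C = 1 | S = 0). It stops once the gap is no longer positive. All function names, the step size `delta`, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import prod

# Hypothetical encoding: s == 1 is the favoured group, s == 0 the deprived
# group; c == 1 is the positive decision. Each row is (x, s, c), where x is a
# tuple of categorical feature values.


def train_nb(data, alpha=1.0):
    """Count-based naive Bayes: joint counts N(c, s) and smoothed P(A_i | c)."""
    n_cs = Counter((c, s) for _, s, c in data)
    n_c = Counter(c for _, _, c in data)
    feat_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)           # feature index -> observed values
    for x, _, c in data:
        for i, a in enumerate(x):
            feat_counts[(i, c)][a] += 1
            values[i].add(a)

    def p_feat(i, a, c):
        # Laplace-smoothed estimate of P(A_i = a | C = c)
        return (feat_counts[(i, c)][a] + alpha) / (n_c[c] + alpha * len(values[i]))

    joint = {(c, s): float(n_cs[(c, s)]) for c in (0, 1) for s in (0, 1)}
    return joint, p_feat


def predict(joint, p_feat, x, s):
    """Pick the class maximising N(c, s) * prod_i P(x_i | c)."""
    scores = {c: joint[(c, s)] * prod(p_feat(i, a, c) for i, a in enumerate(x))
              for c in (0, 1)}
    return 1 if scores[1] > scores[0] else 0


def discrimination(preds, groups):
    """Positive-prediction rate of the favoured group minus the deprived group's."""
    def pos_rate(g):
        members = [p for p, s in zip(preds, groups) if s == g]
        return sum(members) / len(members)
    return pos_rate(1) - pos_rate(0)


def balance(data, joint, p_feat, delta=0.01, max_iter=10_000):
    """Shift mass in N(C, S) until the discrimination score is <= 0 (or max_iter)."""
    joint = dict(joint)  # work on a copy; the feature model p_feat is unchanged
    target_pos = sum(c for _, _, c in data)
    groups = [s for _, s, _ in data]
    for _ in range(max_iter):
        preds = [predict(joint, p_feat, x, s) for x, s, _ in data]
        if discrimination(preds, groups) <= 0:
            break
        if sum(preds) < target_pos:
            # Fewer positives than in the data: raise P(C=1 | deprived).
            src, dst = (0, 0), (1, 0)
        else:
            # At least as many positives as in the data: lower P(C=1 | favoured).
            src, dst = (1, 1), (0, 1)
        step = min(delta * len(data), joint[src])  # never drive a count below 0
        joint[src] -= step
        joint[dst] += step
    return joint


# Made-up toy data with one binary feature; the labels favour s == 1.
DATA = (
    [((1,), 1, 1)] * 30 + [((0,), 1, 1)] * 10 + [((0,), 1, 0)] * 10
    + [((1,), 0, 1)] * 15 + [((1,), 0, 0)] * 15 + [((0,), 0, 0)] * 20
)

if __name__ == "__main__":
    joint, p_feat = train_nb(DATA)
    groups = [s for _, s, _ in DATA]
    for name, model in (("before", joint), ("after", balance(DATA, joint, p_feat))):
        preds = [predict(model, p_feat, x, s) for x, s, _ in DATA]
        print(name, round(discrimination(preds, groups), 3))
```

Running the script prints the discrimination score before and after balancing. Only the joint counts N(C, S) are changed, so their total stays fixed and the per-feature estimates are untouched. Predictions change only when a whole feature profile flips, so on other data the loop may overshoot and stop with a slightly negative score.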
This research was partially funded by the Netherlands Organisation for Scientific Research (NWO) through the Responsible Innovation project grant Data Mining Without Discrimination (KMVI-08-29).
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Responsible editor: José L Balcázar, Francesco Bonchi, Aristides Gionis, Michèle Sebag.
Cite this article
Calders, T., Verwer, S. Three naive Bayes approaches for discrimination-free classification. Data Min Knowl Disc 21, 277–292 (2010). https://doi.org/10.1007/s10618-010-0190-x