Abstract
With the spread of data mining technologies and the accumulation of social data, such technologies and data are increasingly used in determinations that seriously affect individuals’ lives. For example, credit scores are frequently computed from records of past credit data together with statistical prediction techniques. Needless to say, such determinations must be nondiscriminatory and fair with respect to sensitive features, such as race, gender, and religion. Several researchers have recently begun to develop analysis techniques that are aware of social fairness or discrimination. They have shown that simply excluding sensitive features is insufficient to eliminate bias in determinations, because sensitive information can exert an indirect influence through other features. In this paper, we first discuss three causes of unfairness in machine learning. We then propose a regularization approach that is applicable to any prediction algorithm with a probabilistic discriminative model. Finally, we apply this approach to logistic regression and empirically show its effectiveness and efficiency.
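As an illustration of the regularization approach the abstract describes, the following is a minimal sketch, not the paper's exact formulation: logistic regression whose training objective adds a penalty approximating the mutual information between the model's predicted label and a sensitive feature `s`, estimated with simple plug-in probabilities. All function names, the finite-difference training loop, and the toy data are illustrative assumptions, not code from the paper.

```python
import numpy as np

def sigmoid(z):
    """Numerically safe logistic function."""
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def objective(w, X, y, s, eta=1.0, lam=0.01):
    """Negative log-likelihood + mutual-information-style fairness
    penalty + L2 regularizer.  `eta` controls the fairness trade-off."""
    eps = 1e-9
    p = sigmoid(X @ w)
    nll = -np.sum(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))
    # Plug-in estimates of Pr[y=1 | s] and Pr[y=1] from the model outputs.
    p_all = np.mean(p)
    fair = 0.0
    for v in np.unique(s):
        m = s == v
        p_s = np.mean(p[m])
        fair += np.sum(p[m] * np.log((p_s + eps) / (p_all + eps))
                       + (1.0 - p[m]) * np.log((1.0 - p_s + eps) / (1.0 - p_all + eps)))
    return nll + eta * fair + lam * np.dot(w, w)

def fit(X, y, s, eta=1.0, lam=0.01, lr=5e-4, steps=400):
    """Gradient descent using central finite differences (simple, not fast)."""
    w = np.zeros(X.shape[1])
    h = 1e-5
    for _ in range(steps):
        g = np.zeros_like(w)
        for j in range(w.size):
            e = np.zeros_like(w)
            e[j] = h
            g[j] = (objective(w + e, X, y, s, eta, lam)
                    - objective(w - e, X, y, s, eta, lam)) / (2.0 * h)
        w -= lr * g
    return w

# Toy data: the sensitive feature s leaks into the first input column,
# so dropping s alone would not remove its indirect influence.
rng = np.random.default_rng(0)
n = 200
s = rng.integers(0, 2, n)
X = np.column_stack([rng.normal(size=n) + 1.5 * s,
                     rng.normal(size=n),
                     np.ones(n)])          # bias column
y = (X[:, 0] - 0.5 * X[:, 1] + 0.3 * rng.normal(size=n) > 0.7).astype(float)
w = fit(X, y, s, eta=1.0)
```

Raising `eta` trades predictive accuracy for weaker statistical dependence between the prediction and `s`; setting `eta=0` recovers ordinary L2-regularized logistic regression.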
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Kamishima, T., Akaho, S., Asoh, H., Sakuma, J. (2012). Fairness-Aware Classifier with Prejudice Remover Regularizer. In: Flach, P.A., De Bie, T., Cristianini, N. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2012. Lecture Notes in Computer Science(), vol 7524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33486-3_3
Print ISBN: 978-3-642-33485-6
Online ISBN: 978-3-642-33486-3