Artificial Intelligence and Law, Volume 24, Issue 2, pp 183–201

Using sensitive personal data may be necessary for avoiding discrimination in data-driven decision models



Abstract

Increasing numbers of decisions about everyday life are made using algorithms. By algorithms we mean predictive models (decision rules) learned from historical data using data mining. Such models often decide the prices we pay, select the ads we see and the news we read online, match job descriptions to candidate CVs, and decide who gets a loan, who goes through an extra airport security check, or who is released on parole. Yet growing evidence suggests that decision making by algorithms may discriminate against people, even if the computing process is fair and well-intentioned. This happens when biased or non-representative training data are combined with modeling procedures that inadvertently reproduce those biases. From the regulatory perspective there are two tendencies in relation to this issue: (1) to ensure that data-driven decision making is not discriminatory, and (2) to restrict the collection and storage of private data to the necessary minimum. This paper shows that from the computing perspective these two goals are contradictory. We demonstrate empirically and theoretically with standard regression models that to ensure decision models are non-discriminatory, for instance with respect to race, the sensitive racial information needs to be used in the model building process. Of course, once the model is ready, race should not be required as an input variable for decision making. From the regulatory perspective this has an important implication: collecting sensitive personal data is necessary to guarantee the fairness of algorithms, and law making needs to find sensible ways to allow the use of such data in the modeling process.


Keywords: Non-discrimination · Fairness · Regression · Data mining · Personal data · Sensitive data
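The abstract's central claim can be illustrated with ordinary least squares on synthetic data. The sketch below is an illustrative assumption, not the authors' actual experiment: it shows how omitting a sensitive attribute lets its effect leak into a correlated legitimate feature (omitted-variable bias), while including the attribute during training and discarding its coefficient at prediction time (in the spirit of Pope and Sydnor's approach to statistical profiling) keeps the legitimate coefficient unbiased without requiring the sensitive attribute as a decision-time input.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Synthetic data: a binary sensitive attribute s (e.g. race) correlated
# with a legitimate feature x (e.g. income).  All names and coefficient
# values here are illustrative assumptions.
s = rng.binomial(1, 0.5, n)
x = 1.0 * s + rng.normal(0, 1, n)            # x carries information about s
y = 2.0 * x + 1.5 * s + rng.normal(0, 1, n)  # historical outcome is biased by s

# Naive model: omit s entirely.  Because x and s are correlated, the
# discriminatory effect of s is absorbed into the coefficient of x.
X_naive = np.column_stack([np.ones(n), x])
beta_naive, *_ = np.linalg.lstsq(X_naive, y, rcond=None)

# Fairness-aware model: include s during TRAINING so its effect is
# captured by its own coefficient, then drop that coefficient when
# making decisions.
X_full = np.column_stack([np.ones(n), x, s])
beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

def predict_fair(x_new):
    """Decision-time prediction: s is not an input; only the intercept
    and the coefficient of x are used."""
    return beta_full[0] + beta_full[1] * x_new

print(f"naive coefficient on x: {beta_naive[1]:.2f}")  # inflated by the s effect
print(f"fair  coefficient on x: {beta_full[1]:.2f}")   # close to the true 2.0
```

The point of the sketch matches the paper's thesis: `beta_full` cannot be estimated without access to `s`, yet `predict_fair` never consumes `s`, so the sensitive data is needed only in the modeling process, not in deployed decision making.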



Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  1. Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
  2. Department of Geosciences and Geography, University of Helsinki, Helsinki, Finland
  3. Department of Computer Science, University of Helsinki, Helsinki, Finland
  4. eLaw, Center for Law and Digital Technologies, Faculty of Law, Leiden University, Leiden, The Netherlands
  5. WODC, Ministry of Security and Justice, The Hague, The Netherlands