Responsibly Innovating Data Mining and Profiling Tools: A New Approach to Discrimination Sensitive and Privacy Sensitive Attributes

  • Bart H. M. Custers
  • Bart W. Schermer


Data mining is a technology for extracting useful information, such as patterns and trends, from large amounts of data. Both the privacy sensitive input data and the output data, which is often used for making selections, deserve protection against abuse. In this paper we describe one of the main results of our research project on developing new privacy preserving and discrimination aware data mining tools: why the common measures for mitigating privacy and discrimination concerns, notably a priori limiting measures such as access controls, anonymity and purpose specification, are increasingly failing as safeguards in the novel context of advanced data mining and profiling. Contrary to previous attempts to protect privacy and prevent discrimination in data mining, we did not focus on new designs that better enable (a priori) access limiting measures for input data, but on (a posteriori) responsibility and transparency. Instead of limiting access to data, which is increasingly hard to enforce in a world of automated and interlinked databases and information networks, we stressed the question of how data can and may be used.
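The contrast between a priori access limiting and a posteriori accountability can be sketched in code. The following is a minimal illustration only, not the authors' implementation; all class and attribute names (APrioriStore, APosterioriStore, the SENSITIVE set) are hypothetical. It shows the shift in design: instead of denying access to sensitive attributes up front, every use of the data is permitted but recorded, so that how the data was actually used can be reviewed afterwards.

```python
# Hypothetical sketch: a priori access control vs. a posteriori transparency.
from dataclasses import dataclass, field
from typing import Dict, List

# Example set of discrimination sensitive attributes (illustrative only).
SENSITIVE = {"ethnicity", "religion"}

@dataclass
class APrioriStore:
    """A priori approach: block access to sensitive attributes up front."""
    def query(self, attribute: str) -> str:
        if attribute in SENSITIVE:
            raise PermissionError(f"access to '{attribute}' denied")
        return f"values of {attribute}"

@dataclass
class APosterioriStore:
    """A posteriori approach: permit access, but log every use of the data
    so that responsibility can be assigned and misuse detected later."""
    audit_log: List[Dict[str, str]] = field(default_factory=list)

    def query(self, attribute: str, purpose: str) -> str:
        # Record who used which attribute and for what stated purpose.
        self.audit_log.append({"attribute": attribute, "purpose": purpose})
        return f"values of {attribute}"

    def audit(self) -> List[Dict[str, str]]:
        # An auditor inspects uses of sensitive attributes after the fact.
        return [e for e in self.audit_log if e["attribute"] in SENSITIVE]
```

In this sketch the a priori store simply refuses sensitive queries, while the a posteriori store shifts the control from the moment of access to the moment of review, mirroring the paper's emphasis on how data may be used rather than who may access it.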


Keywords: Data mining · Access control · Information asymmetry · Sensitive attribute · Data mining algorithm



The authors would like to thank the Netherlands Organization for Scientific Research (NWO) for enabling this research.



Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  1. eLaw@Leiden, Centre for Law in the Information Society, Leiden University, Leiden, The Netherlands
  2. WODC – Ministry of Security and Justice, The Hague, The Netherlands
