Data Mining and Knowledge Discovery, Volume 29, Issue 6, pp 1733–1782

Discrimination- and privacy-aware patterns

  • Sara Hajian
  • Josep Domingo-Ferrer
  • Anna Monreale
  • Dino Pedreschi
  • Fosca Giannotti
Abstract

Data mining is gaining societal momentum due to the ever-increasing availability of large amounts of human data, easily collected by a variety of sensing technologies. We are therefore faced with unprecedented opportunities and risks: a deeper understanding of human behavior and of how our society works is darkened by a greater chance of privacy intrusion and of unfair discrimination based on the extracted patterns and profiles. Consider the case in which a set of patterns extracted from the personal data of a population of individuals is released for subsequent use in a decision-making process, such as granting or denying credit. First, the set of patterns may reveal sensitive information about individual persons in the training population; second, decision rules based on such patterns may lead to unfair discrimination, depending on what is represented in the training cases. Although methods that independently address privacy or discrimination in data mining have been proposed in the literature, we argue that in this context privacy and discrimination risks should be tackled together, and we present a methodology for doing so while publishing frequent pattern mining results. We describe a set of pattern sanitization methods, one for each discrimination measure used in the legal literature, to achieve a fair publishing of frequent patterns in combination with two possible privacy transformations: one based on \(k\)-anonymity and one based on differential privacy. Our proposed pattern sanitization methods based on \(k\)-anonymity yield both privacy- and discrimination-protected patterns, while introducing reasonable (controlled) pattern distortion. Moreover, they achieve a better trade-off between protection and data quality than the sanitization methods based on differential privacy. Finally, the effectiveness of our proposals is assessed by extensive experiments.
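To make the abstract's notion of "discrimination measures used in the legal literature" concrete, the sketch below computes one such standard measure, extended lift (elift), from frequent-pattern supports: it compares the confidence of a negative-decision rule with and without a protected group in its premise. This is a minimal illustration, not the paper's algorithm; the support counts and the 1.25 threshold are made-up example values, and the protected attribute A and context B are hypothetical.

```python
# Hedged sketch: extended lift (elift) of a classification rule computed
# from frequent-pattern supports. All numbers below are illustrative.

def confidence(supp_premise_and_outcome: int, supp_premise: int) -> float:
    """Confidence of a rule premise -> outcome, from itemset supports."""
    return supp_premise_and_outcome / supp_premise

def elift(conf_with_protected: float, conf_base: float) -> float:
    """Extended lift: factor by which adding the protected item A to the
    premise raises the confidence of the negative decision."""
    return conf_with_protected / conf_base

# Hypothetical supports over a credit dataset for the rules
#   {B} -> deny        (context only)
#   {A, B} -> deny     (context plus protected group A)
supp_B, supp_B_deny = 200, 50     # supports of {B} and {B, deny}
supp_AB, supp_AB_deny = 80, 40    # supports of {A, B} and {A, B, deny}

conf_base = confidence(supp_B_deny, supp_B)          # 50/200 = 0.25
conf_protected = confidence(supp_AB_deny, supp_AB)   # 40/80  = 0.5
measure = elift(conf_protected, conf_base)           # 0.5/0.25 = 2.0

# A pattern set would be flagged as discriminatory when the measure
# exceeds a legally motivated threshold (e.g. 1.25 in this toy example);
# sanitization then perturbs supports until all such rules pass.
print(measure > 1.25)  # True
```

Pattern sanitization in the paper's spirit would then adjust the published supports so that every such rule's measure falls below the threshold, while a privacy transformation (\(k\)-anonymity- or differential-privacy-based) bounds what the supports reveal about individuals.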

Keywords

Frequent patterns · Anti-discrimination · Privacy · Data mining


Copyright information

© The Author(s) 2014

Authors and Affiliations

  • Sara Hajian (1)
  • Josep Domingo-Ferrer (1)
  • Anna Monreale (2)
  • Dino Pedreschi (2)
  • Fosca Giannotti (3)
  1. Department of Computer Engineering and Maths, UNESCO Chair in Data Privacy, Universitat Rovira i Virgili, Tarragona, Catalonia
  2. Dipartimento di Informatica, Università di Pisa, Pisa, Italy
  3. ISTI-CNR, Pisa, Italy