
Discrimination- and privacy-aware patterns

Published in Data Mining and Knowledge Discovery

Abstract

Data mining is gaining societal momentum due to the ever-increasing availability of large amounts of human data, easily collected by a variety of sensing technologies. We are therefore faced with unprecedented opportunities and risks: a deeper understanding of human behavior and of how our society works is darkened by a greater chance of privacy intrusion and unfair discrimination based on the extracted patterns and profiles. Consider the case when a set of patterns extracted from the personal data of a population of individuals is released for subsequent use in a decision-making process, such as granting or denying credit. First, the set of patterns may reveal sensitive information about individual persons in the training population and, second, decision rules based on such patterns may lead to unfair discrimination, depending on what is represented in the training cases. Although methods that independently address privacy or discrimination in data mining have been proposed in the literature, we argue that in this context privacy and discrimination risks should be tackled together, and we present a methodology for doing so while publishing frequent pattern mining results. We describe a set of pattern sanitization methods, one for each discrimination measure used in the legal literature, to achieve a fair publishing of frequent patterns in combination with two possible privacy transformations: one based on \(k\)-anonymity and one based on differential privacy. Our proposed pattern sanitization methods based on \(k\)-anonymity yield both privacy- and discrimination-protected patterns, while introducing reasonable (controlled) pattern distortion. Moreover, they obtain a better trade-off between protection and data quality than the sanitization methods based on differential privacy. Finally, the effectiveness of our proposals is assessed by extensive experiments.
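To make the two privacy transformations mentioned above concrete, the following sketch contrasts a \(k\)-anonymity-style release, which publishes only pattern supports of at least \(k\), with a Laplace-noise release in the spirit of differential privacy. This is a minimal illustration, not the authors' sanitization algorithms; the pattern names and counts are made up.

```python
import random

def k_anonymous_release(pattern_supports, k):
    """Suppress patterns supported by fewer than k individuals, so every
    published support is at least k."""
    return {p: s for p, s in pattern_supports.items() if s >= k}

def laplace_release(pattern_supports, epsilon, sensitivity=1.0):
    """Perturb each support with Laplace(sensitivity/epsilon) noise, a common
    building block of differentially private frequency release."""
    scale = sensitivity / epsilon
    # The difference of two Exp(1) variates is Laplace(0, 1)-distributed.
    noise = lambda: scale * (random.expovariate(1.0) - random.expovariate(1.0))
    return {p: s + noise() for p, s in pattern_supports.items()}

# Hypothetical frequent patterns with their supports (number of matching records).
supports = {("sex=female", "credit=denied"): 3,
            ("age>60", "credit=denied"): 42}

print(k_anonymous_release(supports, k=10))     # keeps only the support-42 pattern
print(laplace_release(supports, epsilon=1.0))  # noisy supports for both patterns
```

A pattern with a small positive support pinpoints only a few individuals, which is why the \(k\)-anonymous variant suppresses it entirely rather than publishing a small count, whereas the differentially private variant publishes every pattern but with perturbed supports.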


Notes

  1. Article 20, General Data Protection Regulation, unofficial consolidated version provided by the Rapporteur, 22 October 2013. http://www.janalbrecht.eu/fileadmin/material/Dokumente/DPR-Regulation-inofficial-consolidated-LIBE.

  2. Discrimination on the basis of an attribute value happens if a person with that attribute value is treated less favorably than a person with another value.

  3. Discrimination occurs when a higher proportion of people not in the group is able to comply.

  4. \(\alpha \) states an acceptable level of discrimination according to laws and regulations. For example, the U.S. Equal Pay Act (United States Congress 1963) states that “a selection rate for any race, sex, or ethnic group which is less than four-fifths of the rate for the group with the highest rate will generally be regarded as evidence of adverse impact”. This amounts to using \(clift\) with \(\alpha =1.25\).
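The four-fifths rule in Note 4 can be checked with a simple ratio test. The sketch below is illustrative (the exact \(clift\) definition in the paper may differ): it compares the selection rate of a protected group against that of the highest-rate group, flagging adverse impact when the ratio of rates exceeds \(\alpha =1.25\), i.e., when the protected rate falls below four-fifths of the highest rate. All numbers are hypothetical.

```python
def adverse_impact(selected_protected, total_protected,
                   selected_best, total_best, alpha=1.25):
    """Return True if the protected group's selection rate is less than
    1/alpha (= four-fifths when alpha = 1.25) of the best group's rate."""
    rate_protected = selected_protected / total_protected
    rate_best = selected_best / total_best
    return rate_best / rate_protected > alpha

# Hypothetical figures: 30 of 100 protected applicants selected vs 50 of 100
# in the highest-rate group: 30% < 0.8 * 50% = 40%, so adverse impact.
print(adverse_impact(30, 100, 50, 100))  # → True
```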

References

  • Aggarwal CC, Yu PS (2008) Privacy preserving data mining: models and algorithms. Springer, Berlin

  • Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pp 487–499

  • Agrawal R, Srikant R (2000) Privacy preserving data mining. In: SIGMOD 2000. ACM Press, New York, pp 439–450

  • Atzori M, Bonchi F, Giannotti F, Pedreschi D (2008) Anonymity preserving pattern discovery. VLDB J 17(4):703–727

  • Australian Legislation (2014) (a) Victorian Current Acts - Equal Opportunity Act - 2010 (amended Sept. 17, 2014); (b) Queensland - Anti-Discrimination Act 1991 (current as at July 1, 2014)

  • Berendt B, Preibusch S (2014) Better decision support through exploratory discrimination-aware data mining: foundations and empirical evidence. Artif Intell Law 22(2):175–209

  • Bhaskar R, Laxman S, Smith A, Thakurta A (2010) Discovering frequent patterns in sensitive data. In: KDD 2010. ACM Press, New York, pp 503–512

  • Bonomi L (2013) Mining frequent patterns with differential privacy. PVLDB 6(12):1422–1427

  • Calders T, Goethals B (2007) Non-derivable itemset mining. DMKD 14(1):171–206

  • Calders T, Verwer S (2010) Three naive Bayes approaches for discrimination-free classification. Data Min Knowl Discov 21(2):277–292

  • Custers B, Calders T, Schermer B, Zarsky TZ (eds) (2013) Discrimination and privacy in the information society: data mining and profiling in large databases. Studies in Applied Philosophy, Epistemology and Rational Ethics, vol 3. Springer, Berlin

  • Dalenius T (1974) The invasion of privacy problem and statistics production—an overview. Statistik Tidskrift 12:213–225

  • Domingo-Ferrer J, Torra V (2005) Ordinal, continuous and heterogeneous \(k\)-anonymity through microaggregation. Data Min Knowl Discov 11(2):195–212

  • Dwork C (2006) Differential privacy. In: ICALP 2006, Lecture Notes in Computer Science, vol 4052. Springer, Berlin, pp 1–12

  • Dwork C, Hardt M, Pitassi T, Reingold O, Zemel RS (2012) Fairness through awareness. In: ITCS 2012. ACM Press, New York, pp 214–226

  • European Union Legislation (1995) Directive 95/46/EC

  • European Union Legislation (2014) (a) Racial Equality Directive, 2000/43/EC; (b) Employment Equality Directive, 2000/78/EC; (c) European Parliament legislative resolution on equal treatment between persons irrespective of religion or belief, disability, age or sexual orientation (A6-0149/2009)

  • Frank A, Asuncion A (2010) UCI machine learning repository. University of California, School of Information and Computer Science, Irvine http://archive.ics.uci.edu/ml/datasets

  • Friedman A, Wolff R, Schuster A (2008) Providing \(k\)-anonymity in data mining. VLDB J 17(4):789–804

  • Friedman A, Schuster A (2010) Data mining with differential privacy. In: KDD 2010. ACM, New York, pp 493–502

  • Fung BCM, Wang K, Fu AW-C, Yu PS (2010) Introduction to privacy-preserving data publishing: concepts and techniques. Chapman & Hall/CRC, Boca Raton

  • Gehrke J, Hay M, Lui E, Pass R (2012) Crowd-blending privacy. In: CRYPTO pp 479–496

  • Greenwood PE, Nikulin MS (1996) A guide to chi-squared testing. Wiley, New York

  • Hajian S, Domingo-Ferrer J, Martínez-Ballesté A (2011) Rule protection for indirect discrimination prevention in data mining. In: MDAI 2011, Lecture Notes in Computer Science, vol 6820. Springer, Berlin, pp 211–222

  • Hajian S, Domingo-Ferrer J (2013) A methodology for direct and indirect discrimination prevention in data mining. IEEE Trans Knowl Data Eng 25(7):1445–1459

  • Hajian S, Monreale A, Pedreschi D, Domingo-Ferrer J, Giannotti F (2012) Injecting discrimination and privacy awareness into pattern discovery. In: 2012 IEEE 12th International Conference on Data Mining Workshops. IEEE Computer Society, pp 360–369

  • Hajian S, Domingo-Ferrer J (2012) A study on the impact of data anonymization on anti-discrimination. In: 2012 IEEE 12th International Conference on Data Mining Workshops. IEEE Computer Society, pp 352–359

  • Hajian S, Domingo-Ferrer J, Farràs O (2014) Generalization-based privacy preservation and discrimination prevention in data publishing and mining. Data Min Knowl Discov 28(5–6):1158–1188

  • Hay M, Rastogi V, Miklau G, Suciu D (2010) Boosting the accuracy of differentially private histograms through consistency. Proc VLDB 3(1):1021–1032

  • Hundepool A, Domingo-Ferrer J, Franconi L, Giessing S, Schulte-Nordholt E, Spicer K, de Wolf P-P (2012) Statistical disclosure control. Wiley, New York

  • Kamiran F, Calders T (2011) Data preprocessing techniques for classification without discrimination. Knowl Inf Syst 33(1):1–33

  • Kamiran F, Calders T, Pechenizkiy M (2010) Discrimination aware decision tree learning. In: Proceedings of IEEE International Conference on Data Mining, pp 869–874

  • Kamiran F, Karim A, Zhang X (2010) Decision theory for discrimination-aware classification. In: ICDM IEEE, pp 924–929

  • Kamiran F, Zliobaite I, Calders T (2013) Quantifying explainable discrimination and removing illegal discrimination in automated decision making. Knowl Inf Syst 35(3):613–644

  • Kamishima T, Akaho S, Asoh H, Sakuma J (2012) Fairness-aware classifier with prejudice remover regularizer. In: ECML/PKDD. Lecture Notes in Computer Science vol 7524. Springer, Berlin pp 35–50

  • Kantarcioglu M, Jin J, Clifton C (2004) When do data mining results violate privacy? In: KDD. ACM Press, New York, pp 599–604

  • Lee J, Clifton C (2012) Differential identifiability. In: KDD 2012. ACM Press, New York, pp 1041–1049

  • Li N, Qardaji WH, Su D, Cao J (2012) PrivBasis: frequent itemset mining with differential privacy. Proc VLDB 5(11):1340–1351

  • Li N, Li T, Venkatasubramanian S (2007) \(t\)-Closeness: privacy beyond \(k\)-anonymity and \(l\)-diversity. In: IEEE 23rd International Conference on Data Engineering (ICDE) pp 106–115

  • Li W, Han J, Pei J (2001) CMAR: accurate and efficient classification based on multiple class-association rules. In: Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM), pp 369–376

  • Luong BT, Ruggieri S, Turini F (2011) k-NN as an implementation of situation testing for discrimination discovery and prevention. In: 17th ACM International Conference on Knowledge Discovery and Data Mining (KDD 2011). ACM Press, New York, pp 502–510

  • Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) \(l\)-Diversity: privacy beyond \(k\)-anonymity. ACM Trans Knowl Discov Data (TKDD) 1(1), Article 3

  • McSherry F, Talwar K (2007) Mechanism design via differential privacy. In: Proceedings of the 48th IEEE Symposium on Foundations of Computer Science (FOCS), pp 94–103

  • Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Proceedings of the 7th International Conference on Database Theory

  • Pedreschi D, Ruggieri S, Turini F (2008) Discrimination-aware data mining. In: Proceedings of the 14th ACM International Conference on Knowledge Discovery and Data Mining (KDD). ACM Press, New York, pp 560–568

  • Pedreschi D, Ruggieri S, Turini F (2009) Measuring discrimination in socially-sensitive decision records. In: Proceedings of the SIAM International Conference on Data Mining (SDM). SIAM, pp 581–592

  • Pedreschi D, Ruggieri S, Turini F (2009) Integrating induction and deduction for finding evidence of discrimination. In: 12th ACM International Conference on Artificial Intelligence and Law (ICAIL). ACM Press, New York, pp 157–166

  • Pedreschi D, Ruggieri S, Turini F (2013) The discovery of discrimination. In: Custers B, Calders T, Schermer B, Zarsky TZ (eds) Discrimination and privacy in the information society. Studies in Applied Philosophy, Epistemology and Rational Ethics, vol 3. Springer, Berlin, pp 43–57

  • Ruggieri S, Pedreschi D, Turini F (2010) Data mining for discrimination discovery. ACM Trans Knowl Discov Data (TKDD) 4(2), Article 9

  • Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6):1010–1027

  • Soria-Comas J, Domingo-Ferrer J (2012) Sensitivity-independent differential privacy via prior knowledge refinement. Int J Uncertain Fuzziness Knowl Based Syst 20(6):855–876

  • Sweeney L (2002) k-Anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based Syst 10(5):557–570

  • United States Congress, US Equal Pay Act (1963) http://archive.eeoc.gov/epa/anniversary/epa-40.html

  • Zemel RS, Wu Y, Swersky K, Pitassi T, Dwork C (2013) Learning fair representations. ICML 3:325–333

  • Zeng C, Naughton JF, Cai J-Y (2012) On differentially private frequent itemset mining. PVLDB 6(1):25–36

  • Zliobaite I, Kamiran F, Calders T (2011) Handling conditional discrimination. In: Proceedings of the 11th IEEE International Conference on Data Mining (ICDM), pp 992–1001

Acknowledgments

The following funding sources are gratefully acknowledged: Government of Catalonia (ICREA Acadèmia Prize to the second author and Grant 2014 SGR 537), Spanish Government (Project TIN2011-27076-C03-01 “CO-PRIVACY”), European Commission (Projects FP7 “DwB”, FP7-SMARTCITIES n. 609042 “PETRA”, FP7 “Inter-Trust” and H2020 “CLARUS”) and Templeton World Charity Foundation (Grant TWCF0095/AB60 “CO-UTILITY”). The authors are with the UNESCO Chair in Data Privacy. The views in this paper are the authors’ own and do not necessarily reflect the views of UNESCO or the Templeton World Charity Foundation.

Author information

Correspondence to Josep Domingo-Ferrer.

Additional information

Responsible editor: Bart Goethals.

About this article

Cite this article

Hajian, S., Domingo-Ferrer, J., Monreale, A. et al. Discrimination- and privacy-aware patterns. Data Min Knowl Disc 29, 1733–1782 (2015). https://doi.org/10.1007/s10618-014-0393-7

