
Discrimination- and privacy-aware patterns

Published in Data Mining and Knowledge Discovery

Abstract

Data mining is gaining societal momentum due to the ever-increasing availability of large amounts of human data, easily collected by a variety of sensing technologies. We are therefore faced with unprecedented opportunities and risks: a deeper understanding of human behavior and of how our society works is darkened by a greater chance of privacy intrusion and unfair discrimination based on the extracted patterns and profiles. Consider the case when a set of patterns extracted from the personal data of a population of individuals is released for subsequent use in a decision-making process, such as granting or denying credit. First, the set of patterns may reveal sensitive information about individual persons in the training population and, second, decision rules based on such patterns may lead to unfair discrimination, depending on what is represented in the training cases. Although methods that independently address privacy or discrimination in data mining have been proposed in the literature, we argue that in this context privacy and discrimination risks should be tackled together, and we present a methodology for doing so while publishing frequent pattern mining results. We describe a set of pattern sanitization methods, one for each discrimination measure used in the legal literature, to achieve a fair publishing of frequent patterns in combination with two possible privacy transformations: one based on \(k\)-anonymity and one based on differential privacy. Our proposed pattern sanitization methods based on \(k\)-anonymity yield both privacy- and discrimination-protected patterns, while introducing reasonable (controlled) pattern distortion. Moreover, they obtain a better trade-off between protection and data quality than the sanitization methods based on differential privacy. Finally, the effectiveness of our proposals is assessed by extensive experiments.
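To make the two privacy transformations mentioned above concrete, the following sketch contrasts a \(k\)-anonymity-style release, which publishes only pattern supports of at least \(k\), with a Laplace-noise release in the spirit of differential privacy. This is a minimal illustration, not the authors' sanitization algorithms; the pattern names and counts are made up.

```python
import random

def k_anonymous_release(pattern_supports, k):
    """Suppress patterns supported by fewer than k individuals, so every
    published support is at least k."""
    return {p: s for p, s in pattern_supports.items() if s >= k}

def laplace_release(pattern_supports, epsilon, sensitivity=1.0):
    """Perturb each support with Laplace(sensitivity/epsilon) noise, a common
    building block of differentially private frequency release."""
    scale = sensitivity / epsilon
    # The difference of two Exp(1) variates is Laplace(0, 1)-distributed.
    noise = lambda: scale * (random.expovariate(1.0) - random.expovariate(1.0))
    return {p: s + noise() for p, s in pattern_supports.items()}

# Hypothetical frequent patterns with their supports (number of matching records).
supports = {("sex=female", "credit=denied"): 3,
            ("age>60", "credit=denied"): 42}

print(k_anonymous_release(supports, k=10))     # keeps only the support-42 pattern
print(laplace_release(supports, epsilon=1.0))  # noisy supports for both patterns
```

A pattern with a small positive support pinpoints only a few individuals, which is why the \(k\)-anonymous variant suppresses it entirely rather than publishing a small count, whereas the differentially private variant publishes every pattern but with perturbed supports.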


Notes

  1. Article 20, General Data Protection Regulation, unofficial consolidated version provided by the Rapporteur, 22 October 2013. http://www.janalbrecht.eu/fileadmin/material/Dokumente/DPR-Regulation-inofficial-consolidated-LIBE.

  2. Discrimination on the basis of an attribute value happens if a person with that attribute value is treated less favorably than a person with another value.

  3. Discrimination occurs when a higher proportion of people not in the group is able to comply.

  4. \(\alpha \) states an acceptable level of discrimination according to laws and regulations. For example, the U.S. Equal Pay Act (United States Congress 1963) states that “a selection rate for any race, sex, or ethnic group which is less than four-fifths of the rate for the group with the highest rate will generally be regarded as evidence of adverse impact”. This amounts to using \(clift\) with \(\alpha =1.25\).
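The four-fifths rule in Note 4 can be checked with a simple ratio test. The sketch below is illustrative (the exact \(clift\) definition in the paper may differ): it compares the selection rate of a protected group against that of the highest-rate group, flagging adverse impact when the ratio of rates exceeds \(\alpha =1.25\), i.e., when the protected rate falls below four-fifths of the highest rate. All numbers are hypothetical.

```python
def adverse_impact(selected_protected, total_protected,
                   selected_best, total_best, alpha=1.25):
    """Return True if the protected group's selection rate is less than
    1/alpha (= four-fifths when alpha = 1.25) of the best group's rate."""
    rate_protected = selected_protected / total_protected
    rate_best = selected_best / total_best
    return rate_best / rate_protected > alpha

# Hypothetical figures: 30 of 100 protected applicants selected vs 50 of 100
# in the highest-rate group: 30% < 0.8 * 50% = 40%, so adverse impact.
print(adverse_impact(30, 100, 50, 100))  # → True
```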

References

  • Aggarwal CC, Yu PS (2008) Privacy preserving data mining: models and algorithms. Springer, Berlin

  • Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pp 487–499

  • Agrawal R, Srikant R (2000) Privacy preserving data mining. In: SIGMOD 2000. ACM Press, New York, pp 439–450

  • Atzori M, Bonchi F, Giannotti F, Pedreschi D (2008) Anonymity preserving pattern discovery. VLDB J 17(4):703–727

  • Australian Legislation (2014) (a) Victorian Current Acts - Equal Opportunity Act - 2010 (amended Sept. 17, 2014); (b) Queensland - Anti-Discrimination Act 1991 (current as at July 1, 2014)

  • Berendt B, Preibusch S (2014) Better decision support through exploratory discrimination-aware data mining: foundations and empirical evidence. Artif Intell Law 22(2):175–209

  • Bhaskar R, Laxman S, Smith A, Thakurta A (2010) Discovering frequent patterns in sensitive data. In: KDD 2010. ACM Press, New York, pp 503–512

  • Bonomi L (2013) Mining frequent patterns with differential privacy. PVLDB 6(12):1422–1427

  • Calders T, Goethals B (2007) Non-derivable itemset mining. DMKD 14(1):171–206

  • Calders T, Verwer S (2010) Three naive Bayes approaches for discrimination-free classification. Data Min Knowl Discov 21(2):277–292

  • Custers B, Calders T, Schermer B, Zarsky TZ (eds) (2013) Discrimination and privacy in the information society: data mining and profiling in large databases. Studies in Applied Philosophy, Epistemology and Rational Ethics, vol 3. Springer, Berlin

  • Dalenius T (1974) The invasion of privacy problem and statistics production—an overview. Statistik Tidskrift 12:213–225

  • Domingo-Ferrer J, Torra V (2005) Ordinal, continuous and heterogeneous \(k\)-anonymity through microaggregation. Data Min Knowl Discov 11(2):195–212

  • Dwork C (2006) Differential privacy. In: ICALP 2006, Lecture Notes in Computer Science, vol 4052. Springer, Berlin, pp 1–12

  • Dwork C, Hardt M, Pitassi T, Reingold O, Zemel RS (2012) Fairness through awareness. In: ITCS 2012. ACM Press, New York, pp 214–226

  • European Union Legislation (1995) Directive 95/46/EC

  • European Union Legislation (2014) (a) Racial Equality Directive, 2000/43/EC; (b) Employment Equality Directive, 2000/78/EC; (c) European Parliament legislative resolution on equal treatment between persons irrespective of religion or belief, disability, age or sexual orientation (A6-0149/2009)

  • Frank A, Asuncion A (2010) UCI machine learning repository. University of California, School of Information and Computer Science, Irvine http://archive.ics.uci.edu/ml/datasets

  • Friedman A, Wolff R, Schuster A (2008) Providing \(k\)-anonymity in data mining. VLDB J 17(4):789–804

  • Friedman A, Schuster A (2010) Data mining with differential privacy. In: KDD 2010. ACM, New York, pp 493–502

  • Fung BCM, Wang K, Fu AW-C, Yu PS (2010) Introduction to privacy-preserving data publishing: concepts and techniques. Chapman & Hall/CRC, Boca Raton

  • Gehrke J, Hay M, Lui E, Pass R (2012) Crowd-blending privacy. In: CRYPTO pp 479–496

  • Greenwood PE, Nikulin MS (1996) A guide to chi-squared testing. Wiley, New York

  • Hajian S, Domingo-Ferrer J, Martínez-Ballesté A (2011) Rule protection for indirect discrimination prevention in data mining. In: MDAI 2011, Lecture Notes in Computer Science, vol 6820. Springer, Berlin, pp 211–222

  • Hajian S, Domingo-Ferrer J (2013) A methodology for direct and indirect discrimination prevention in data mining. IEEE Trans Knowl Data Eng 25(7):1445–1459

  • Hajian S, Monreale A, Pedreschi D, Domingo-Ferrer J, Giannotti F (2012) Injecting discrimination and privacy awareness into pattern discovery. In: 2012 IEEE 12th International Conference on Data Mining Workshops. IEEE Computer Society, pp 360–369

  • Hajian S, Domingo-Ferrer J (2012) A study on the impact of data anonymization on anti-discrimination. In: 2012 IEEE 12th International Conference on Data Mining Workshops. IEEE Computer Society, pp 352–359

  • Hajian S, Domingo-Ferrer J, Farràs O (2014) Generalization-based privacy preservation and discrimination prevention in data publishing and mining. Data Min Knowl Discov 28(5–6):1158–1188

  • Hay M, Rastogi V, Miklau G, Suciu D (2010) Boosting the accuracy of differentially private histograms through consistency. Proc VLDB 3(1):1021–1032

  • Hundepool A, Domingo-Ferrer J, Franconi L, Giessing S, Schulte-Nordholt E, Spicer K, de Wolf P-P (2012) Statistical disclosure control. Wiley, New York

  • Kamiran F, Calders T (2011) Data preprocessing techniques for classification without discrimination. Knowl Inf Syst 33(1):1–33

  • Kamiran F, Calders T, Pechenizkiy M (2010) Discrimination aware decision tree learning. In: Proceedings of IEEE International Conference on Data Mining, pp 869–874

  • Kamiran F, Karim A, Zhang X (2010) Decision theory for discrimination-aware classification. In: ICDM IEEE, pp 924–929

  • Kamiran F, Zliobaite I, Calders T (2013) Quantifying explainable discrimination and removing illegal discrimination in automated decision making. Knowl Inf Syst 35(3):613–644

  • Kamishima T, Akaho S, Asoh H, Sakuma J (2012) Fairness-aware classifier with prejudice remover regularizer. In: ECML/PKDD. Lecture Notes in Computer Science vol 7524. Springer, Berlin pp 35–50

  • Kantarcioglu M, Jin J, Clifton C (2004) When do data mining results violate privacy? In: KDD. ACM Press, New York, pp 599–604

  • Lee J, Clifton C (2012) Differential identifiability. In: KDD 2012. ACM Press, New York, pp 1041–1049

  • Li N, Qardaji WH, Su D, Cao J (2012) PrivBasis: frequent itemset mining with differential privacy. Proc VLDB 5(11):1340–1351

  • Li N, Li T, Venkatasubramanian S (2007) \(t\)-Closeness: privacy beyond \(k\)-anonymity and \(l\)-diversity. In: IEEE 23rd International Conference on Data Engineering (ICDE) pp 106–115

  • Li W, Han J, Pei J (2001) CMAR: accurate and efficient classification based on multiple class-association rules. In: Proceedings of the 2001 IEEE International Conference on Data Mining (ICDM), pp 369–376

  • Luong BT, Ruggieri S, Turini F (2011) k-NN as an implementation of situation testing for discrimination discovery and prevention. In: 17th ACM International Conference on Knowledge Discovery and Data Mining (KDD 2011). ACM Press, New York, pp 502–510

  • Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) \(l\)-Diversity: privacy beyond \(k\)-anonymity. ACM Trans Knowl Discov Data (TKDD) 1(1), Article 3

  • McSherry F, Talwar K (2007) Mechanism design via differential privacy. In: Proceedings of the 48th IEEE Symposium on Foundations of Computer Science (FOCS), pp 94–103

  • Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Proceedings of the 7th International Conference on Database Theory

  • Pedreschi D, Ruggieri S, Turini F (2008) Discrimination-aware data mining. In: Proceedings of the 14th ACM International Conference on Knowledge Discovery and Data Mining (KDD). ACM Press, New York, pp 560–568

  • Pedreschi D, Ruggieri S, Turini F (2009) Measuring discrimination in socially-sensitive decision records. In: Proceedings of the SIAM International Conference on Data Mining (SDM). SIAM, pp 581–592

  • Pedreschi D, Ruggieri S, Turini F (2009) Integrating induction and deduction for finding evidence of discrimination. In: 12th ACM International Conference on Artificial Intelligence and Law (ICAIL). ACM Press, New York, pp 157–166

  • Pedreschi D, Ruggieri S, Turini F (2013) The discovery of discrimination. In: Custers B, Calders T, Schermer B, Zarsky TZ (eds) Discrimination and privacy in the information society. Studies in Applied Philosophy, Epistemology and Rational Ethics, vol 3. Springer, Berlin, pp 43–57

  • Ruggieri S, Pedreschi D, Turini F (2010) Data mining for discrimination discovery. ACM Trans Knowl Discov Data (TKDD) 4(2), Article 9

  • Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6):1010–1027

  • Soria-Comas J, Domingo-Ferrer J (2012) Sensitivity-independent differential privacy via prior knowledge refinement. Int J Uncertain Fuzziness Knowl Based Syst 20(6):855–876

  • Sweeney L (2002) k-Anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based Syst 10(5):557–570

  • United States Congress, US Equal Pay Act (1963) http://archive.eeoc.gov/epa/anniversary/epa-40.html

  • Zemel RS, Wu Y, Swersky K, Pitassi T, Dwork C (2013) Learning fair representations. ICML 3:325–333

  • Zeng C, Naughton JF, Cai J-Y (2012) On differentially private frequent itemset mining. PVLDB 6(1):25–36

  • Zliobaite I, Kamiran F, Calders T (2011) Handling conditional discrimination. In: Proceedings of the 11th IEEE International Conference on Data Mining (ICDM), pp 992–1001

Acknowledgments

The following funding sources are gratefully acknowledged: Government of Catalonia (ICREA Acadèmia Prize to the second author and Grant 2014 SGR 537), Spanish Government (Project TIN2011-27076-C03-01 “CO-PRIVACY”), European Commission (Projects FP7 “DwB”, FP7-SMARTCITIES n. 609042 “PETRA”, FP7 “Inter-Trust” and H2020 “CLARUS”) and Templeton World Charity Foundation (Grant TWCF0095/AB60 “CO-UTILITY”). The authors are with the UNESCO Chair in Data Privacy. The views in this paper are the authors’ own and do not necessarily reflect the views of UNESCO or the Templeton World Charity Foundation.

Author information

Correspondence to Josep Domingo-Ferrer.

Additional information

Responsible editor: Bart Goethals.

About this article

Cite this article

Hajian, S., Domingo-Ferrer, J., Monreale, A. et al. Discrimination- and privacy-aware patterns. Data Min Knowl Disc 29, 1733–1782 (2015). https://doi.org/10.1007/s10618-014-0393-7

