Data Mining and Knowledge Discovery, Volume 28, Issue 5, pp 1158–1188

Generalization-based privacy preservation and discrimination prevention in data publishing and mining

Article

DOI: 10.1007/s10618-014-0346-1

Cite this article as:
Hajian, S., Domingo-Ferrer, J. & Farràs, O. Data Min Knowl Disc (2014) 28: 1158. doi:10.1007/s10618-014-0346-1

Abstract

Living in the information society facilitates the automatic collection of huge amounts of data on individuals, organizations, etc. Publishing such data for secondary analysis (e.g. learning models and finding patterns) may be extremely useful to policy makers, planners, marketing analysts, researchers and others. Yet, data publishing and mining do not come without dangers, namely privacy invasion and also potential discrimination of the individuals whose data are published. Discrimination may ensue from training data mining models (e.g. classifiers) on data which are biased against certain protected groups (ethnicity, gender, political preferences, etc.). The objective of this paper is to describe how to obtain data sets for publication that are: (i) privacy-preserving; (ii) unbiased regarding discrimination; and (iii) as useful as possible for learning models and finding patterns. We present the first generalization-based approach to simultaneously offer privacy preservation and discrimination prevention. We formally define the problem, give an optimal algorithm to tackle it and evaluate the algorithm in terms of both general and specific data analysis metrics (i.e. various types of classifiers and rule induction algorithms). It turns out that the impact of our transformation on the quality of data is the same or only slightly higher than the impact of achieving just privacy preservation. In addition, we show how to extend our approach to different privacy models and anti-discrimination legal concepts.

Keywords

Data mining, Anti-discrimination, Privacy, Generalization

Copyright information

© The Author(s) 2014

Authors and Affiliations

  • Sara Hajian (1)
  • Josep Domingo-Ferrer (1)
  • Oriol Farràs (1)
  1. Department of Computer Engineering and Maths, UNESCO Chair in Data Privacy, Universitat Rovira i Virgili, Tarragona, Catalonia
