Abstract
The need to improve privacy in public datasets is becoming increasingly important as the number of publicly available datasets grows rapidly. This has driven continuous research into better protection methods that prevent the disclosure of the entities or individuals in a dataset while preserving data utility.
In this paper we present a new approach to categorical data protection based on clustering the dataset and then protecting each cluster. We show that this new approach allows us to obtain protections with a better trade-off between data utility and the disclosure of individuals' information.
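The two-stage idea described above (cluster first, then protect each cluster) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the clustering is a simple k-modes-style procedure using Hamming distance, the per-cluster protection replaces every record with its cluster's attribute-wise mode (a crude microaggregation-like stand-in), and the function names, seed records, and toy data are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_record(records):
    """Attribute-wise most frequent value (ties go to the first value seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def cluster_labels(records, seeds, max_iter=20):
    """Assumed clustering step: k-modes-style assignment to the nearest center."""
    centers = list(seeds)
    labels = []
    for _ in range(max_iter):
        # Assign each record to the closest center under Hamming distance.
        labels = [
            min(range(len(centers)), key=lambda i: hamming(r, centers[i]))
            for r in records
        ]
        # Recompute each center as the mode of its members.
        new_centers = []
        for i, old in enumerate(centers):
            members = [r for r, lab in zip(records, labels) if lab == i]
            new_centers.append(mode_record(members) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def protect_clusters(records, labels):
    """Assumed protection step: replace each record by its cluster's mode."""
    protected = list(records)
    for lab in set(labels):
        idx = [i for i, l in enumerate(labels) if l == lab]
        representative = mode_record([records[i] for i in idx])
        for i in idx:
            protected[i] = representative
    return protected


# Hypothetical toy dataset with three categorical attributes.
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "large", "flat"),
    ("blue", "large", "square"),
    ("green", "large", "flat"),
]
labels = cluster_labels(records, seeds=[records[0], records[3]])
protected = protect_clusters(records, labels)
```

In this sketch the protection is applied independently inside each cluster, so the masked values stay close to the records they replace. Any other categorical masking method could be substituted for `protect_clusters` without changing the overall structure.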
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Marés, J., Torra, V. (2012). Clustering-Based Categorical Data Protection. In: Domingo-Ferrer, J., Tinnirello, I. (eds) Privacy in Statistical Databases. PSD 2012. Lecture Notes in Computer Science, vol 7556. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33627-0_7
DOI: https://doi.org/10.1007/978-3-642-33627-0_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33626-3
Online ISBN: 978-3-642-33627-0
eBook Packages: Computer Science (R0)