Clustering-Based Categorical Data Protection

  • Conference paper
Privacy in Statistical Databases (PSD 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7556)

Abstract

The need to improve privacy in public datasets is becoming increasingly important as the number of publicly available datasets grows rapidly. This drives continuing research into better protection methods that prevent the disclosure of the entities or individuals in a dataset while preserving data utility.

In this paper we present a new approach to categorical data protection that applies clustering to the dataset and then protects each cluster. We show that this approach yields protections with a better trade-off between data utility and the disclosure of individuals' information.
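The cluster-then-protect idea can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or which protection method the paper uses, so everything below is an assumption made for illustration: a small k-modes-style clustering under Hamming distance, followed by replacing each record with the attribute-wise mode of its cluster (a simple microaggregation-like masking). The function names `hamming`, `mode_record`, `cluster`, and `protect_by_cluster`, and the toy records, are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_record(records):
    """Attribute-wise most frequent value across a group of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def cluster(records, k, iterations=10):
    """Tiny k-modes-style clustering; returns one cluster label per record.

    Assumption: centers are seeded with the first k distinct records.
    """
    centers = list(dict.fromkeys(records))[:k]
    labels = [0] * len(records)
    for _ in range(iterations):
        # Assign every record to its nearest center (ties go to the lower index).
        labels = [
            min(range(len(centers)), key=lambda c: hamming(r, centers[c]))
            for r in records
        ]
        # Recompute each center as the mode of its members.
        new_centers = []
        for c in range(len(centers)):
            members = [r for r, lab in zip(records, labels) if lab == c]
            new_centers.append(mode_record(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def protect_by_cluster(records, k):
    """Cluster the records, then mask each cluster independently.

    Assumption: the per-cluster protection is mode replacement; any other
    categorical masking method could be applied per cluster instead.
    """
    labels = cluster(records, k)
    protected = list(records)
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        representative = mode_record([records[i] for i in idx])
        for i in idx:
            protected[i] = representative
    return protected, labels


# Toy categorical dataset (hypothetical values).
sample = [
    ("red", "small", "a"),
    ("red", "small", "b"),
    ("red", "large", "a"),
    ("blue", "large", "z"),
    ("blue", "large", "y"),
    ("blue", "small", "z"),
]
masked, cluster_labels = protect_by_cluster(sample, k=2)
for original, label, released in zip(sample, cluster_labels, masked):
    print(original, "-> cluster", label, "->", released)
```

Because each cluster groups similar records, masking within a cluster tends to change fewer attribute values than masking the whole dataset at once, which is one plausible reading of the utility and disclosure trade-off the abstract mentions.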

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Marés, J., Torra, V. (2012). Clustering-Based Categorical Data Protection. In: Domingo-Ferrer, J., Tinnirello, I. (eds) Privacy in Statistical Databases. PSD 2012. Lecture Notes in Computer Science, vol 7556. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33627-0_7

  • DOI: https://doi.org/10.1007/978-3-642-33627-0_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33626-3

  • Online ISBN: 978-3-642-33627-0

  • eBook Packages: Computer Science, Computer Science (R0)
