An Overview of the Use of Clustering for Data Privacy

Abstract

In this chapter we review some of our results on the use of clustering in data privacy. The chapter gives a brief overview of data privacy and, more specifically, of data-driven methods for data privacy, and discusses where clustering can be applied in this setting. We discuss the role of clustering in the definition of masking methods and in the calculation of information loss and data utility.

Keywords

  • Data privacy
  • Clustering
  • Fuzzy clustering
  • Information loss
  • Microaggregation

Acknowledgements

Partial support by the Spanish MEC (projects TIN2011-27076-C03-03 and TIN2014-55243-P) is acknowledged.

Corresponding author

Correspondence to Vicenç Torra.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Torra, V., Navarro-Arribas, G., Stokes, K. (2016). An Overview of the Use of Clustering for Data Privacy. In: Celebi, M., Aydin, K. (eds) Unsupervised Learning Algorithms. Springer, Cham. https://doi.org/10.1007/978-3-319-24211-8_10

  • DOI: https://doi.org/10.1007/978-3-319-24211-8_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24209-5

  • Online ISBN: 978-3-319-24211-8

  • eBook Packages: Engineering, Engineering (R0)