An Overview of the Use of Clustering for Data Privacy

  • Vicenç Torra
  • Guillermo Navarro-Arribas
  • Klara Stokes


In this chapter we review some of our results on the use of clustering in data privacy. We give a brief overview of data privacy and, more specifically, of data-driven methods for data privacy, and discuss where clustering can be applied in this setting. In particular, we discuss the role of clustering in the definition of masking methods and in the calculation of information loss and data utility.


Keywords: Data privacy; Clustering; Fuzzy clustering; Information loss; Microaggregation



Partial support by the Spanish MEC (projects TIN2011-27076-C03-03 and TIN2014-55243-P) is acknowledged.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Vicenç Torra (1)
  • Guillermo Navarro-Arribas (2)
  • Klara Stokes (1)

  1. University of Skövde, Skövde, Sweden
  2. Universitat Autònoma de Barcelona, Barcelona, Spain
