International Journal of Information Security

, Volume 18, Issue 2, pp 219–238 | Cite as

Scalable non-deterministic clustering-based k-anonymization for rich networks

  • Miguel Ros-Martín
  • Julián Salas
  • Jordi Casas-RomaEmail author
Regular Contribution


In this paper, we tackle the problem of graph anonymization in the context of privacy-preserving social network mining. We present a greedy and non-deterministic algorithm to achieve k-anonymity on labeled and undirected networks. Our work aims to create a scalable algorithm for real-world big networks, which runs in parallel and uses biased randomization for improving the quality of the solutions. We propose new metrics that consider the utility of the clusters from a recommender system point of view. We compare our approach to SaNGreeA, a well-known state-of-the-art algorithm for k-anonymity generalization. Finally, we have performed scalability tests, with up to 160 machines within the Hadoop framework, for anonymizing a real-world dataset with around 830 K nodes and 63 M relationships, demonstrating our method’s utility and practical applicability.


k-Anonymity Privacy-preserving Recommender systems Social networks Graphs 



Jordi Casas-Roma was partially supported by the Spanish MCYT and the FEDER funds under Grants TIN2011-27076-C03 “CO-PRIVACY” and TIN2014-57364-C2-2-R “SMARTGLACIS.” Julián Salas acknowledges the support of a UOC postdoctoral fellowship.


  1. 1.
    Adamic, L.A., Glance, N.: The political blogosphere and the 2004 U.S. election. In: LinkKDD’05, pp. 36–43. ACM (2005)Google Scholar
  2. 2.
    Barabási, A.-L., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)MathSciNetCrossRefzbMATHGoogle Scholar
  3. 3.
    Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech.: Theory Exp. 10, 2008 (2008)Google Scholar
  4. 4.
    Byun, J.W., Kamra, A., Bertino, E., Li, N.: Efficient \(k\)-anonymization using clustering techniques. In: International Conference on Database Systems for Advanced Applications (DASFAA), pp. 188–200 (2007)Google Scholar
  5. 5.
    Cai, B.J., Wang, H.Y., Zheng, H.R., Wang, H.: Evaluation repeated random walks in community detection of social networks. In: ICMLC’10, pp. 1849–1854. IEEE (2010)Google Scholar
  6. 6.
    Campan, A., Truta, T.M.: Data and structural k-anonymity in social networks. In: Bonchi, F., Ferrari, E., Jiang, W., Malin, B. (eds.) Privacy, Security, and Trust in KDD, pp. 33–54. Springer, Berlin (2009)CrossRefGoogle Scholar
  7. 7.
    Campan, A., Alufaisan, Y., Truta, T.M.: Preserving communities in anonymized social networks. Trans. Data Privacy 8(1), 55–87 (2015)Google Scholar
  8. 8.
    Casas-Roma, J., Herrera-Joancomartí, J., Torra, V.: Anonymizing graphs: measuring quality for clustering. Knowl. Inf. Syst. 44(3), 507–528 (2014)CrossRefGoogle Scholar
  9. 9.
    Casas-Roma, J., Rousseau, F.: Community-preserving generalization of social networks. In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 1465–1472. ACM, Paris, France (2015)Google Scholar
  10. 10.
    Casas-Roma, J., Herrera-Joancomartí, J., Torra, V.: A survey of graph-modification techniques for privacy-preserving on networks. Artif. Intell. Rev. 47(3), 341366 (2017)CrossRefGoogle Scholar
  11. 11.
    Chester, S., Kapron, B., Srivastava, G., Venkatesh, S.: Complexity of social network anonymization. Soc. Netw. Anal. Min. 3(2), 151–166 (2012)CrossRefGoogle Scholar
  12. 12.
    Chester, S., Kapron, B.M., Ramesh, G., Srivastava, G., Thomo, A., Venkatesh, S.: Why Waldo befriended the dummy? k-anonymization of social networks with pseudo-nodes. Soc. Netw. Anal. Min. 3(3), 381–399 (2013)CrossRefGoogle Scholar
  13. 13.
    Clauset, A., Newman, M.E.J., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70(6), 1–6 (2004)CrossRefGoogle Scholar
  14. 14.
    Erdös, P., Rényi, A.: On random graphs I. Publ. Math. 6, 290–297 (1959)MathSciNetzbMATHGoogle Scholar
  15. 15.
    Hay, M., Miklau, G., Jensen, D., Towsley, D., Weis, P.: Resisting structural re-identification in anonymized social networks. Proc. VLDB Endow. 1(1), 102–114 (2008)CrossRefGoogle Scholar
  16. 16.
    Hay, M., Liu, K., Miklau, G., Pei, J., Terzi, E.: Privacy-aware data management in information networks. In: International Conference on Management of Data (SIGMOD), pp. 1201–1204. ACM Press, New York, USA (2011)Google Scholar
  17. 17.
    He, J., Chu, W.W.: A social network-based recommender system (SNRS). Ann. Inf. Syst. 12, 47–74 (2010)CrossRefGoogle Scholar
  18. 18.
    Juan, A.A., Fauln, J., Ferrer, A., Lourenço, H.R., Barrios, B.: MIRHA: multi-start biased randomization of heuristics with adaptive local search for solving non-smooth routing problems. TOP 21(1), 109–132 (2011)MathSciNetCrossRefzbMATHGoogle Scholar
  19. 19.
    Lichman, M.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA (2013).
  20. 20.
    Liu, K., Terzi, E.: Towards identity anonymization on graphs. In: ACM SIGMOD International Conference on Management of Data, pp. 93–106. ACM,NY, USA (2008)Google Scholar
  21. 21.
    Lu, X., Song, Y., Bressan, S.: Fast identity anonymization on graphs. In: 23rd International Conference on Database and Expert Systems Applications, pp. 281–295. Springer, Berlin (2012)Google Scholar
  22. 22.
    Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M.: l-diversity: privacy beyond k-anonymity. In: 22nd IEEE International Conference on Data Engineering (2006)Google Scholar
  23. 23.
    Nettleton, D.F., Salas, J.: A data driven anonymization system for information rich online social network graphs. Expert Syst. Appl. 55, 87–105 (2016)CrossRefGoogle Scholar
  24. 24.
    Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 1–16 (2003)Google Scholar
  25. 25.
    Rosvall, M., Bergstrom, C.T.: Maps of random walks on complex networks reveal community structure. PNAS 105(4), 1118–1123 (2008)CrossRefGoogle Scholar
  26. 26.
    Salas, J., Torra, V.: Graphic sequences, distances and k-degree anonymity. Disc. Appl. Math. 188, 2531 (2015)MathSciNetCrossRefzbMATHGoogle Scholar
  27. 27.
    Salas, J., Torra, V.: Improving the characterization of P-stability for applications in network privacy. Disc. Appl. Math. 206, 109114 (2016)MathSciNetCrossRefzbMATHGoogle Scholar
  28. 28.
    Salas, J.: Sampling and merging for graph anonymization. In: Torra, V., Narukawa, Y., Navarro-Arribas, G., Yañez, C. (eds.) Modeling Decisions for Artificial Intelligence, pp. 250–260. Springer, Andorra (2016)CrossRefGoogle Scholar
  29. 29.
    Samarati, P.: Protecting respondents identities in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)CrossRefGoogle Scholar
  30. 30.
    Sihag, V.K.: A clustering approach for structural \(k\)-anonymity in social networks using genetic algorithm. In: CUBE International Information Technology Conference, pp. 701–706. ACM, Pune, India (2012)Google Scholar
  31. 31.
    Singh, L., Schramm, C.: Identifying similar neighborhood structures in private social networks. In: International Conference on Data Mining Workshops, pp. 507–516. IEEE, Sydney, NSW (2010)Google Scholar
  32. 32.
    Skarkala, M.E., Maragoudakis, M., Gritzalis, S., Mitrou, L., Toivonen, H., Moen, P.: Privacy preservation by k-anonymization of weighted social networks. In: Proceedings of ASONAM. pp. 423–428. IEEE Computer Society (2012)Google Scholar
  33. 33.
    Stokes, K., Torra, V.: On some clustering approaches for graphs. In: 2011 IEEE International Conference on Fuzzy Systems, pp. 409–415 (2011)Google Scholar
  34. 34.
    Stokes k, K., Torra, V.: Reidentification and k-anonymity: a model for disclosure risk in graphs. Soft Comput. 16(10), 16571670 (2012)CrossRefzbMATHGoogle Scholar
  35. 35.
    Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10(5), 557–570 (2002)MathSciNetCrossRefzbMATHGoogle Scholar
  36. 36.
    Tassa, T., Cohen, D.J.: Anonymization of centralized and distributed social networks by sequential clustering. IEEE Trans. Knowl. Data Eng. 25(2), 311–324 (2013)CrossRefGoogle Scholar
  37. 37.
    Wu, X., Ying, X., Liu, K., Chen, L.: Managing and Mining Graph Data, Chapter A Survey of Privacy-Preservation of Graphs and Social Networks, pp. 421–453. Springer, Boston, MA (2010)Google Scholar
  38. 38.
    Zachary, W.W.: An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33(4), 452–473 (1977)CrossRefGoogle Scholar
  39. 39.
    Zhou, B., Pei, J.: Preserving privacy in social networks against neighborhood attacks. In: ICDE (2008)Google Scholar
  40. 40.
    Zhou, B., Pei, J., Luk, W.S.: A brief survey on anonymization techniques for privacy preserving publishing of social network data. ACM SIGKDD Explor. Newsl. 10(2), 1222 (2008)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. 1.XINGBarcelonaSpain
  2. 2.Internet Interdisciplinary Institute (IN3)Universitat Oberta de Catalunya (UOC)BarcelonaSpain

Personalised recommendations