Skip to main content
Log in

Scalable non-deterministic clustering-based k-anonymization for rich networks

  • Regular Contribution
  • Published:
International Journal of Information Security Aims and scope Submit manuscript

Abstract

In this paper, we tackle the problem of graph anonymization in the context of privacy-preserving social network mining. We present a greedy and non-deterministic algorithm to achieve k-anonymity on labeled and undirected networks. Our work aims to create a scalable algorithm for real-world big networks, which runs in parallel and uses biased randomization for improving the quality of the solutions. We propose new metrics that consider the utility of the clusters from a recommender system point of view. We compare our approach to SaNGreeA, a well-known state-of-the-art algorithm for k-anonymity generalization. Finally, we have performed scalability tests, with up to 160 machines within the Hadoop framework, for anonymizing a real-world dataset with around 830 K nodes and 63 M relationships, demonstrating our method’s utility and practical applicability.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Similar content being viewed by others

Notes

  1. http://www.xing.com.

  2. The Scala source code is available at: https://bitbucket.org/jcasasr/scan-algorithm/.

  3. Available at: http://igraph.org.

References

  1. Adamic, L.A., Glance, N.: The political blogosphere and the 2004 U.S. election. In: LinkKDD’05, pp. 36–43. ACM (2005)

  2. Barabási, A.-L., Albert, R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)

    Article  MathSciNet  MATH  Google Scholar 

  3. Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech.: Theory Exp. 10, 2008 (2008)

    Google Scholar 

  4. Byun, J.W., Kamra, A., Bertino, E., Li, N.: Efficient \(k\)-anonymization using clustering techniques. In: International Conference on Database Systems for Advanced Applications (DASFAA), pp. 188–200 (2007)

  5. Cai, B.J., Wang, H.Y., Zheng, H.R., Wang, H.: Evaluation repeated random walks in community detection of social networks. In: ICMLC’10, pp. 1849–1854. IEEE (2010)

  6. Campan, A., Truta, T.M.: Data and structural k-anonymity in social networks. In: Bonchi, F., Ferrari, E., Jiang, W., Malin, B. (eds.) Privacy, Security, and Trust in KDD, pp. 33–54. Springer, Berlin (2009)

    Chapter  Google Scholar 

  7. Campan, A., Alufaisan, Y., Truta, T.M.: Preserving communities in anonymized social networks. Trans. Data Privacy 8(1), 55–87 (2015)

    Google Scholar 

  8. Casas-Roma, J., Herrera-Joancomartí, J., Torra, V.: Anonymizing graphs: measuring quality for clustering. Knowl. Inf. Syst. 44(3), 507–528 (2014)

    Article  Google Scholar 

  9. Casas-Roma, J., Rousseau, F.: Community-preserving generalization of social networks. In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 1465–1472. ACM, Paris, France (2015)

  10. Casas-Roma, J., Herrera-Joancomartí, J., Torra, V.: A survey of graph-modification techniques for privacy-preserving on networks. Artif. Intell. Rev. 47(3), 341366 (2017)

    Article  Google Scholar 

  11. Chester, S., Kapron, B., Srivastava, G., Venkatesh, S.: Complexity of social network anonymization. Soc. Netw. Anal. Min. 3(2), 151–166 (2012)

    Article  Google Scholar 

  12. Chester, S., Kapron, B.M., Ramesh, G., Srivastava, G., Thomo, A., Venkatesh, S.: Why Waldo befriended the dummy? k-anonymization of social networks with pseudo-nodes. Soc. Netw. Anal. Min. 3(3), 381–399 (2013)

    Article  Google Scholar 

  13. Clauset, A., Newman, M.E.J., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70(6), 1–6 (2004)

    Article  Google Scholar 

  14. Erdös, P., Rényi, A.: On random graphs I. Publ. Math. 6, 290–297 (1959)

    MathSciNet  MATH  Google Scholar 

  15. Hay, M., Miklau, G., Jensen, D., Towsley, D., Weis, P.: Resisting structural re-identification in anonymized social networks. Proc. VLDB Endow. 1(1), 102–114 (2008)

    Article  Google Scholar 

  16. Hay, M., Liu, K., Miklau, G., Pei, J., Terzi, E.: Privacy-aware data management in information networks. In: International Conference on Management of Data (SIGMOD), pp. 1201–1204. ACM Press, New York, USA (2011)

  17. He, J., Chu, W.W.: A social network-based recommender system (SNRS). Ann. Inf. Syst. 12, 47–74 (2010)

    Article  Google Scholar 

  18. Juan, A.A., Fauln, J., Ferrer, A., Lourenço, H.R., Barrios, B.: MIRHA: multi-start biased randomization of heuristics with adaptive local search for solving non-smooth routing problems. TOP 21(1), 109–132 (2011)

    Article  MathSciNet  MATH  Google Scholar 

  19. Lichman, M.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA (2013). http://archive.ics.uci.edu/ml

  20. Liu, K., Terzi, E.: Towards identity anonymization on graphs. In: ACM SIGMOD International Conference on Management of Data, pp. 93–106. ACM,NY, USA (2008)

  21. Lu, X., Song, Y., Bressan, S.: Fast identity anonymization on graphs. In: 23rd International Conference on Database and Expert Systems Applications, pp. 281–295. Springer, Berlin (2012)

  22. Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M.: l-diversity: privacy beyond k-anonymity. In: 22nd IEEE International Conference on Data Engineering (2006)

  23. Nettleton, D.F., Salas, J.: A data driven anonymization system for information rich online social network graphs. Expert Syst. Appl. 55, 87–105 (2016)

    Article  Google Scholar 

  24. Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 1–16 (2003)

    Google Scholar 

  25. Rosvall, M., Bergstrom, C.T.: Maps of random walks on complex networks reveal community structure. PNAS 105(4), 1118–1123 (2008)

    Article  Google Scholar 

  26. Salas, J., Torra, V.: Graphic sequences, distances and k-degree anonymity. Disc. Appl. Math. 188, 2531 (2015)

    Article  MathSciNet  MATH  Google Scholar 

  27. Salas, J., Torra, V.: Improving the characterization of P-stability for applications in network privacy. Disc. Appl. Math. 206, 109114 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  28. Salas, J.: Sampling and merging for graph anonymization. In: Torra, V., Narukawa, Y., Navarro-Arribas, G., Yañez, C. (eds.) Modeling Decisions for Artificial Intelligence, pp. 250–260. Springer, Andorra (2016)

    Chapter  Google Scholar 

  29. Samarati, P.: Protecting respondents identities in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)

    Article  Google Scholar 

  30. Sihag, V.K.: A clustering approach for structural \(k\)-anonymity in social networks using genetic algorithm. In: CUBE International Information Technology Conference, pp. 701–706. ACM, Pune, India (2012)

  31. Singh, L., Schramm, C.: Identifying similar neighborhood structures in private social networks. In: International Conference on Data Mining Workshops, pp. 507–516. IEEE, Sydney, NSW (2010)

  32. Skarkala, M.E., Maragoudakis, M., Gritzalis, S., Mitrou, L., Toivonen, H., Moen, P.: Privacy preservation by k-anonymization of weighted social networks. In: Proceedings of ASONAM. pp. 423–428. IEEE Computer Society (2012)

  33. Stokes, K., Torra, V.: On some clustering approaches for graphs. In: 2011 IEEE International Conference on Fuzzy Systems, pp. 409–415 (2011)

  34. Stokes k, K., Torra, V.: Reidentification and k-anonymity: a model for disclosure risk in graphs. Soft Comput. 16(10), 16571670 (2012)

    Article  MATH  Google Scholar 

  35. Sweeney, L.: k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10(5), 557–570 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  36. Tassa, T., Cohen, D.J.: Anonymization of centralized and distributed social networks by sequential clustering. IEEE Trans. Knowl. Data Eng. 25(2), 311–324 (2013)

    Article  Google Scholar 

  37. Wu, X., Ying, X., Liu, K., Chen, L.: Managing and Mining Graph Data, Chapter A Survey of Privacy-Preservation of Graphs and Social Networks, pp. 421–453. Springer, Boston, MA (2010)

  38. Zachary, W.W.: An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33(4), 452–473 (1977)

    Article  Google Scholar 

  39. Zhou, B., Pei, J.: Preserving privacy in social networks against neighborhood attacks. In: ICDE (2008)

  40. Zhou, B., Pei, J., Luk, W.S.: A brief survey on anonymization techniques for privacy preserving publishing of social network data. ACM SIGKDD Explor. Newsl. 10(2), 1222 (2008)

    Article  Google Scholar 

Download references

Acknowledgements

Jordi Casas-Roma was partially supported by the Spanish MCYT and the FEDER funds under Grants TIN2011-27076-C03 “CO-PRIVACY” and TIN2014-57364-C2-2-R “SMARTGLACIS.” Julián Salas acknowledges the support of a UOC postdoctoral fellowship.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jordi Casas-Roma.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Ros-Martín, M., Salas, J. & Casas-Roma, J. Scalable non-deterministic clustering-based k-anonymization for rich networks. Int. J. Inf. Secur. 18, 219–238 (2019). https://doi.org/10.1007/s10207-018-0409-1

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10207-018-0409-1

Keywords

Navigation