Anonymizing graphs: measuring quality for clustering

  • Regular Paper
Knowledge and Information Systems

Abstract

Anonymization of graph-based data has been widely studied in recent years, and several anonymization methods have been developed. Information loss measures are used to evaluate the noise introduced into the anonymized data. Generic information loss measures ignore the intended use of the anonymized data. When data has to be released to third parties and there is no control over what kinds of analyses users may perform, these measures are the standard ones. In this paper we study different generic information loss measures for graphs and compare them with cluster-specific ones. We want to evaluate whether the generic information loss measures are indicative of the usefulness of the data for subsequent data mining processes.
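The distinction between generic and cluster-specific measures can be illustrated with a minimal sketch. It is not the paper's method: the edge-overlap score, the use of connected components as a stand-in clustering, the Rand index, and the toy graph are all assumptions chosen for brevity. It shows two perturbations of the same size that score identically on a generic measure yet differ sharply in how well they preserve clusters.

```python
from itertools import combinations

def edge_jaccard(edges_a, edges_b):
    """Generic measure (assumed): Jaccard overlap of two undirected edge sets."""
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    return len(a & b) / len(a | b)

def components(nodes, edges):
    """Stand-in clustering (assumed): label each node by its connected component."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    label = {}
    for start in nodes:
        if start in label:
            continue
        label[start] = start
        stack = [start]
        while stack:  # depth-first traversal of one component
            n = stack.pop()
            for m in adj[n]:
                if m not in label:
                    label[m] = start
                    stack.append(m)
    return label

def rand_index(labels_a, labels_b):
    """Cluster-specific measure (assumed): fraction of node pairs on which
    the two partitions agree (same cluster in both, or different in both)."""
    pairs = list(combinations(sorted(labels_a), 2))
    agree = sum(
        (labels_a[u] == labels_a[v]) == (labels_b[u] == labels_b[v])
        for u, v in pairs
    )
    return agree / len(pairs)

# Toy graph: two disjoint 4-cycles, i.e. two obvious clusters.
nodes = list(range(8))
original = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 7), (7, 4)]

# Both perturbations delete edge (0, 1) and add exactly one new edge.
intra = [(0, 2), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 7), (7, 4)]  # new edge inside a cluster
inter = [(0, 4), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6), (6, 7), (7, 4)]  # new edge bridges the clusters

base = components(nodes, original)
generic_intra = edge_jaccard(original, intra)
generic_inter = edge_jaccard(original, inter)
cluster_intra = rand_index(base, components(nodes, intra))
cluster_inter = rand_index(base, components(nodes, inter))
```

In this toy case `generic_intra` and `generic_inter` are equal, while `cluster_intra` stays at 1.0 and `cluster_inter` drops because the bridging edge merges the two clusters. Whether generic measures track clustering quality on real graphs is the empirical question the paper investigates.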

Acknowledgments

This work was partly funded by the Spanish Government through projects TIN2011-27076-C03-02 “CO-PRIVACY”, CONSOLIDER INGENIO 2010 CSD2007-0004 “ARES” and TIN2010-15764 “N-KHRONOUS”.

Author information

Corresponding author

Correspondence to Jordi Casas-Roma.

About this article

Cite this article

Casas-Roma, J., Herrera-Joancomartí, J. & Torra, V. Anonymizing graphs: measuring quality for clustering. Knowl Inf Syst 44, 507–528 (2015). https://doi.org/10.1007/s10115-014-0774-7

  • DOI: https://doi.org/10.1007/s10115-014-0774-7
