Skip to main content
Log in

k-Degree anonymity and edge selection: improving data utility in large networks

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

The problem of anonymization in large networks and the utility of released data are considered in this paper. Although there are some anonymization methods for networks, most of them cannot be applied in large networks because of their complexity. In this paper, we devise a simple and efficient algorithm for k-degree anonymity in large networks. Our algorithm constructs a k-degree anonymous network by the minimum number of edge modifications. We compare our algorithm with other well-known k-degree anonymous algorithms and demonstrate that information loss in real networks is lowered. Moreover, we consider the edge relevance in order to improve the data utility on anonymized networks. By considering the neighbourhood centrality score of each edge, we preserve the most important edges of the network, reducing the information loss and increasing the data utility. An evaluation of clustering processes is performed on our algorithm, proving that edge neighbourhood centrality increases data utility. Lastly, we apply our algorithm to different large real datasets and demonstrate their efficiency and practical utility.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

Notes

  1. A preliminary, short version [9] of this paper appeared at ASONAM 2013.

  2. Java implementation and source code available at: https://jcasasr.wordpress.com.

References

  1. Adamic LA, Glance N (2005) The political blogosphere and the 2004 US election. In: International workshop on link discovery. ACM, New York, NY, USA, pp 36–43

  2. Backstrom L, Dwork C, Kleinberg J (2007) Wherefore art thou r3579x? Anonymized social networks, hidden patterns, and structural steganography. In: International conference on world wide web. ACM, New York, NY, USA, pp 181–190

  3. Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):P10008

    Article  Google Scholar 

  4. Boldi P, Bonchi F, Gionis A, Tassa T (2012) Injecting uncertainty in graphs for identity obfuscation. Proc VLDB Endow 5(11):1376–1387

    Article  Google Scholar 

  5. Bredereck R, Froese V, Hartung S, Nichterlein A, Niedermeier R, Talmon N (2014) The complexity of degree anonymization by vertex addition. In: Proceedings of the 10th international conference on algorithmic aspects in information and management. Springer, Vancouver, BC, Canada, pp 44–55

  6. Cai B-J, Wang H-Y, Zheng H-R, Wang H (2010) Evaluation repeated random walks in community detection of social networks. In: International conference on machine learning and cybernetics. IEEE, Qingdao, China, pp 1849–1854

  7. Campan A, Alufaisan Y, Truta TM (2015) Preserving communities in anonymized social networks. Trans Data Priv 8(1):55–87

    Google Scholar 

  8. Campan A, Truta TM (2009) Data and structural \(k\)-anonymity in social networks. In: Bonchi F, Ferrari E, Jiang W, Malin B (eds) Privacy, security, and trust in KDD. Springer, New York, pp 33–54

  9. Casas-Roma J, Herrera-Joancomartí J, Torra V (2013) An algorithm for \(k\)-degree anonymity on large networks. In: IEEE international conference on advances on social networks analysis and mining. IEEE, Niagara Falls, CA, USA, pp 671–675

  10. Casas-Roma J, Herrera-Joancomartí J, Torra V (2013) Analyzing the impact of edge modifications on networks. In: International conference on modeling decisions for artificial intelligence. Springer, Barcelona, Spain, pp 296–307

  11. Cheng J, Fu AW, Liu J (2010) \(k\)-isomorphism: privacy preserving network publication against structural attacks. In: International conference on management of data. ACM, New York, NY, USA, pp 459–470

  12. Chester S, Kapron BM, Ramesh G, Srivastava G, Thomo A, Venkatesh S (2011) \(k\)-anonymization of social networks by vertex addition. In: ADBIS 2011 research communications, pp 107–116

  13. Chester S, Gaertner J, Stege U, Venkatesh S (2012) Anonymizing subsets of social networks with degree constrained subgraphs. In: IEEE International conference on advances on social networks analysis and mining. IEEE, Washington, DC, USA, pp 418–422

  14. Chester S, Kapron BM, Ramesh G, Srivastava G, Thomo A, Venkatesh S (2013) Why Waldo befriended the dummy? \(k\)-anonymization of social networks with pseudo-nodes. Soc Netw Anal Min 3(3):381–399

    Article  Google Scholar 

  15. Clauset A, Newman MEJ, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(6):1–6

    Article  Google Scholar 

  16. De Capitani di Vimercati S, Foresti S, Livraga G, Samarati P (2012) Data privacy: definitions and techniques. Int J Uncertain Fuzziness Knowl Based Syst 20(6):793–818

    Article  MATH  Google Scholar 

  17. Dwork C (2006) Differential privacy. In: International conference on automata, languages and programming, vol 4052, pp 1–12

  18. Dwork C (2011) A firm foundation for private data analysis. Commun ACM 54(1):86–95

    Article  Google Scholar 

  19. Ferri F, Grifoni P, Guzzo T (2011) New forms of social and professional digital relationships: the case of Facebook. Soc Netw Anal Min 2(2):121–137

    Article  Google Scholar 

  20. Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99(12):7821–7826

    Article  MathSciNet  MATH  Google Scholar 

  21. Hansen SL, Mukherjee S (2003) A polynomial algorithm for optimal univariate microaggregation. IEEE Trans Knowl Data Eng 15(4):1043–1044

    Article  Google Scholar 

  22. Hartung S, Hoffmann C, Nichterlein A (2014) Improved upper and lower bound heuristics for degree anonymization in social networks. In: Proceedings of the 13th international symposium on experimental algorithms. Springer, Copenhagen, Denmark, pp 376–387

  23. Hay M, Miklau G, Jensen D, Weis P, Srivastava S (2007) Anonymizing social networks. Technical Report No. 07-19, Computer Science Department, University of Massachusetts, Amherst

  24. Hay M, Miklau G, Jensen D, Towsley D, Weis P (2008) Resisting structural re-identification in anonymized social networks. Proc VLDB Endow 1(1):102–114

    Article  Google Scholar 

  25. Hay M, Li C, Miklau G, Jensen D (2009) Accurate estimation of the degree distribution of private networks. In: IEEE International conference on data mining (ICDM). IEEE, Miami, FL, USA, pp 169–178

  26. Hay M, Liu K, Miklau G, Pei J, Terzi E (2011) Privacy-aware data management in information networks. In: International conference on management of data. ACM Press, New York, NY, USA, pp 1201–1204

  27. Krebs V (2006). US politics book. Retrieved from http://www.orgnet.com

  28. Lancichinetti A, Fortunato S (2009) Community detection algorithms: a comparative analysis. Phys Rev E 80(5):056117

    Article  Google Scholar 

  29. Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. ACM Trans Knowl Discov Data 1(1):1–40

    Article  Google Scholar 

  30. Leskovec J, Adamic LA, Huberman BA (2007) The dynamics of viral marketing. ACM Trans Web 1(1):5:1–5:46

    Article  Google Scholar 

  31. Leskovec J, Lang K, Dasgupta A, Mahoney M (2009) Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math 6(1):29–123

    Article  MathSciNet  MATH  Google Scholar 

  32. Li N, Li T, Venkatasubramanian S (2007) \(t\)-closeness: privacy beyond \(k\)-anonymity and \(\ell \)-diversity. In: IEEE international conference on data engineering. IEEE, pp 106–115

  33. Liu K, Terzi E (2008) Towards identity anonymization on graphs. In ACM SIGMOD international conference on management of data. ACM, New York, NY, USA, pp 93–106

  34. Lu X, Song Y, Bressan S (2012) Fast identity anonymization on graphs. In: Proceedings of the 23rd international conference on database and expert systems applications. Springer, Vienna, Austria, pp 281–295

  35. Nagle F (2013) Privacy breach analysis in social networks. In: Özyer T, Erdem Z, Rokne J, Khoury S (eds) Mining social networks and security informatics. Springer, Dordrecht, pp 63–77

    Chapter  Google Scholar 

  36. Nguyen HH, Imine A, Rusinowitch M (2015) Anonymizing social graphs via uncertainty semantics. In: Proceedings of the 10th ACM symposium on information, computer and communications security. ACM, Singapore, pp 495–506

  37. Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) \(\ell \)-diversity: privacy beyond k\(k\)-anonymity. ACM Trans Knowl Discov Data 1(1):3:1–3:12

    Article  Google Scholar 

  38. McSherry F, Mironov I (2009) Differentially private recommender systems. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, NY, USA, pp 627–636

  39. Pons P, Latapy M (2005) Computing communities in large networks using random walks. In: 20th international symposium computer and information sciences. Springer, Istanbul, Turkey, pp 284–293

  40. Rosvall M, Bergstrom CT (2008) Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci USA 105(4):1118–1123

    Article  Google Scholar 

  41. Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6):1010–1027

    Article  Google Scholar 

  42. Sweeney L (2002) \(k\)-anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based Syst 10(5):557–570

    Article  MathSciNet  MATH  Google Scholar 

  43. Tripathy BK, Panda GK (2010) A new approach to manage security against neighborhood attacks in social networks. In: IEEE international conference on advances on social networks analysis and mining. IEEE, pp 264–269

  44. Wu W, Xiao Y, Wang W, He Z, Wang Z (2010) \(k\)-symmetry model for identity anonymization in social networks. In: International conference on extending database technology. ACM, New York, NY, USA, pp 111–122

  45. Yahoo! Webscope, Yahoo! Instant Messenger friends connectivity graph, version 1.0. http://research.yahoo.com/Academic_Relations

  46. Ying X, Wu X (2008) Randomizing social networks: a spectrum preserving approach. In: SIAM Conference on data mining. SIAM, Atlanta, GA, USA, pp 739–750

  47. Ying X, Pan K, Wu X, Guo L (2009) Comparisons of randomization and \(k\)-degree anonymization schemes for privacy preserving social network publishing. In: Workshop on social network mining and analysis. ACM, New York, NY, USA, pp 10:1–10:10

  48. Zachary WW (1977) An information flow model for conflict and fission in small groups. J Anthropol Res 33(4):452–473

    Article  Google Scholar 

  49. Zhang K, Lo D, Lim E-P, Prasetyo PK (2013) Mining indirect antagonistic communities from social interactions. Knowl Inf Syst (KAIS) 35(3):553–583

    Article  Google Scholar 

  50. Zheleva E, Getoor L (2011) Privacy in social networks: a survey. Social network data analytics. Springer, Berlin

    Google Scholar 

  51. Zhou B, Pei J (2008) Preserving privacy in social networks against neighborhood attacks. In: IEEE International conference on data engineering (ICDE). IEEE, Washington, DC, USA, pp 506–515

  52. Zhou B, Pei J (2011) The \(k\)-anonymity and \(\ell \)-diversity approaches for privacy preservation in social networks against neighborhood attacks. Knowl Inf Syst 28(1):47–77

    Article  Google Scholar 

  53. Zhou B, Pei J, Luk W (2008) A brief survey on anonymization techniques for privacy preserving publishing of social network data. ACM SIGKDD Explor Newsl 10(2):12–22

    Article  Google Scholar 

  54. Zou L, Chen L, Özsu MT (2009) \(k\)-automorphism: a general framework for privacy preserving network publication. Proc VLDB Endow 2(1):946–957

    Article  Google Scholar 

Download references

Acknowledgments

This work was partly funded by the Spanish Government through Grants TIN2011-27076-C03-02 “CO-PRIVACY” and TIN2014-57364-C2-2-R “SMARTGLACIS.”

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jordi Casas-Roma.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Casas-Roma, J., Herrera-Joancomartí, J. & Torra, V. k-Degree anonymity and edge selection: improving data utility in large networks. Knowl Inf Syst 50, 447–474 (2017). https://doi.org/10.1007/s10115-016-0947-7

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-016-0947-7

Keywords

Navigation