K-anonymity for social networks containing rich structural and textual information

  • Yifan Hao
  • Huiping Cao
  • Chuan Hu
  • Kabi Bhattarai
  • Satyajayant Misra
Original Article

Abstract

When social networks are released for analysis, individuals’ sensitive information (e.g., node identities) may be exposed. To avoid such exposure, social networks need to be anonymized before they are published. Many existing approaches anonymize social networks to prevent attacks by adversaries who know structural properties of the network, such as node degrees and neighborhoods. However, these techniques cannot prevent the leakage of identifying information during social network analysis when the graphs contain both structural and textual information. In this paper, we study the problem of anonymizing social networks to prevent the identification of individuals by attacks that use both structural (node degrees) and textual (edge labels) information in graphs. We formally define the problem as Structure and Text aware \(K\)-anonymity of social networks (STK-Anonymity). In an STK-anonymized network, each individual is \(ST\)-equivalent to at least \(K-1\) other nodes. The major challenge in achieving STK-Anonymity comes from the correlation of edge labels, which causes edge anonymization to propagate. It has been shown that it is intractable to optimally \(K\)-anonymize the label sequences of edge-labeled graphs. To address this challenge, we present a two-phase approach: the first phase uses two heuristics to process partial graph structures (node degrees in particular), and the second phase uses a set-enumeration tree-based approach to anonymize edge labels. Results from extensive experiments on both real and synthetic datasets demonstrate the effectiveness and efficiency of our approaches.
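The anonymity condition stated in the abstract can be illustrated with a minimal sketch. The abstract does not spell out \(ST\)-equivalence, so the code below assumes that two nodes are \(ST\)-equivalent when they have the same degree and the same sorted sequence of incident edge labels; that definition, the function name `is_stk_anonymous`, and the `(u, v, label)` edge format are all illustrative assumptions, not the paper's formal definitions or algorithm. The sketch only checks the condition; it does not perform anonymization.

```python
from collections import Counter, defaultdict


def is_stk_anonymous(edges, k, nodes=()):
    """Check a K-anonymity condition on an undirected edge-labeled graph.

    ASSUMPTION (not taken from the paper): two nodes are ST-equivalent
    when their sorted incident edge-label sequences are identical. The
    length of that sequence is the node degree, so equal sequences also
    imply equal degrees.

    edges: iterable of (u, v, label) tuples.
    nodes: optional extra node ids, so isolated nodes are counted too.
    """
    incident = defaultdict(list)
    for node in nodes:
        incident[node]  # register the node with an empty label list
    for u, v, label in edges:
        incident[u].append(label)
        incident[v].append(label)

    # Group nodes by signature and count the size of each group.
    group_sizes = Counter(tuple(sorted(labels)) for labels in incident.values())

    # Each node needs at least k - 1 equivalent peers, so every
    # equivalence class must contain at least k nodes.
    return all(size >= k for size in group_sizes.values())


# Two disjoint edges with the same label: all four nodes look alike.
matched = [("a", "b", "x"), ("c", "d", "x")]
# A path with two different labels: every node has a unique signature.
path = [("a", "b", "x"), ("b", "c", "y")]

print(is_stk_anonymous(matched, 2))
print(is_stk_anonymous(path, 2))
```

Under this assumed definition, changing a single edge label alters the signatures of both endpoints at once, which gives one way to picture the propagation of edge anonymization mentioned above.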

Keywords

K-anonymity, Set-enumeration tree, Graph, Social network

Copyright information

© Springer-Verlag Wien 2014

Authors and Affiliations

  • Yifan Hao (1)
  • Huiping Cao (1)
  • Chuan Hu (1)
  • Kabi Bhattarai (1)
  • Satyajayant Misra (1)

  1. Computer Science, New Mexico State University, Las Cruces, USA