The k-anonymity and l-diversity approaches for privacy preservation in social networks against neighborhood attacks

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Recently, more and more social network data have been published in one way or another, and preserving privacy when publishing such data has become an important concern. With some local knowledge about individuals in a social network, an adversary may easily attack the privacy of some victims. Unfortunately, most previous studies on privacy-preserving data publishing deal with relational data only and cannot be applied to social network data. In this paper, we take an initiative toward preserving privacy in social network data. Specifically, we identify an essential type of privacy attack: the neighborhood attack. If an adversary has some knowledge about the neighbors of a target victim and the relationships among those neighbors, the victim may be re-identified from a social network even if the victim’s identity is protected by conventional anonymization techniques. To protect privacy against neighborhood attacks, we extend the conventional k-anonymity and l-diversity models from relational data to social network data. We show that the problems of computing optimal k-anonymous and l-diverse social networks are NP-hard, and we develop practical solutions to these problems. An empirical study indicates that social network data anonymized by our methods can still be used to answer aggregate network queries with high accuracy.
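
The neighborhood attack described above can be illustrated with a small sketch. It is not the authors' algorithm. It assumes an undirected graph stored as a dictionary of neighbor sets, takes a vertex's neighborhood to be its neighbors plus the edges among them, and uses a hypothetical helper, `neighborhood_signature`, that summarizes a neighborhood by an isomorphism-invariant fingerprint. Two vertices with different fingerprints cannot have isomorphic neighborhoods, so any vertex whose fingerprint is shared by fewer than k vertices is certainly not k-anonymous. The converse does not hold: matching fingerprints are necessary but not sufficient, and a full check would need a graph isomorphism test.

```python
from collections import defaultdict


def neighborhood_signature(adj, v):
    """Isomorphism-invariant summary of v's 1-neighborhood.

    Hypothetical helper: returns the number of neighbors and the sorted
    degrees of those neighbors counted only inside the neighbor subgraph.
    """
    nbrs = adj[v]
    inner_degrees = sorted(len(adj[u] & nbrs) for u in nbrs)
    return (len(nbrs), tuple(inner_degrees))


def vertices_below_k(adj, k):
    """Return vertices whose signature is shared by fewer than k vertices.

    These vertices are certainly exposed to a neighborhood attack; vertices
    not returned may still be exposed, since the signature is only a
    necessary condition for neighborhood isomorphism.
    """
    groups = defaultdict(list)
    for v in adj:
        groups[neighborhood_signature(adj, v)].append(v)
    return sorted(v for members in groups.values() if len(members) < k for v in members)


# Toy network (an assumption for illustration): a triangle a-b-c,
# plus a path a-d-e hanging off vertex a.
example_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}

print(vertices_below_k(example_graph, 2))  # ['a', 'd', 'e']
```

In this toy network only b and c share a neighborhood shape, so an adversary who knows the neighborhood of a, d, or e can single that vertex out even after names are removed.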

References

  1. Adamic L, Adar E (2005) How to search a social network. Soc Netw 27(3): 187–203

  2. Backstrom L, Huttenlocher D, Kleinberg J, Lan X (2006) Group formation in large social networks: membership, growth, and evolution. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’06), ACM Press, New York, pp 44–54

  3. Backstrom L, Dwork C, Kleinberg J (2007) Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography. In: Proceedings of the 16th international conference on World Wide Web (WWW’07), ACM Press, New York, pp 181–190

  4. Bhagat S, Cormode G, Krishnamurthy B, Srivastava D (2009) Class-based graph anonymization for social network data. PVLDB 2(1): 766–777

  5. Campan A, Truta TM (2008) A clustering approach for data and structural anonymity in social networks. In: Proceedings of the 2nd ACM SIGKDD international workshop on privacy, security, and trust in KDD (PinKDD’08), in conjunction with KDD’08, Las Vegas, Nevada

  6. Chakrabarti D, Zhan Y, Faloutsos C (2004) R-MAT: a recursive model for graph mining. In: Proceedings of the 2004 SIAM international conference on data mining (SDM’04), SIAM, Philadelphia

  7. Cormen TH, Leiserson CE, Rivest RL, Stein C (2002) Introduction to algorithms, 2nd edn. MIT Press and McGraw-Hill, Cambridge

  8. Cormode G, Srivastava D, Yu T, Zhang Q (2008) Anonymizing bipartite graph data using safe groupings. PVLDB 1(1): 833–844

  9. Coull SE, Monrose F, Reiter MK, Bailey M (2009) The challenges of effectively anonymizing network data. In: Proceedings of the 2009 cybersecurity applications & technology conference for homeland security (CATCH’09), IEEE Computer Society, Washington, DC, pp 230–236

  10. Dwork C (2008) Differential privacy: a survey of results. In: Proceedings of the 5th international conference on theory and applications of models of computation. Lecture notes in computer science, vol 4978. Springer, pp 1–19

  11. Dwork C, Smith A (2008) Differential privacy for statistics: what we know and what we want to learn. In: Proceedings of NCHS/CDC data confidentiality workshop

  12. Faloutsos M, Faloutsos P, Faloutsos C (1999) On power law relationships of the internet topology. In: Proceedings of the conference on applications, technologies, architectures, and protocols for computer communication (SIGCOMM’99), ACM Press, New York, pp 251–262

  13. Garey MR, Johnson DS (1979) Computers and intractability: a guide to the theory of NP-completeness. W. H. Freeman & Co., New York

  14. Getoor L, Diehl CP (2005) Link mining: a survey. ACM SIGKDD Explor Newsl 7(2): 3–12

  15. Gkoulalas-Divanis A, Verykios VS (2009) Hiding sensitive knowledge without side effects. Knowl Inf Syst 20(3): 263–299

  16. Hay M, Miklau G, Jensen D, Weis P, Srivastava S (2007) Anonymizing social networks. Tech. Rep. 07-19, University of Massachusetts Amherst

  17. Hay M, Miklau G, Jensen D, Towsley D (2008) Resisting structural identification in anonymized social networks. PVLDB 1(1): 102–114

  18. Hay M, Li C, Miklau G, Jensen D (2009) Accurate estimation of the degree distribution of private networks. In: Proceedings of the 2009 ninth IEEE international conference on data mining (ICDM’09), IEEE Computer Society, Washington, DC, pp 169–178

  19. Hazan E, Safra S, Schwartz O (2003) On the complexity of approximating k-dimensional matching. In: Proceedings of the 6th international workshop on approximation algorithms for combinatorial optimization problems and of the 7th international workshop on randomization and computation techniques in computer science (RANDOM-APPROX’03), LNCS, vol 2764. Springer, Berlin, pp 83–97

  20. Korolova A, Motwani R, Nabar SU, Xu Y (2008) Link privacy in social networks. In: Proceedings of the 24th international conference on data engineering (ICDE’08), IEEE, pp 1355–1357

  21. Kossinets G, Watts DJ (2006) Empirical analysis of an evolving social network. Science 311(5757): 88–90

  22. Kumar R, Novak J, Tomkins A (2006) Structure and evolution of online social networks. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’06), ACM Press, New York, pp 611–617

  23. Li N, Li T, Venkatasubramanian S (2007) t-Closeness: privacy beyond k-anonymity and l-diversity. In: Proceedings of the 23rd international conference on data engineering (ICDE’07), IEEE, pp 106–115

  24. Liu K, Terzi E (2008) Towards identity anonymization on graphs. In: Proceedings of the 2008 ACM SIGMOD international conference on management of data (SIGMOD’08), ACM Press, New York, pp 93–106

  25. Liu K, Das K, Grandison T, Kargupta H (2008) Privacy-preserving data analysis on graphs and social networks. In: Kargupta H, Han J, Yu P, Motwani R, Kumar V (eds) Next generation data mining. CRC Press, Boca Raton

  26. Luo H, Fan J, Lin X, Zhou A, Bertino E (2009) A distributed approach to enabling privacy-preserving model-based classifier training. Knowl Inf Syst 20(2): 157–185

  27. Machanavajjhala A, Gehrke J, Kifer D, Venkitasubramaniam M (2006) L-diversity: privacy beyond k-anonymity. In: Proceedings of the 22nd IEEE international conference on data engineering (ICDE’06), IEEE Computer Society, Washington, DC

  28. Machanavajjhala A, Kifer D, Abowd JM, Gehrke J, Vilhuber L (2008) Privacy: theory meets practice on the map. In: Proceedings of the 24th international conference on data engineering (ICDE’08), pp 277–286

  29. Meyerson A, Williams R (2004) On the complexity of optimal k-anonymity. In: Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems (PODS’04), ACM, New York, pp 223–228

  30. Muhlestein D, Lim S (2010) Online learning with social computing based interest sharing. Knowl Inf Syst. doi:10.1007/s10115-009-0265-4

  31. Qiu L, Li Y, Wu X (2008) Protecting business intelligence and customer privacy while outsourcing data mining tasks. Knowl Inf Syst 17(2): 99–120

  32. Rastogi V, Hay M, Miklau G, Suciu D (2009) Relationship privacy: output perturbation for queries with joins. In: Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems (PODS’09), ACM, New York, pp 107–116

  33. Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng (TKDE) 13(6): 1010–1027

  34. Samarati P, Sweeney L (1998) Generalizing data to provide anonymity when disclosing information. In: Proceedings of the 7th ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems (PODS’98), ACM Press, New York, p 188

  35. Sweeney L (2002) K-anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based Syst 10(5): 557–570

  36. Wang DW, Liau CJ, Hsu TS (2006) Privacy protection in social network data disclosure based on granular computing. In: Proceedings of the 2006 IEEE international conference on fuzzy systems, Vancouver, BC, pp 997–1003

  37. Wasserman S, Faust K (1994) Social network analysis: methods and applications. Cambridge University Press, New York

  38. Xiao X, Tao Y (2006a) Anatomy: simple and effective privacy preservation. In: Dayal U, Whang KY, Lomet DB, Alonso G, Lohman GM, Kersten ML, Cha SK, Kim YK (eds) Proceedings of the 32nd international conference on very large data bases (VLDB’06), ACM, pp 139–150

  39. Xiao X, Tao Y (2006b) Personalized privacy preservation. In: Proceedings of the 2006 ACM SIGMOD international conference on management of data (SIGMOD’06), ACM Press, New York, pp 229–240

  40. Xiao X, Tao Y (2007) M-invariance: towards privacy preserving re-publication of dynamic datasets. In: Proceedings of the 2007 ACM SIGMOD international conference on management of data (SIGMOD’07), ACM, New York, pp 689–700

  41. Xiao X, Tao Y (2008) Output perturbation with query relaxation. PVLDB 1(1): 857–869

  42. Xu J, Wang W, Pei J, Wang X, Shi B, Fu AWC (2006) Utility-based anonymization using local recoding. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’06), ACM Press, New York, pp 785–790

  43. Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: Proceedings of the 2002 IEEE international conference on data mining (ICDM’02), IEEE Computer Society, Washington, DC, p 721

  44. Yan X, Yu PS, Han J (2004) Graph indexing: a frequent structure-based approach. In: Proceedings of the 2004 ACM SIGMOD international conference on management of data (SIGMOD’04), ACM Press, New York, pp 335–346

  45. Ying X, Wu X (2008) Randomizing social networks: a spectrum preserving approach. In: Proceedings of the 2008 SIAM international conference on data mining (SDM’08), SIAM, pp 739–750

  46. Ying X, Wu X (2009a) On link privacy in randomizing social networks. In: Proceedings of the 13th Pacific-Asia conference on advances in knowledge discovery and data mining, Springer, pp 28–39

  47. Ying X, Wu X (2009b) On randomness measures for social networks. In: Proceedings of the 2009 SIAM international conference on data mining, SIAM, pp 709–720

  48. Zheleva E, Getoor L (2007) Preserving the privacy of sensitive relationships in graph data. In: Proceedings of the 1st ACM SIGKDD workshop on privacy, security, and trust in KDD (PinKDD’07)

  49. Zhou B, Pei J (2008) Preserving privacy in social networks against neighborhood attacks. In: Proceedings of the 24th IEEE international conference on data engineering (ICDE’08), IEEE Computer Society, Cancun, pp 506–515

  50. Zhou B, Pei J, Luk WS (2008) A brief survey on anonymization techniques for privacy preserving publishing of social network data. SIGKDD Explor 10(2): 12–22

  51. Zou L, Chen L, Özsu MT (2009) K-automorphism: a general framework for privacy preserving network publication. PVLDB 2(1): 946–957

Author information

Corresponding author

Correspondence to Bin Zhou.

Additional information

A preliminary version of this paper appears as Zhou and Pei [49]. This research is supported in part by an NSERC Discovery Grant and an NSERC Discovery Accelerator Supplement Grant. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.

About this article

Cite this article

Zhou, B., Pei, J. The k-anonymity and l-diversity approaches for privacy preservation in social networks against neighborhood attacks. Knowl Inf Syst 28, 47–77 (2011). https://doi.org/10.1007/s10115-010-0311-2
