Anonymizing Social Network Data for Maximal Frequent-Sharing Pattern Mining

  • Benjamin C. M. Fung
  • Yan’an Jin
  • Jiaming Li
  • Junqiang Liu
Chapter
Part of the Lecture Notes in Social Networks book series (LNSN)

Abstract

Social network data provide valuable information for companies to better understand the characteristics of their potential customers with respect to their communities. Yet sharing social network data in raw form raises serious privacy concerns: a successful privacy attack not only compromises the sensitive information of the target victim but also exposes the victim's relationships with friends, and possibly those friends' private information. In recent years, several anonymization techniques have been proposed to address these issues. Most of them focus on how to achieve a given privacy model but fail to preserve the data mining knowledge required by data recipients. In this paper, we propose a method to \(k\)-anonymize a social network dataset with the goal of preserving frequent sharing patterns and maximal frequent sharing patterns, the most important kinds of knowledge required for marketing and consumer behavior analysis. Experimental results on real-life data illustrate the trade-off between privacy protection and utility loss with respect to the preservation of (maximal) frequent sharing patterns.

Keywords

  • Privacy protection
  • Anonymization
  • Neighborhood attack
  • Data mining
  • Frequent sharing pattern

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Benjamin C. M. Fung (1)
  • Yan’an Jin (2)
  • Jiaming Li (3)
  • Junqiang Liu (4)

  1. McGill University, Montreal, Canada
  2. Huazhong University of Science and Technology; Hubei University of Economics, Hubei, China
  3. IBM Canada Software Lab, Toronto, Canada
  4. Zhejiang Gongshang University, Zhejiang, China
