Why Waldo befriended the dummy? k-Anonymization of social networks with pseudo-nodes

  • Original Article
Social Network Analysis and Mining

Abstract

For a graph-based representation of a social network, the identity of participants can be uniquely determined if an adversary has background structural knowledge about the graph. We focus on degree-based attacks, wherein the adversary knows the degrees of particular target vertices, and we aim to protect the anonymity of participants through k-anonymization, which ensures that every participant is equivalent to at least k − 1 other participants with respect to degree. We propose a natural and novel approach: adding “dummy” participants to the network and linking them to each other and to real participants in order to achieve this anonymity. The advantage of our approach lies in the nature of the results that we derive. We show that if participants have labels associated with them, the problem of anonymizing a subset of participants is NP-complete. On the other hand, in the absence of labels, we give an \(\mathcal{O}(nk)\) algorithm to optimally k-anonymize a subset of participants or to near-optimally k-anonymize all real and all dummy participants. For degree-based attacks, such theoretical guarantees are novel.
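To make the degree-based notion of k-anonymity concrete, the following minimal Python sketch checks whether every vertex of a simple undirected graph shares its degree with at least k − 1 other vertices. The function names, the toy path graph, and the hand-picked dummy vertex `d` are illustrative assumptions; this is not the paper's \(\mathcal{O}(nk)\) algorithm, only a check of the property that the algorithm is meant to establish.

```python
from collections import Counter


def degrees(vertices, edges):
    """Return a dict mapping each vertex to its degree.

    Assumes a simple undirected graph: no self-loops, no repeated edges.
    """
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def is_k_degree_anonymous(vertices, edges, k):
    """True if every vertex has the same degree as at least k - 1 others."""
    deg = degrees(vertices, edges)
    class_sizes = Counter(deg.values())  # degree -> number of vertices with it
    return all(class_sizes[d] >= k for d in deg.values())


# Hypothetical toy network: a path a - b - c. Vertex b is the only one
# with degree 2, so an adversary who knows that degree re-identifies b.
real_vertices = ["a", "b", "c"]
real_edges = [("a", "b"), ("b", "c")]
before = is_k_degree_anonymous(real_vertices, real_edges, k=2)

# Add one dummy participant d and link it to a (a hand-picked choice).
# Degrees become a=2, b=2, c=1, d=1, so every degree occurs twice.
after = is_k_degree_anonymous(real_vertices + ["d"],
                              real_edges + [("a", "d")], k=2)
```

In this toy case a single dummy vertex and a single new edge suffice, and no edge between real participants is added or removed.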

Notes

  1. We define a vertex-labelled graph as the four-tuple \((\hbox{V,E},\Upsigma,\ell)\), where V is a vertex set, \(\hbox{E}\subseteq \hbox{V}\times\hbox{V}\) is a set of undirected edges, \(\Upsigma\) is a set of sensitive labels, and \( \ell:\hbox{V}\mapsto\Upsigma\) is a labelling function that assigns a label to each vertex. We discuss in the paper two types of labels, sensitive and identifying. By \(\Upsigma\), we refer to the former, assuming the latter is stripped from the graph.

  2. We mention specific cases in which these questions have been answered in our discussion of related work in Sect. 6. Even in these cases, however, not all three questions have been fully addressed.

  3. Precise formulations of the problem appear in Sect. 2 for unlabelled graphs and in Sect. 5 for labelled graphs.

  4. For simplicity in this section, we regard a graph as a 2-tuple. We note that equivalently, for consistency, we could express an unlabelled graph as \(\mathcal{G}=(\hbox{V, E},\Upsigma,\ell)\) where \(\exists \sigma\in\Upsigma: \forall v\in\hbox{V}, {\ell}(v)=\sigma\). However, the 2-tuple notation simplifies the exposition.

  5. Considering the Enron email corpus on which we experiment in Sect. 4.1, |V| > 65,000, but only 151 vertices correspond to internal email addresses.

  6. http://snap.stanford.edu/data/.

  7. http://www-personal.umich.edu/~mejn/netdata/.

  8. http://www.casos.cs.cmu.edu/computational_tools/datasets/external/polblogs/index11.php.

  9. Recall that a walk is any sequence of adjacent edges, including those which revisit edges and/or vertices.
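As one possible concrete reading of notes 1, 4, and 9, the sketch below represents a vertex-labelled graph \((\hbox{V,E},\Upsigma,\ell)\) as a small Python data class, builds an unlabelled graph as the special case in which every vertex carries the same label, and tests whether a vertex sequence is a walk. The class and function names, the placeholder label `"*"`, and the choice to describe a walk by its vertex sequence rather than its edge sequence are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LabelledGraph:
    vertices: frozenset  # V
    edges: frozenset     # E, each undirected edge stored as frozenset({u, v})
    labels: frozenset    # Sigma, the set of sensitive labels
    label_of: dict       # ell: V -> Sigma


def unlabelled(vertices, edges, sigma="*"):
    """Express an unlabelled graph (V, E) as a four-tuple in which a
    single label sigma is assigned to every vertex (cf. note 4)."""
    vs = frozenset(vertices)
    es = frozenset(frozenset(e) for e in edges)
    return LabelledGraph(vs, es, frozenset([sigma]), {v: sigma for v in vs})


def is_walk(graph, vertex_sequence):
    """True if consecutive vertices are adjacent. Edges and vertices may
    be revisited, as in note 9. Assumes no self-loops."""
    pairs = zip(vertex_sequence, vertex_sequence[1:])
    return all(frozenset(p) in graph.edges for p in pairs)


# Hypothetical example: the path a - b - c.
g = unlabelled(["a", "b", "c"], [("a", "b"), ("b", "c")])
revisiting = is_walk(g, ["a", "b", "a", "b", "c"])  # reuses edge {a, b}
shortcut = is_walk(g, ["a", "c"])                   # a and c are not adjacent
```

Storing each edge as a `frozenset` makes the edges undirected by construction, so `("a", "b")` and `("b", "a")` denote the same edge.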

References

  • Adamic L, Glance N (2005) The political blogosphere and the 2004 U.S. election: divided they blog. In: Proceedings of WWW 2005 workshop on the weblogging ecosystem

  • Aggarwal G, Feder T, Kenthapadi K, Motwani R, Panigrahy R, Thomas D, Zhu A (2005) Anonymizing tables. In: Proceedings of international conference on database theory (ICDT), pp 246–258

  • Akiyama J, Era H, Harary F (1983) Regular graphs containing a given graph. Am Math Month 83:15–17

  • Backstrom L, Dwork C, Kleinberg JM (2007) Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography. In: Proceedings of conference on world wide web (WWW), pp 181–190

  • Barrat A, Weigt M (2000) On the properties of small-world network models. Eur Phys J B 13(3):547–560

  • Bodlaender HL, Tan RB, van Leeuwen J (2000) Finding a delta-regular supergraph of minimum order. Tech Rep UU-CS-2000-29, Dept of Computer Science, Utrecht University, Utrecht

  • Chakrabarti D, Faloutsos C (2006) Graph mining: laws, generators, and algorithms. ACM Comput Surv 38(1):2. doi:10.1145/1132952.1132954

  • Cheng J, Fu AWC, Liu J (2010) K-isomorphism: privacy preserving network publication against structural attacks. In: Proceedings of ACM Special Interest Group on Management of Data (SIGMOD), pp 459–470

  • Chester S, Srivastava G (2011) Social network privacy for attribute disclosure attacks. In: Proceedings of advances in social networks analysis and mining (ASONAM)

  • Chester S, Kapron B, Ramesh G, Srivastava G, Thomo A, Venkatesh S (2011) k-anonymization of social networks by vertex addition. In: Proceedings of advances in databases and information systems (ADBIS)

  • Chester S, Gaertner J, Stege U, Venkatesh S (2012a) Anonymizing subsets of social networks with degree constrained subgraphs. In: Proceedings of advances in social networks analysis and mining (ASONAM)

  • Chester S, Kapron B, Srivastava G, Venkatesh S (2012b) Complexity of social network anonymization. Soc Netw Anal Min. doi:10.1007/s13278-012-0059-7

  • Costa LdF, Rodrigues FA, Travieso G, Villas Boas PR (2007) Characterization of complex networks: a survey of measurements. Adv Phys 56:167–242

  • Domingo-Ferrer J (ed) (2002) Inference Control in statistical databases, from theory to practice. In: Lecture Notes in Computer Science, vol 2316. Springer, Berlin

  • Dwork C (2006) Differential privacy. In: ICALP. Springer, Berlin, pp 1–12

  • Erdős P, Kelly P (1967) The minimal regular graph containing a given graph. Am Math Month 70:1074–1075

  • Estrada E, Rodriguez-Velazquez JA (2005) Spectral measures of bipartivity in complex networks. Phys Rev E 72(4):046105. doi:10.1103/PhysRevE.72.046105

  • Faloutsos M, Faloutsos P, Faloutsos C (1999) On power-law relationships of the internet topology. SIGCOMM Comput Commun Rev 29(4):251–262. doi:10.1145/316194.316229

  • Ferri F, Grifoni P, Guzzo T (2012) New forms of social and professional digital relationships: the case of facebook. Soc Netw Anal Min 2(2):121–137

  • Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99:7821–7826

  • González JJS (2002) Extending cell suppression to protect tabular data against several attackers. In: Inference Control in Statistical Databases, pp 34–58

  • Hay M, Miklau G, Jensen D, Towsley DF, Weis P (2008) Resisting structural re-identification in anonymized social networks. Proc Very Large Datab 1(1):102–114

  • Heer J (2005) Prefuse: a toolkit for interactive information visualization. In: CHI 05: Proceedings of the SIGCHI conference on human factors in computing systems. ACM Press, New York, pp 421–430

  • König D (1936) Theorie der endlichen und unendlichen Graphen. Akademische Verlagsgesellschaft, Leipzig

  • Latora V, Marchiori M (2001) Efficient behavior of small-world networks. Phys Rev Lett 87. doi:10.1103/PhysRevLett.87.198701

  • Leskovec J, Kleinberg J, Faloutsos C (2005) Graphs over time: Densification laws, shrinking diameters and possible explanations. In: Proceedings of international conference on knowledge discovery and data mining (KDD)

  • Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2008) Statistical properties of community structure in large social and information networks. In: Proceedings of conference on world wide web (WWW), pp 695–704

  • Li N, Li T, Venkatasubramanian S (2007) t-closeness: privacy beyond k-anonymity and l-diversity. In: Proceedings of IEEE 23rd international conference on data engineering (ICDE07)

  • Liu K, Terzi E (2008) Towards identity anonymization on graphs. In: Proceedings of ACM Special Interest Group on Management of Data (SIGMOD), pp 93–106

  • Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) L-diversity: privacy beyond k-anonymity. ACM Trans Knowl Discov Data 1(1). doi:10.1145/1217299.1217302

  • McSherry F, Mironov I (2009) Differentially private recommender systems: building privacy into the netflix prize contenders. In: Proceedings of international conference on knowledge discovery and data mining (KDD), pp 627–636

  • Meyerson A, Williams R (2004) On the complexity of optimal k-anonymity. In: Principles of database systems, pp 223–228

  • Milgram S (1967) The small world problem. Psychol Today 2:60–67

  • Newman MEJ (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74(3). doi:10.1103/PhysRevE.74.036104

  • Robertson DA, Ethier R (2002) Cell suppression: experience and theory. In: Inference control in statistical databases, pp 8–20

  • Sweeney L (2002) k-anonymity: A model for protecting privacy. Int J Uncertainty Fuzziness Knowl Based Syst 10(5):557–570

  • Thompson B, Yao D (2009) The union-split algorithm and cluster-based anonymization of social networks. In: Proceedings of ACM symposium on information, computer and communications security (ASIACCS), pp 218–227

  • Wang Y, Xie L, Zheng B, Lee KCK (2011) Utility-oriented k-anonymization on social networks. In: Proceedings of the 16th international conference on Database systems for advanced applications, vol Part I, DASFAA’11. Springer, Berlin, pp 78–92

  • Wu W, Xiao Y, Wang W, He Z, Wang Z (2010) k-symmetry model for identity anonymization in social networks. In: Proceedings of international conference on extending database technology (EDBT), pp 111–122

  • Ying X, Pan K, Wu X, Guo L (2009) Comparisons of randomization and k-degree anonymization schemes for privacy preserving social network publishing. In: Proceedings of 3rd workshop on social network mining and analysis (SNA-KDD). ACM, New York, pp 10:1–10:10

  • Yuan M, Chen L, Yu PS (2010) Personalized privacy protection in social networks. Proc Very Large Datab 4(2):141–150

  • Zheleva E, Getoor L (2007) Preserving the privacy of sensitive relationships in graph data. In: Proceedings of privacy, security, and trust in KDD (PinKDD), pp 153–171

  • Zhou B, Pei J (2011) The k-anonymity and l-diversity approaches for privacy preservation in social networks against neighborhood attacks. Knowl Inf Syst 28(1):47–77

Author information

Corresponding author

Correspondence to Sean Chester.

Additional information

A preliminary, short version (Chester et al. 2011) of this paper appeared at ADBIS 2011.

About this article

Cite this article

Chester, S., Kapron, B.M., Ramesh, G. et al. Why Waldo befriended the dummy? k-Anonymization of social networks with pseudo-nodes. Soc. Netw. Anal. Min. 3, 381–399 (2013). https://doi.org/10.1007/s13278-012-0084-6

  • DOI: https://doi.org/10.1007/s13278-012-0084-6
