De-anonymization of Heterogeneous Random Graphs in Quasilinear Time

Bringmann, Karl; Friedrich, Tobias; Krohmer, Anton

doi:10.1007/s00453-017-0395-0

Published: 15 November 2017

Algorithmica, Volume 80, pages 3397–3427 (2018)

Abstract

There are hundreds of online social networks with altogether billions of users. Many such networks publicly release structural information, with all personal information removed. Empirical studies have shown, however, that this provides a false sense of privacy: it is possible to identify almost all users that appear in two such anonymized networks as long as a few initial mappings are known. We analyze this problem theoretically by reconciling two versions of an artificial power-law network arising from independent subsampling of vertices and edges. We present a new algorithm that identifies most vertices and makes no wrong identifications with high probability. The number of vertices matched is shown to be asymptotically optimal. For an n-vertex graph, our algorithm uses \(n^\varepsilon \) seed nodes (for an arbitrarily small \(\varepsilon \)) and runs in quasilinear time. This improves on previous theoretical results, which need \(\Theta (n)\) seed nodes and have runtimes of order \(n^{1+\Omega (1)}\). Additionally, the applicability of our algorithm is studied experimentally on different networks.
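The subsampling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each vertex survives independently with probability `p_vertex` and that each edge between two surviving vertices survives independently with probability `p_edge`. The function name, the parameter names, and the toy graph are hypothetical and not taken from the paper, and the underlying graph is a small hand-written path rather than a power-law network.

```python
import random


def subsample(edges, vertices, p_vertex, p_edge, rng):
    """Return one subsampled copy of a graph (illustrative assumption, not the paper's code).

    Each vertex is kept independently with probability p_vertex; each edge
    whose endpoints both survive is kept independently with probability p_edge.
    """
    kept_vertices = {v for v in sorted(vertices) if rng.random() < p_vertex}
    kept_edges = {
        (u, v)
        for (u, v) in sorted(edges)
        if u in kept_vertices and v in kept_vertices and rng.random() < p_edge
    }
    return kept_vertices, kept_edges


if __name__ == "__main__":
    # Toy underlying graph: a path on 6 vertices (hypothetical example data).
    vertices = range(6)
    edges = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)}

    # Two independently subsampled versions of the same underlying graph.
    g1 = subsample(edges, vertices, 0.9, 0.8, random.Random(1))
    g2 = subsample(edges, vertices, 0.9, 0.8, random.Random(2))
    print("G1:", g1)
    print("G2:", g2)
```

The two copies share most of their structure but differ in which vertices and edges survived, which is the setting in which the two versions have to be reconciled.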

Notes

As our graphs are given by adjacency lists, this means that for each graph \(G_i\) we pick a random permutation \(\pi _i\). Then the jth adjacency list becomes the \(\pi _i(j)\)th adjacency list, each entry \(\ell \) of an adjacency list is replaced by \(\pi _i(\ell )\), and finally we sort each adjacency list to obtain a proper description of the permuted graph.
Throughout the paper, we say that a bound holds with high probability (w.h.p.) if it holds with probability at least \(1-n^{-c}\) for some \(c > 0\).
In the whole paper \(\mathcal {O}(\cdot )\) and \(\Omega (\cdot )\) hide any dependency on the power law exponent \(\beta \) of G. We always assume \(2<\beta < 3\).
A realistic application of deanonymization algorithms such as ours could be to manually identify a few high degree nodes, corresponding to public figures, and to run the algorithm to identify the remaining nodes. For the manual step, one may exploit additional metadata—such high-profile vertices are typically public and share lots of information. The algorithm itself does not rely on any such information and only uses graph structure.
In this model, a bipartite graph of users and interests is constructed, and two users are connected if they share an interest. To create two subsampled graphs, each interest is deleted independently with probability 0.25 in both graphs.
http://socialnetworks.mpi-sws.org/datasets.html.
http://snap.stanford.edu/data/.
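The relabeling procedure described in the first note above (permute the adjacency lists, rewrite their entries, then sort) can be sketched as follows. This is a minimal illustration under assumptions: vertices are 0-indexed integers, the graph is a Python list of adjacency lists, and the function name `anonymize` is hypothetical rather than taken from the paper.

```python
import random


def anonymize(adj, rng):
    """Relabel a graph given as adjacency lists by a random permutation pi.

    The j-th adjacency list becomes the pi[j]-th list, every entry l is
    replaced by pi[l], and each list is sorted afterwards.
    """
    n = len(adj)
    pi = list(range(n))
    rng.shuffle(pi)  # uniformly random permutation of the vertex labels

    permuted = [None] * n
    for j, neighbors in enumerate(adj):
        permuted[pi[j]] = sorted(pi[l] for l in neighbors)
    return permuted, pi


if __name__ == "__main__":
    # Toy graph: a path 0 - 1 - 2 (hypothetical example data).
    adj = [[1], [0, 2], [1]]
    permuted, pi = anonymize(adj, random.Random(0))
    print("permutation:", pi)
    print("permuted adjacency lists:", permuted)
```

Returning `pi` alongside the permuted graph is a convenience for checking a recovered matching against the ground truth; an actual anonymized release would of course withhold it.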

References

Aiello, W., Chung, F., Lu, L.: A random graph model for massive graphs. In: 32nd Annual ACM Symposium on Theory of Computing (STOC), pp. 171–180 (2000)
Aiello, W., Chung, F., Lu, L.: A random graph model for power law graphs. Exp. Math. 10(1), 53–66 (2001)
Alon, N., Spencer, J.H.: The Probabilistic Method, 3rd edn. Wiley, Hoboken (2008)
Amini, H., Fountoulakis, N.: What I tell you three times is true: bootstrap percolation in small worlds. In: 8th International Workshop on Internet and Network Economics (WINE), pp. 462–474 (2012)
Arvind, V., Köbler, J., Kuhnert, S., Vasudev, Y.: Approximate graph isomorphism. In: 37th International Symposium on Mathematical Foundations of Computer Science (MFCS), pp. 100–111. Springer (2012)
Backstrom, L., Dwork, C., Kleinberg, J.: Wherefore art thou r3579x? Anonymized social networks, hidden patterns, and structural steganography. Commun. ACM 54(12), 133–141 (2011)
Barabási, A.-L., Albert, R.: Emergence of scaling in random networks. Science 286, 509–512 (1999)
Bringmann, K., Friedrich, T., Krohmer, A.: De-anonymization of heterogeneous random graphs in quasilinear time. In: 22nd European Symposium on Algorithms (ESA). Lecture Notes in Computer Science. Springer (2014)
Chung, F., Lu, L.: Connected components in random graphs with given expected degree sequences. Ann. Comb. 6(2), 125–145 (2002)
Chung, F., Lu, L.: The average distances in random graphs with given expected degrees. Proc. Natl. Acad. Sci. (PNAS) 99(25), 15879–15882 (2002)
Chung, F., Lu, L.: The average distance in a random graph with given expected degrees. Internet Math. 1(1), 91–113 (2004)
Dereich, S., Mönch, C., Mörters, P.: Typical distances in ultrasmall random networks. Adv. Appl. Prob. 44(2), 583–601 (2012)
Dwork, C.: Differential privacy. In: 33rd International Colloquium on Automata, Languages, and Programming (ICALP), pp. 1–12 (2006)
Fountoulakis, N., Panagiotou, K., Sauerwald, T.: Ultra-fast rumor spreading in social networks. In: 23rd Symposium on Discrete Algorithms (SODA), pp. 1642–1660 (2012)
Friedrich, T., Krohmer, A.: Parameterized clique on scale-free networks. In: 23rd International Symposium on Algorithms and Computation (ISAAC), pp. 659–668 (2012)
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: 30th Annual ACM Symposium on Theory of Computing (STOC), pp. 604–613 (1998)
Kim, J.H., Vu, V.H.: Concentration of multivariate polynomials and its applications. Combinatorica 20(3), 417–434 (2000)
Korula, N., Lattanzi, S.: An efficient reconciliation algorithm for social networks. In: 40th International Conference on Very Large Data Bases (VLDB), pp. 377–388 (2014)
Lattanzi, S., Sivakumar, D.: Affiliation networks. In: 41st Annual ACM Symposium on Theory of Computing (STOC), pp. 427–434 (2009)
McGregor, A., Mironov, I., Pitassi, T., Reingold, O., Talwar, K., Vadhan, S.P.: The limits of two-party differential privacy. In: 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pp. 81–90 (2010)
Mitzenmacher, M.: A brief history of generative models for power law and lognormal distributions. Internet Math. 1(2), 226–251 (2004)
Narayanan, A., Shmatikov, V.: De-anonymizing social networks. In: 30th IEEE Symposium on Security and Privacy (SP), pp. 173–187 (2009)
Newman, I., Sohler, C.: Every property of hyperfinite graphs is testable. In: 43rd Annual ACM Symposium on Theory of Computing (STOC), pp. 675–684 (2011)
Newman, M.E.J.: The structure and function of complex networks. SIAM Rev. 45(2), 167–256 (2003)
Novak, J., Raghavan, P., Tomkins, A.: Anti-aliasing on the web. In: 13th International Conference on World Wide Web (WWW), pp. 30–39 (2004)
Rao, J.R., Rohatgi, P.: Can pseudonymity really guarantee privacy? In: 9th USENIX Security Symposium (USENIX), pp. 85–96 (2000)
Sala, A., Zhao, X., Wilson, C., Zheng, H., Zhao, B.Y.: Sharing graphs using differentially private graph models. In: ACM SIGCOMM Conference on Internet Measurement Conference (IMC), pp. 81–98 (2011)
van der Hofstad, R.: Random graphs and complex networks. www.win.tue.nl/~rhofstad/NotesRGCN.pdf (2014)
Vijayraghavan, A., Wu, Y., Yoshida, Y., Zhou, Y.: Graph isomorphism: approximate and robust. Unpublished manuscript, available from the authors (2013)
Wondracek, G., Holz, T., Kirda, E., Kruegel, C.: A practical attack to de-anonymize social network users. In: IEEE Symposium on Security and Privacy (SP), pp. 223–238 (2010)
Zafarani, R., Liu, H.: Connecting corresponding identities across communities. In: 3rd International Conference on Weblogs and Social Media (ICWSM), pp. 354–357 (2009)
Zheleva, E., Getoor, L.: To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. In: 18th International Conference on World Wide Web (WWW), pp. 531–540 (2009)

Acknowledgements

We thank Silvio Lattanzi from Google Inc. for fruitful discussions, sharing their data sets, and sending us a preliminary version of [18] at the early stages of this project. Karl Bringmann is a recipient of the Google Europe Fellowship in Randomized Algorithms, and this research is supported in part by this Google Fellowship. Tobias Friedrich received funding from the German Research Foundation (DFG) under Grant Agreement No. FR 2988 (ADLON).

Author information

Authors and Affiliations

Max Planck Institute for Informatics, Saarbrücken, Germany
Karl Bringmann
Hasso Plattner Institute, Potsdam, Germany
Tobias Friedrich & Anton Krohmer

Corresponding author

Correspondence to Anton Krohmer.

Additional information

A preliminary conference version [8] without most proofs appeared in the 22nd European Symposium on Algorithms (ESA 2014).

About this article

Cite this article

Bringmann, K., Friedrich, T. & Krohmer, A. De-anonymization of Heterogeneous Random Graphs in Quasilinear Time. Algorithmica 80, 3397–3427 (2018). https://doi.org/10.1007/s00453-017-0395-0

Received: 07 September 2016
Accepted: 08 November 2017
Published: 15 November 2017
Issue Date: November 2018
DOI: https://doi.org/10.1007/s00453-017-0395-0
