Advertisement

The VLDB Journal

, Volume 19, Issue 6, pp 797–823 | Cite as

Resisting structural re-identification in anonymized social networks

  • Michael Hay
  • Gerome Miklau
  • David Jensen
  • Don Towsley
  • Chao Li
Special Issue Paper

Abstract

We identify privacy risks associated with releasing network datasets and provide an algorithm that mitigates those risks. A network dataset is a graph representing entities connected by edges representing relations such as friendship, communication or shared activity. Maintaining privacy when publishing a network dataset is uniquely challenging because an individual’s network context can be used to identify them even if other identifying information is removed. In this paper, we introduce a parameterized model of structural knowledge available to the adversary and quantify the success of attacks on individuals in anonymized networks. We show that the risks of these attacks vary based on network structure and size and provide theoretical results that explain the anonymity risk in random networks. We then propose a novel approach to anonymizing network data that models aggregate network structure and allows analysis to be performed by sampling from the model. The approach guarantees anonymity for entities in the network while allowing accurate estimates of a variety of network measures with relatively little bias.

Keywords

Anonymity Anonymization Privacy Networks Social networks 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Aiello, W., Chung, F.R.K., Lu, L.: A random graph model for massive graphs. In: Proceedings of the Thirty-second Annual ACM Symposium on Theory of Computing, pp. 171–180. ACM Press, New York (2000)Google Scholar
  2. 2.
    Alderson D.L., Li L.: Diversity of graphs with highly variable connectivity. Phys. Rev. E 75(4), 046102 (2007)CrossRefMathSciNetGoogle Scholar
  3. 3.
    Babai, L., Kucera, L.: Canonical labelling of graphs in linear average time. In: Proceedings of the Twentieth Annual Symposium on Foundations of Computer Science, pp. 39–46, IEEE Computer Society, San Juan, Puerto Rico (1979)Google Scholar
  4. 4.
    Backstrom, L., Dwork, C., Kleinberg, J.M.: Wherefore art thou R3579X? Anonymized social networks, hidden patterns and structural steganography. In: Proceedings of the Sixteenth International World Wide Web Conference, pp. 181–190. ACM Press, New York (2007)Google Scholar
  5. 5.
    Barabási A.-L., Albert R.: Emergence of scaling in random networks. Science 286(5439), 509–512 (1999)CrossRefMathSciNetGoogle Scholar
  6. 6.
    Blitzstein, J., Diaconis, P.: A sequential importance sampling algorithm for generating random graphs with prescribed degrees. Unpublished (2006)Google Scholar
  7. 7.
    Cohen, W.W.: Enron email dataset. http://www.cs.cmu.edu/~enron/, (2005)
  8. 8.
    Cormode G., Srivastava D., Bhagat S., Krishnamurthy B.: Class-based graph anonymization for social network data. Proc. VLDB Endowment 2(1), 766–777 (2009)Google Scholar
  9. 9.
    Cormode G., Srivastava D., Yu T., Zhang Q.: Anonymizing bipartite graph data using safe groupings. Proc. VLDB Endowment 1(1), 833–844 (2008)Google Scholar
  10. 10.
    Corneil D.G., Gotlieb C.C.: An efficient algorithm for graph isomorphism. J. ACM 17, 51–64 (1970)zbMATHCrossRefMathSciNetGoogle Scholar
  11. 11.
    Costa L., Rodrigues F., Travieso G., Boas P.: Characterization of complex networks: A survey of measurements. Adv. Phys. 56(1), 167–242 (2007)CrossRefGoogle Scholar
  12. 12.
    Dwork, C.: Differential privacy: A survey of results. In Proceedings of the Theory and Applications of Models of Computation Fifth International Conference, number 4978 in Lecture Notes in Computer Science, pp. 1–19. Springer, Xi’an, 25–29 April (2008)Google Scholar
  13. 13.
    Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Proceedings of the Third Theory of Cryptography Conference, number 3876 in Lecture Notes in Computer Science, pp. 265–284. Springer, New York, 4–7 March (2006)Google Scholar
  14. 14.
    Erdös P., Rényi A.: On the evolution of random graphs. Bull. Inst. Int. Stat. 38, 343–347 (1961)zbMATHGoogle Scholar
  15. 15.
    Friedkin N.: Horizons of observability and limits of informal control in organizations. Soc. Forces 62(1), 54–77 (1983)CrossRefGoogle Scholar
  16. 16.
    Frikken, K.B., Golle, P.: Private social network analysis: How to assemble pieces of a graph privately. In: Proceedings of the ACM Workshop on Privacy in the Electronic Society, pp. 89–98. ACM Press, Alexandria, VA, 30 October (2006)Google Scholar
  17. 17.
    Ganta, S.R., Kasiviswanathan, S.P., Smith, A.: Composition attacks and auxiliary information in data privacy. In: Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 265–273. ACM Press, New York (2008)Google Scholar
  18. 18.
    Hay, M., Li, C., Miklau, G., Jensen, D.: Accurate estimation of the degree distribution of private networks. In: Proceedings of the IEEE International Conference on Data Mining, pp. 169–178. IEEE Computer Society, Los Alamitos (2009)Google Scholar
  19. 19.
    Hay M., Miklau G., Jensen D.D., Towsley D.F., Weis P.: Resisting structural re-identification in anonymized social networks. Proc. VLDB Endowment 1(1), 102–114 (2008)Google Scholar
  20. 20.
    Hay, M., Miklau, G., Jensen, D.D., Weis, P., Srivastava, S.: Anonymizing social networks. Technical Report UM-CS-2007-19, Department of Computer Science, University of Massachusetts, Amherst (2007)Google Scholar
  21. 21.
    Hay M., Rastogi V., Miklau G., Suciu D.: Boosting the accuracy of differentially-private histograms through consistency. Proc. VLDB Endowment 3(1), 1021–1032 (2010)Google Scholar
  22. 22.
    Holme P., Kim B.J.: Growing scale-free networks with tunable clustering. Phys. Rev. E 65(2), 026107 (2002)CrossRefGoogle Scholar
  23. 23.
    Kifer, D.: Attacks on privacy and de Finetti’s theorem. In: Proceedings of the Thirty-fifth SIGMOD International Conference on Management of Data, pp. 127–138. ACM Press, New York (2009)Google Scholar
  24. 24.
    Kleinberg J.M.: Navigation in a small world. Nature 406(6798), 845 (2000)CrossRefGoogle Scholar
  25. 25.
    Korolova A., Motwani R., Nabar S.U., Xu Y.: Link privacy in social networks. In: Proceedings of the Seventeenth ACM Conference on Information and Knowledge Management, pp. 289–298. ACM Press, Napa Valley, 26–30 October (2008)Google Scholar
  26. 26.
    Levina, E., Bickel, P.: The earth mover’s distance is the mallows distance: some insights from statistics. In: Proceedings of the Eighth IEEE International Conference on Computer Vision, vol. 2, pp. 251–256 (2001)Google Scholar
  27. 27.
    Li L., Alderson D., Doyle J., Willinger W.: Towards a theory of scale-free graphs: Definition, properties, and implications. Int. Math. 2(4), 431–523 (2005)zbMATHMathSciNetGoogle Scholar
  28. 28.
    Li, L., Alderson, D., Willinger, W., Doyle, J.: A first-principles approach to understanding the internet’s router-level topology. In: Proceedings of the 2004 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 3–14. ACM Press, New York (2004)Google Scholar
  29. 29.
    Li, N., Li, T., Venkatasubramanian, S.: t-closeness: Privacy beyond k-anonymity and -diversity. In: Proceedings of the Twenty-third International Conference on Data Engineering, pp. 106–115. IEEE Computer Society, Istanbul (2007)Google Scholar
  30. 30.
    Liu, K., Terzi, E.: Towards identity anonymization on graphs. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 93–106. ACM Press, New York, 10–12 June (2008)Google Scholar
  31. 31.
    Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M.: -diversity: Privacy beyond k-anonymity. In: Proceedings of the Twenty-second International Conference on Data Engineering, p. 24. IEEE Computer Society, Atlanta (2006)Google Scholar
  32. 32.
    Martin, D.J., Kifer, D., Machanavajjhala, A., Gehrke, J., Halpern, J.Y.: Worst-case background knowledge for privacy-preserving data publishing. In: Proceedings of the Twenty-third International Conference on Data Engineering, pp. 126–135. IEEE Computer Society, Istanbul (2007)Google Scholar
  33. 33.
    Mitzenmacher M., Upfal E.: Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, Cambridge (2005)zbMATHGoogle Scholar
  34. 34.
    Narayanan, A., Shmatikov, V.: De-anonymizing social networks. In Proceedings of the IEEE Symposium on Security and Privacy, pp. 173–187. IEEE Computer Society , Oakland, 17–20 May (2009)Google Scholar
  35. 35.
    Newman M.E.J.: The structure and function of complex networks. SIAM Rev. 45(2), 167–256 (2003)zbMATHCrossRefMathSciNetGoogle Scholar
  36. 36.
    Nissim K., Raskhodnikova S., Smith A.: Smooth sensitivity and sampling in private data analysis. In: Proceedings of the Thirty-ninth Annual ACM Symposium on Theory of Computing, pp. 75–84. ACM Press, New York, 11–13 June (2007)Google Scholar
  37. 37.
    Potterat J.J., Phillips-Plummer L., Muth S.Q., Rothenberg R.B., Woodhouse D.E., Maldonado-Long T.S., Zimmerman H.P., Muth J.B.: Risk network structure in the early epidemic phase of HIV transmission in Colorado Springs. Sex. Transm. Infect. 78(Suppl. 1), i159–i163 (2002)CrossRefGoogle Scholar
  38. 38.
    Rastogi, V., Hay, M., Miklau, G., Suciu, D.: Relationship privacy: Output perturbation for queries with joins. In: Proceedings of the Twenty-Eighth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 107–116. ACM Press, Providence June 29–July 2 (2009)Google Scholar
  39. 39.
    Rastogi, V., Hong, S., Suciu, D.: The boundary between privacy and utility in data publishing. In: Proceedings of the Thirty-third International Conference on Very Large Data Bases, pp. 531–542. ACM Press, New York (2007)Google Scholar
  40. 40.
    Russell S.J., Norvig P.: Artificial Intelligence: A Modern Approach, 2nd edn. Prentice Hall, Upper Saddle River (2003)Google Scholar
  41. 41.
    Samarati P.: Protecting respondents’ privacy in microdata release. IEEE Trans. Knowl. Data Eng. 13(6), 1010–1027 (2001)CrossRefGoogle Scholar
  42. 42.
    Samarati, P., Sweeney, L.: Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical Report SRI-CSL-98-04, Computer Science Laboratory, SRI International (1998)Google Scholar
  43. 43.
    Singh, L., Zhan, J.: Measuring topological anonymity in social networks. In: Proceedings of the IEEE International Conference on Granular Computing, pp. 770–770 (2007)Google Scholar
  44. 44.
    Sweeney L.: K-anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 10(5), 557–570 (2002)zbMATHCrossRefMathSciNetGoogle Scholar
  45. 45.
    Tangmunarunkit, H., Govindan, R., Jamin, S., Shenker, S., Willinger, W.: Network topology generators: degree-based vs. structural. In: Proceedings of the 2002 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 147–159. ACM Press, New York (2002)Google Scholar
  46. 46.
    Wang, D.-W., Liau, C.-J., sheng Hsu, T.: Privacy protection in social network data disclosure based on granular computing. In: Proceedings of the IEEE International Conference on Fuzzy Systems, pp. 997–1003 (2006)Google Scholar
  47. 47.
    Watts D.J., Strogatz S.H.: Collective dynamics of ‘small-world’ networks. Nature 393(6684), 409–410 (1998)CrossRefGoogle Scholar
  48. 48.
    Wong, R.C.-W., Fu, A.W.-C., Wang, K., Pei, J.: Minimality attack in privacy preserving data publishing. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 543–554. VLDB Endowment (2007)Google Scholar
  49. 49.
    Xiao, X., Tao, Y.: M-invariance: towards privacy preserving re-publication of dynamic datasets. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pp. 689–700. ACM Press, New York (2007)Google Scholar
  50. 50.
    Ying, X., Wu, X.: Randomizing social networks: A spectrum preserving approach. In: Proceedings of the SIAM International Conference on Data Mining, pp. 739–750. Society for Industrial and Applied Mathematics, Atlanta, 24–26 April (2008)Google Scholar
  51. 51.
    Zheleva, E., Getoor, L.: Preserving the privacy of sensitive relationships in graph data. In: First ACM SIGKDD International Workshop on Privacy, Security, and Trust in KDD, Revised Selected Papers, number 4890 in Lecture Notes in Computer Science, pp. 153–171. Springer, San Jose (2007)Google Scholar
  52. 52.
    Zhou, B., Pei, J.: Preserving privacy in social networks against neighborhood attacks. In: Proceedings of the Twenty-fourth International Conference on Data Engineering, pp. 506–515. IEEE Computer Society, Cancun (2008)Google Scholar
  53. 53.
    Zou L., Chen L., Özsu M.T.: K-Automorphism: A general framework for privacy preserving network publication. Proc. VLDB Endowment 2(1), 946–957 (2009)Google Scholar

Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  • Michael Hay
    • 1
  • Gerome Miklau
    • 1
  • David Jensen
    • 1
  • Don Towsley
    • 1
  • Chao Li
    • 1
  1. 1.Department of Computer ScienceUniversity of Massachusetts AmherstAmherstUSA

Personalised recommendations