The VLDB Journal, Volume 19, Issue 1, pp 115–139

Anonymizing bipartite graph data using safe groupings

  • Graham Cormode
  • Divesh Srivastava
  • Ting Yu
  • Qing Zhang
Special Issue Paper

Abstract

Private data often come in the form of associations between entities, such as customers and products bought from a pharmacy, which are naturally represented in the form of a large, sparse bipartite graph. As with tabular data, it is desirable to be able to publish anonymized versions of such data, to allow others to perform ad hoc analysis of aggregate graph properties. However, existing tabular anonymization techniques do not give useful or meaningful results when applied to graphs: small changes or masking of the edge structure can radically change aggregate graph properties. We introduce a new family of anonymizations for bipartite graph data, called (k, ℓ)-groupings. These groupings preserve the underlying graph structure perfectly, and instead anonymize the mapping from entities to nodes of the graph. We identify a class of “safe” (k, ℓ)-groupings that have provable guarantees to resist a variety of attacks, and show how to find such safe groupings. We perform experiments on real bipartite graph data to study the utility of the anonymized version, and the impact of publishing alternate groupings of the same graph data. Our experiments demonstrate that (k, ℓ)-groupings offer strong tradeoffs between privacy and utility.
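The idea of keeping the graph structure intact while hiding the entity-to-node mapping can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name publish_grouping, the sample customers and products, and the reading of k and ℓ as minimum group sizes on the two sides of the graph are all assumptions made for illustration. The sketch also does not check the "safe" condition, which the abstract does not define.

```python
import random


def publish_grouping(edges, left_groups, right_groups, k, l, seed=0):
    """Relabel entities with opaque node ids; publish only group-level mappings.

    edges: iterable of (left_entity, right_entity) pairs.
    left_groups / right_groups: lists of lists that partition each side.
    k, l: ASSUMED minimum group sizes for the left and right side.
    """
    if any(len(g) < k for g in left_groups):
        raise ValueError("a left group has fewer than k entities")
    if any(len(g) < l for g in right_groups):
        raise ValueError("a right group has fewer than l entities")

    rng = random.Random(seed)

    def relabel(groups, prefix):
        node_of = {}    # secret: entity -> node id (never published)
        published = []  # public: (entities in group, node ids in group)
        next_id = 0
        for group in groups:
            ids = [f"{prefix}{next_id + i}" for i in range(len(group))]
            next_id += len(group)
            shuffled = list(group)
            rng.shuffle(shuffled)  # hide which entity received which id
            node_of.update(zip(shuffled, ids))
            published.append((sorted(group), ids))
        return node_of, published

    left_map, left_pub = relabel(left_groups, "u")
    right_map, right_pub = relabel(right_groups, "v")
    # Every original edge survives, expressed on node ids only.
    anon_edges = sorted((left_map[a], right_map[b]) for a, b in edges)
    return anon_edges, left_pub, right_pub


# Hypothetical pharmacy data: customers on the left, products on the right.
edges = [("alice", "aspirin"), ("bob", "insulin"),
         ("carol", "aspirin"), ("dave", "statin")]
left_groups = [["alice", "bob"], ["carol", "dave"]]
right_groups = [["aspirin", "insulin", "statin"]]
anon_edges, left_pub, right_pub = publish_grouping(
    edges, left_groups, right_groups, k=2, l=3)
```

In this sketch the published edge list has the same number of edges and the same degree distribution as the original, so aggregate graph properties are unchanged. A reader of the output learns only that each entity corresponds to one of the node ids in its group.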

Keywords

Privacy, Microdata, Graph, Query answering

Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  • Graham Cormode (1)
  • Divesh Srivastava (1)
  • Ting Yu (2)
  • Qing Zhang (2, 3)

  1. AT&T Labs-Research, Florham Park, USA
  2. North Carolina State University, Raleigh, USA
  3. Teradata, El Segundo, USA