The VLDB Journal, Volume 20, Issue 1, pp 83–106

Local and global recoding methods for anonymizing set-valued data

Regular Paper

Abstract

In this paper, we study the problem of protecting privacy in the publication of set-valued data. Consider a collection of supermarket transactions that contains detailed information about items bought together by individuals. Even after removing all personal characteristics that could link a transaction to the buyer's identity, the publication of such data is still subject to privacy attacks from adversaries who have partial knowledge about the set. Unlike most previous works, we do not classify data as sensitive or non-sensitive; instead, we consider every item to be both a potential quasi-identifier and potentially sensitive, depending on the knowledge of the adversary. We define a new version of the k-anonymity guarantee, k^m-anonymity, to limit the effects of data dimensionality, and we propose efficient algorithms to transform the database. Our anonymization model relies on generalization rather than suppression, the latter being the most common practice in related work on such data. We develop an algorithm that finds the optimal solution; however, its high cost makes it inapplicable to large, realistic problems. We then propose a greedy heuristic that performs generalizations in an Apriori-like, level-wise fashion. The heuristic scales much better and, in most cases, finds a solution close to the optimal one. Finally, we investigate techniques that partition the database and perform anonymization locally, aiming to reduce memory consumption and further improve scalability. A thorough experimental evaluation with real datasets shows that a vertical partitioning approach achieves excellent results in practice.
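To make the k^m-anonymity guarantee concrete, here is a minimal Python sketch. It assumes the following reading: a database is k^m-anonymous when every itemset of at most m items that occurs in some transaction is contained in at least k transactions. Under this reading, an adversary who knows up to m items of a record cannot narrow it down to fewer than k candidates. The helper names (`itemset_supports`, `km_violations`, `is_km_anonymous`, `generalize`), the toy transactions and the hierarchy cut (a1, a2 → A) are all hypothetical. The code is a brute-force checker plus one global recoding step; it is not the authors' optimal or Apriori-based algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical helpers illustrating k^m-anonymity; not the paper's algorithms.


def itemset_supports(db, m):
    """For every itemset of 1..m items that occurs in some transaction,
    count how many transactions contain it."""
    support = Counter()
    for transaction in db:
        items = sorted(set(transaction))  # canonical order, duplicates dropped
        for size in range(1, m + 1):
            # combinations() yields nothing when size > len(items)
            for itemset in combinations(items, size):
                support[itemset] += 1
    return support


def km_violations(db, k, m):
    """Itemsets of at most m items that occur in some transaction
    but are contained in fewer than k transactions."""
    return {s for s, count in itemset_supports(db, m).items() if count < k}


def is_km_anonymous(db, k, m):
    """True when no known itemset of size <= m narrows the candidates below k."""
    return not km_violations(db, k, m)


def generalize(db, cut):
    """Global recoding: replace every item by its generalized value in `cut`
    (items absent from `cut` are published unchanged)."""
    return [frozenset(cut.get(item, item) for item in t) for t in db]


# Hypothetical toy data: four transactions over items a1, a2, b1, b2.
db = [
    {"a1", "b1", "b2"},
    {"a2", "b1"},
    {"a2", "b1", "b2"},
    {"a1", "a2", "b2"},
]

print(sorted(km_violations(db, k=2, m=2)))  # [('a1', 'a2'), ('a1', 'b1')]

# Hypothetical hierarchy cut: a1 and a2 are both generalized to A.
recoded = generalize(db, {"a1": "A", "a2": "A"})
print(is_km_anonymous(recoded, k=2, m=2))  # True
```

On this toy data, the pairs {a1, a2} and {a1, b1} each appear in only one transaction, so the original database is not 2^2-anonymous. Replacing a1 and a2 by their common ancestor A removes both violations at the cost of some precision. The enumeration grows combinatorially with m and transaction length, so this brute-force check suits only small examples.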

Keywords

Database privacy · Set-valued data · Anonymity

Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  • Manolis Terrovitis (1)
  • Nikos Mamoulis (2)
  • Panos Kalnis (3)

  1. Institute for the Management of Information Systems (IMIS), Research Center “Athena”, Athena, Greece
  2. Department of Computer Science, University of Hong Kong, Hong Kong, China
  3. Division of Mathematical and Computer Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia