The VLDB Journal, Volume 20, Issue 1, pp 59–81

SABRE: a Sensitive Attribute Bucketization and REdistribution framework for t-closeness

  • Jianneng Cao
  • Panagiotis Karras
  • Panos Kalnis
  • Kian-Lee Tan
Regular Paper

Abstract

Today, the publication of microdata poses a privacy threat: anonymous personal records can be re-identified using third-party data sources. Past research has tried to develop a privacy guarantee that an anonymized data set should satisfy before publication, culminating in the notion of t-closeness. To satisfy t-closeness, the records in a data set need to be grouped into Equivalence Classes (ECs), such that each EC contains records with indistinguishable quasi-identifier values, and its local distribution of sensitive attribute (SA) values conforms to the global table distribution of SA values. However, despite this progress, previous research has not offered an anonymization algorithm tailored for t-closeness. In this paper, we fill this gap with SABRE, an SA Bucketization and REdistribution framework for t-closeness. SABRE first greedily partitions a table into buckets of similar SA values and then redistributes the tuples of each bucket into dynamically determined ECs. This approach is facilitated by a property of the Earth Mover’s Distance (EMD), which we employ as a measure of distribution closeness: if the tuples in an EC are picked proportionally to the sizes of the buckets they hail from, then the EMD of that EC is tightly upper-bounded using localized upper bounds derived for each bucket. We prove that if the t-closeness constraint is properly obeyed during partitioning, then it is obeyed by the derived ECs too. We develop two instantiations of SABRE and extend it to a streaming environment. Our extensive experimental evaluation demonstrates that SABRE achieves information quality superior to that of schemes that merely apply algorithms tailored for other models to t-closeness, and can be much faster as well.
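The t-closeness condition described above can be illustrated with a minimal sketch. It is not the SABRE algorithm itself. It assumes a numerical SA with an ordered domain and the ordered ground distance |i - j| / (m - 1), for which the EMD reduces to a normalized sum of absolute prefix sums. The function names, the toy salary table, and the threshold t = 0.15 are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import accumulate


def distribution(values, domain):
    """Relative frequency of each domain value among `values`."""
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in domain]


def ordered_emd(p, q):
    """EMD between two distributions over the same ordered domain of m values.

    Assumes the ground distance between the i-th and j-th value is
    |i - j| / (m - 1), so the EMD is the sum of absolute prefix sums
    of (p - q), divided by (m - 1).
    """
    m = len(p)
    if m < 2:
        return 0.0
    prefix = accumulate(pi - qi for pi, qi in zip(p, q))
    return sum(abs(c) for c in prefix) / (m - 1)


def satisfies_t_closeness(ecs, domain, t):
    """True if every EC's SA distribution is within EMD t of the table's."""
    table = [v for ec in ecs for v in ec]
    q = distribution(table, domain)
    return all(ordered_emd(distribution(ec, domain), q) <= t for ec in ecs)


# Toy table: nine salaries (in thousands), one tuple per value.
domain = list(range(3, 12))

# ECs formed from runs of similar SA values: each EC is skewed.
skewed_ecs = [[3, 4, 5], [6, 7, 8], [9, 10, 11]]

# Treat the same runs as buckets and build each EC by taking one tuple
# from every bucket, i.e. proportionally to the (equal) bucket sizes.
buckets = skewed_ecs
proportional_ecs = [list(ec) for ec in zip(*buckets)]

t = 0.15  # hypothetical closeness threshold
skewed_ok = satisfies_t_closeness(skewed_ecs, domain, t)
proportional_ok = satisfies_t_closeness(proportional_ecs, domain, t)
```

In this toy setting the EC {3, 4, 5} has an EMD of about 0.375 from the table distribution, while an EC such as {3, 6, 9}, which draws one tuple from each bucket, has an EMD of about 0.125. With t = 0.15, only the proportionally drawn ECs pass the check.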

Keywords

  • t-closeness
  • Earth Mover’s Distance

Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  • Jianneng Cao (1)
  • Panagiotis Karras (1)
  • Panos Kalnis (2)
  • Kian-Lee Tan (1)
  1. School of Computing, National University of Singapore, Singapore, Republic of Singapore
  2. Division of Mathematical and Computer Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
