Achieving k-Anonymity by Clustering in Attribute Hierarchical Structures

  • Jiuyong Li
  • Raymond Chi-Wing Wong
  • Ada Wai-Chee Fu
  • Jian Pei
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4081)

Abstract

Individual privacy is at risk if a published data set is not properly de-identified. k-anonymity is a major technique for de-identifying a data set. More generally, k-anonymity can be viewed as clustering under the constraint that every cluster contains at least k objects. Most existing approaches to achieving k-anonymity by clustering are designed for numerical (or ordinal) attributes. In this paper, we study achieving k-anonymity by clustering in attribute hierarchical structures. We define generalisation distances between tuples to characterise the distortion caused by generalisation and discuss the properties of these distances. We conclude that the generalisation distance is a metric distance. We propose an efficient clustering-based algorithm for k-anonymisation. We experimentally show that the proposed method is more scalable and causes significantly less distortion than an optimal global recoding k-anonymity method.
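The clustering view of k-anonymity described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm and does not implement the paper's generalisation distance. It only checks the minimum-cluster-size constraint, with rows grouped by their quasi-identifier values. The helper names, the toy records, and the one-level postcode hierarchy (`generalise_postcode`) are all assumptions made for illustration.

```python
from collections import Counter


def equivalence_class_sizes(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every equivalence class (cluster) holds at least k rows."""
    sizes = equivalence_class_sizes(rows, quasi_identifiers)
    return all(n >= k for n in sizes.values())


def generalise_postcode(row):
    """Hypothetical one-level hierarchy: keep the first two digits, mask the rest."""
    out = dict(row)
    out["postcode"] = row["postcode"][:2] + "**"
    return out


# Toy records (invented for this example, not taken from the paper).
rows = [
    {"postcode": "4350", "sex": "F"},
    {"postcode": "4352", "sex": "F"},
    {"postcode": "4370", "sex": "M"},
    {"postcode": "4371", "sex": "M"},
]
generalised = [generalise_postcode(r) for r in rows]

raw_ok = is_k_anonymous(rows, ["postcode", "sex"], k=2)
generalised_ok = is_k_anonymous(generalised, ["postcode", "sex"], k=2)
print(raw_ok, generalised_ok)
```

In this toy data every raw row is unique on (postcode, sex), so the raw table fails the check for k = 2. After moving each postcode one level up the assumed hierarchy, the rows fall into two classes of two rows each and the check passes. Generalising values loses detail, which is the kind of distortion the abstract says the proposed distance is meant to characterise.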

Keywords

Equivalence Class, Equivalent Class, Domain Level, Distortion Ratio, Attribute Hierarchy
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jiuyong Li (1)
  • Raymond Chi-Wing Wong (2)
  • Ada Wai-Chee Fu (2)
  • Jian Pei (3)

  1. Department of Mathematics and Computing, The University of Southern Queensland, Australia
  2. Department of Computer Science and Engineering, The Chinese University of Hong Kong
  3. School of Computing Science, Simon Fraser University, Canada
