Achieving k-Anonymity by Clustering in Attribute Hierarchical Structures

Li, Jiuyong; Wong, Raymond Chi-Wing; Fu, Ada Wai-Chee; Pei, Jian

doi:10.1007/11823728_39

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4081)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

Individual privacy is at risk if a published data set is not properly de-identified. k-anonymity is a major technique for de-identifying a data set. A more general view of k-anonymity is clustering with a constraint on the minimum number of objects in every cluster. Most existing approaches to achieving k-anonymity by clustering are for numerical (or ordinal) attributes. In this paper, we study achieving k-anonymity by clustering in attribute hierarchical structures. We define generalisation distances between tuples to characterise distortions by generalisations and discuss the properties of the distances. We conclude that the generalisation distance is a metric distance. We propose an efficient clustering-based algorithm for k-anonymisation. We experimentally show that the proposed method is more scalable and causes significantly less distortion than an optimal global recoding k-anonymity method.
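The clustering view of k-anonymity described in the abstract can be illustrated with a small sketch. This is not the algorithm or the distance proposed in the paper; it only shows, under stated assumptions, how values in a cluster might be generalised to their closest common ancestor in an attribute hierarchy and how the k-anonymity condition can then be checked. The hierarchy `PARENT`, the postcode values, and all function names are hypothetical and chosen for illustration only.

```python
from collections import Counter

# Hypothetical attribute hierarchy (child -> parent); "*" is the root,
# meaning the value is fully generalised. Not taken from the paper.
PARENT = {
    "4350": "435*", "4351": "435*", "4360": "436*",
    "435*": "43**", "436*": "43**", "43**": "*",
}

def ancestors(value):
    """Return the path from value up to the root, value first."""
    path = [value]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def common_generalisation(a, b):
    """Lowest node in the hierarchy that covers both a and b."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("values do not share a hierarchy")

def generalise_cluster(values):
    """Generalise all values in one cluster to a single common node."""
    result = values[0]
    for v in values[1:]:
        result = common_generalisation(result, v)
    return result

def is_k_anonymous(rows, k):
    """True if every distinct released value occurs at least k times."""
    return all(count >= k for count in Counter(rows).values())

# Two clusters of size 2, so the release can satisfy k = 2.
clusters = [["4350", "4351"], ["4360", "4360"]]
released = []
for cluster in clusters:
    released.extend([generalise_cluster(cluster)] * len(cluster))
```

In this toy example the raw values are not 2-anonymous because "4350" and "4351" each appear once, while the released values are, since each cluster of at least k tuples is replaced by one generalised value. Identical values in a cluster need no generalisation, which is why the second cluster is released unchanged.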

Author information

Authors and Affiliations

Department of Mathematics and Computing, The University of Southern Queensland, Australia
Jiuyong Li
Department of Computer Science and Engineering, The Chinese University of Hong Kong
Raymond Chi-Wing Wong & Ada Wai-Chee Fu
School of Computing Science, Simon Fraser University, Canada
Jian Pei

Editor information

Editors and Affiliations

Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstr. 9-11/188, A-1040, Wien, Austria
A Min Tjoa
Department of Software and Computing Systems, University of Alicante, Spain
Juan Trujillo

Cite this paper

Li, J., Wong, R.CW., Fu, A.WC., Pei, J. (2006). Achieving k-Anonymity by Clustering in Attribute Hierarchical Structures. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2006. Lecture Notes in Computer Science, vol 4081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11823728_39

DOI: https://doi.org/10.1007/11823728_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37736-8
Online ISBN: 978-3-540-37737-5
eBook Packages: Computer Science, Computer Science (R0)
