Speeding Up Clustering-Based k-Anonymisation Algorithms with Pre-partitioning

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNISA,volume 4587)

Abstract

K-anonymisation is a technique for protecting the privacy of data contained within a dataset. Many k-anonymisation algorithms have been proposed, and one class of such algorithms is clustering-based. These algorithms can offer high-quality solutions, but are rather inefficient to execute. In this paper, we propose a method that first partitions a dataset into groups and then clusters the data within each group for k-anonymisation. Our experiments show that combining partitioning with clustering can significantly improve the performance of clustering-based k-anonymisation algorithms while maintaining the quality of the anonymisations they produce.
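
The partition-then-cluster idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which is not given on this page: the median split on the widest attribute, the greedy nearest-neighbour clustering, and the names `pre_partition`, `greedy_clusters`, `k_anonymise` and `max_size` are all assumptions made for illustration. Records are assumed to be tuples of numeric quasi-identifier values.

```python
from typing import List, Sequence

Record = Sequence[float]


def sq_dist(a: Record, b: Record) -> float:
    """Squared Euclidean distance between two numeric records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pre_partition(records: List[Record], max_size: int) -> List[List[Record]]:
    """Hypothetical pre-partitioning step: split at the median of the
    attribute with the widest range until each part has at most max_size records."""
    if len(records) <= max_size:
        return [list(records)]
    dims = range(len(records[0]))
    widest = max(dims, key=lambda d: max(r[d] for r in records) - min(r[d] for r in records))
    ordered = sorted(records, key=lambda r: r[widest])
    mid = len(ordered) // 2
    return pre_partition(ordered[:mid], max_size) + pre_partition(ordered[mid:], max_size)


def greedy_clusters(records: List[Record], k: int) -> List[List[Record]]:
    """Hypothetical clustering step: repeatedly take a seed record and its
    k - 1 nearest neighbours; leftovers join the cluster with the nearest seed."""
    if len(records) < k:
        raise ValueError("need at least k records to form a cluster")
    remaining = list(records)
    clusters: List[List[Record]] = []
    while len(remaining) >= k:
        seed = remaining.pop(0)
        remaining.sort(key=lambda r: sq_dist(seed, r))
        clusters.append([seed] + remaining[:k - 1])
        remaining = remaining[k - 1:]
    for r in remaining:  # fewer than k records are left over
        nearest = min(clusters, key=lambda c: sq_dist(c[0], r))
        nearest.append(r)
    return clusters


def k_anonymise(records: List[Record], k: int, max_size: int) -> List[List[Record]]:
    """Partition first, then cluster inside each part. Requiring
    max_size >= 2k - 1 ensures every split leaves at least k records per part."""
    if max_size < 2 * k - 1:
        raise ValueError("max_size must be at least 2k - 1")
    clusters: List[List[Record]] = []
    for part in pre_partition(records, max_size):
        clusters.extend(greedy_clusters(part, k))
    return clusters


# Example: 50 synthetic two-attribute records, groups of at least 3.
data = [(float(i), float(i % 7)) for i in range(50)]
groups = k_anonymise(data, k=3, max_size=10)
```

In this sketch the clustering step sorts the remaining records once per cluster, so its cost grows faster than linearly with the number of records it is given. Running it on small partitions rather than on the whole dataset is where the saving would come from, at the price of never grouping records that fall in different partitions.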

Keywords

  • Privacy Protection
  • Optimal Grouping
  • Credit Card Number
  • Small Subspace
  • Shopping Preference

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Editor information

Richard Cooper, Jessie Kennedy

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Loukides, G., Shao, J. (2007). Speeding Up Clustering-Based k-Anonymisation Algorithms with Pre-partitioning. In: Cooper, R., Kennedy, J. (eds) Data Management. Data, Data Everywhere. BNCOD 2007. Lecture Notes in Computer Science, vol 4587. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73390-4_23

  • DOI: https://doi.org/10.1007/978-3-540-73390-4_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73389-8

  • Online ISBN: 978-3-540-73390-4

  • eBook Packages: Computer Science, Computer Science (R0)
