On Robust and Effective K-Anonymity in Large Databases

  • Wen Jin
  • Rong Ge
  • Weining Qian
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3918)


The challenge of privacy-preserving data mining lies in respecting privacy requirements while still discovering the interesting patterns or structures present in the original data. Existing methods either lose the correlations among attributes by transforming each attribute independently, or cannot guarantee the minimum abstraction level required by legal policies. In this paper, we propose a novel privacy-preserving transformation framework for distance-based mining operations, based on the concept of privacy-preserving MicroClusters that satisfy both a privacy constraint and a significance constraint. Our framework extends the robustness of the state-of-the-art k-anonymity model by introducing a privacy constraint (a minimum radius), while retaining its effectiveness through a significance constraint (a minimum number of corresponding data records). The privacy-preserving MicroClusters are made public for data mining purposes, but the original data records are kept private. We present efficient methods for generating and maintaining privacy-preserving MicroClusters and show that data mining operations such as clustering can easily be adapted to work on the public data represented by MicroClusters instead of the private data records. Our experimental evaluation demonstrates that the proposed methods achieve accurate clustering results while preserving privacy.
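The two constraints named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `MicroCluster` layout, the function names `summarize` and `is_publishable`, and the choice of radius (root-mean-square distance of the records from their centroid, in the style of BIRCH) are all assumptions made for illustration, since the abstract does not define them.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MicroCluster:
    """Public summary of a group of private records (hypothetical layout)."""
    n: int                 # number of records summarized
    centroid: List[float]  # per-attribute mean
    radius: float          # assumed: RMS distance of records to the centroid


def summarize(records: Sequence[Sequence[float]]) -> MicroCluster:
    """Build the public summary; the raw records themselves stay private."""
    n = len(records)
    dim = len(records[0])
    centroid = [sum(r[d] for r in records) / n for d in range(dim)]
    mean_sq = sum(
        sum((r[d] - centroid[d]) ** 2 for d in range(dim)) for r in records
    ) / n
    return MicroCluster(n=n, centroid=centroid, radius=math.sqrt(mean_sq))


def is_publishable(mc: MicroCluster, min_records: int, min_radius: float) -> bool:
    """Check both constraints from the abstract.

    Significance constraint: at least `min_records` records in the cluster.
    Privacy constraint: radius of at least `min_radius`.
    """
    return mc.n >= min_records and mc.radius >= min_radius


# Two records that are far enough apart to satisfy a minimum radius of 1.0.
mc = summarize([(0.0, 0.0), (2.0, 0.0)])
ok = is_publishable(mc, min_records=2, min_radius=1.0)
```

Under these assumptions, a group of many near-identical records would pass a pure count check but fail the radius check, which is the kind of case a minimum radius is meant to rule out.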


Keywords: Minimum Radius; Split Operation; Range Constraint; Privacy Constraint; Privacy Preserve Data Mining
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Wen Jin (1)
  • Rong Ge (1)
  • Weining Qian (2)
  1. School of Computing Science, Simon Fraser University, Canada
  2. Department of Computer Science, Fudan University, China
