Acta Informatica, Volume 48, Issue 1, pp 51–66

Efficient systematic clustering method for k-anonymization

Original Article

Abstract

This paper presents a clustering-based k-anonymization technique that minimizes information loss while assuring data quality. (Clustering partitions records into clusters such that records within a cluster are similar to each other, while records in different clusters are as distinct from one another as possible.) Privacy preservation of individuals has drawn considerable interest in data mining research. The k-anonymity model proposed by Samarati and Sweeney is a practical approach to data privacy preservation and has been studied extensively over the last few years. Anonymization methods based on generalization or suppression are able to protect private information, but they lose valuable information. The challenge is how to minimize the information loss during the anonymization process. We refer to this challenge as a systematic clustering problem for k-anonymization, which is analysed in this paper. The proposed technique groups similar data together and then anonymizes each group individually. The structure of the systematic clustering problem is defined and investigated through paradigm and properties. An algorithm for the proposed problem is developed, and its time complexity is shown to be in \({O(\frac{n^{2}}{k})}\), where n is the total number of records containing privacy-sensitive information about individuals. Experimental results show that our method compares favourably with respect to both information loss and execution time. Finally, the algorithm is shown to be usable for incremental datasets.
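The general idea of grouping similar records and then anonymizing each group can be illustrated with a minimal sketch. This is not the systematic clustering algorithm of the paper, which the abstract does not specify; it is a simple greedy grouping, assuming purely numeric quasi-identifiers, squared Euclidean distance as the similarity measure, and generalization of each attribute to its (min, max) range within the group. All names (`greedy_k_clusters`, `generalize`, `k_anonymize`) are hypothetical, and no attempt is made to match the stated \(O(\frac{n^{2}}{k})\) bound.

```python
from collections import Counter
from typing import List, Tuple

Record = Tuple[float, ...]                 # numeric quasi-identifiers only (assumption)
Generalized = Tuple[Tuple[float, float], ...]


def squared_distance(a: Record, b: Record) -> float:
    """Squared Euclidean distance between two records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_k_clusters(records: List[Record], k: int) -> List[List[Record]]:
    """Greedily group records into clusters of at least k similar records."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    remaining = list(records)
    clusters: List[List[Record]] = []
    while len(remaining) >= k:
        seed = remaining.pop(0)
        # Order the unassigned records by distance to the seed
        # and take the k - 1 nearest ones.
        remaining.sort(key=lambda r: squared_distance(seed, r))
        clusters.append([seed] + remaining[:k - 1])
        remaining = remaining[k - 1:]
    # Fewer than k records are left: attach each to the cluster
    # whose seed (first member) is closest.
    for r in remaining:
        nearest = min(clusters, key=lambda c: squared_distance(c[0], r))
        nearest.append(r)
    return clusters


def generalize(cluster: List[Record]) -> List[Generalized]:
    """Replace each attribute by its (min, max) range within the cluster."""
    ranges = tuple((min(col), max(col)) for col in zip(*cluster))
    return [ranges for _ in cluster]


def k_anonymize(records: List[Record], k: int) -> List[Generalized]:
    """Cluster the records, then generalize each cluster independently."""
    anonymized: List[Generalized] = []
    for cluster in greedy_k_clusters(records, k):
        anonymized.extend(generalize(cluster))
    return anonymized


def is_k_anonymous(anonymized: List[Generalized], k: int) -> bool:
    """Check that every generalized record occurs at least k times."""
    return all(count >= k for count in Counter(anonymized).values())
```

In this sketch the output is ordered by cluster rather than by original record position, and wider ranges correspond to greater information loss, which is the quantity a clustering-based method tries to keep small.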


References

  1. Bayardo, R.J., Agrawal, R.: Data privacy through optimal k-anonymization. In: International Conference on Data Engineering (2005)
  2. Byun, J.W., Bertino, E.: Micro-views, or on how to protect privacy while enhancing data usability: concepts and challenges. SIGMOD 35(1), 9–13 (2006)
  3. Byun, J.W., Kamra, A., Bertino, E., Li, N.: Efficient k-anonymization using clustering techniques. In: International Conference on Database Systems for Advanced Applications (DASFAA) (2007)
  4. Byun, J.W., Sohn, Y., Bertino, E., Li, N.: Secure anonymization for incremental datasets. In: 3rd VLDB Workshop on Secure Data Management (SDM) (2006)
  5. Chiu, C.-C., Tsai, C.-Y.: A k-anonymity clustering method for effective data privacy preservation. In: Third International Conference on Advanced Data Mining and Applications (ADMA) (2007)
  6. Ciriani, V., di Vimercati, S.D.C., Foresti, S., Samarati, P.: k-anonymous data mining: a survey. In: Aggarwal, C.C., Yu, P.S. (eds.) Privacy-Preserving Data Mining: Models and Algorithms, pp. 103–134. Kluwer Academic Publishers, Boston (2008)
  7. Fung, B.C.M., Wang, K., Yu, P.S.: Top-down specialization for information and privacy preservation. In: International Conference on Data Engineering (2005)
  8. Gonzalez, T.F.: Clustering to minimize the maximum intercluster distance. Theor. Comput. Sci. 38, 293–306 (1985)
  9. Hettich, C.B.S., Merz, C.: UCI repository of machine learning databases (1998)
  10. Iyengar, V.S.: Transforming data to satisfy privacy constraints. In: SIGKDD (2002)
  11. LeFevre, K., DeWitt, D., Ramakrishnan, R.: Incognito: efficient full-domain k-anonymity. In: ACM International Conference on Management of Data (2005)
  12. LeFevre, K., DeWitt, D., Ramakrishnan, R.: Mondrian multidimensional k-anonymity. In: International Conference on Data Engineering (2006)
  13. Li, N., Li, T.: t-closeness: privacy beyond k-anonymity and l-diversity. In: ICDE (2007)
  14. Lin, J.L., Wei, M.C.: An efficient clustering method for k-anonymization. In: Proceedings of the 2008 International Workshop on Privacy and Anonymity in Information Society (2008)
  15. Loukides, G., Shao, J.: Capturing data usefulness and privacy protection in k-anonymisation. In: Proceedings of the 2007 ACM Symposium on Applied Computing (2007)
  16. Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramanian, M.: l-diversity: privacy beyond k-anonymity. In: ICDE (2006)
  17. Meyerson, A., Williams, R.: On the complexity of optimal k-anonymity. In: PODS, pp. 223–228 (2004)
  18. Samarati, P.: Protecting respondent’s privacy in microdata release. TKDE 13(6) (2001)
  19. Solanas, A., Sebe, F., Domingo-Ferrer, J.: Micro-aggregation-based heuristics for p-sensitive k-anonymity: one step beyond. In: International Workshop on Privacy and Anonymity in the Information Society (2008)
  20. Sun, X., Li, M., Wang, H., Plank, A.: An efficient hash-based algorithm for minimal k-anonymity. In: ACSC, pp. 101–107 (2008)
  21. Sun, X., Wang, H., Li, J.: Priority driven K-Anonymisation for privacy protection. In: AusDM, pp. 73–78 (2008)
  22. Sweeney, L.: Achieving k-anonymity privacy protection using generalization and suppression. Int. J. Uncertainty Fuzziness Knowledge-based Syst. 10(5), 571–588 (2002)
  23. Sweeney, L.: K-anonymity: a model for protecting privacy. Int. J. Uncertainty Fuzziness Knowledge-based Syst. 10(5), 557–570 (2002)
  24. Truta, T., Vinay, B.: Privacy protection: p-sensitive k-anonymity property. In: International Workshop on Privacy Data Management (PDM), p. 94 (2006)
  25. Xu, J., Wang, W., Pei, J., Wang, X., Shi, B., Fu, A.W.C.: Utility-based anonymization using local recoding. In: KDD 2006, pp. 785–790 (2006)
  26. Wong, R.C.-W., Li, J., Fu, A.W.-C., Wang, K.: (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing. In: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2006)

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  1. Department of Mathematics and Computing, University of Southern Queensland, Toowoomba, Australia
  2. Department of Computer Science and CERIAS, Purdue University, West Lafayette, USA
