Generating k-Anonymous Microdata by Fuzzy Possibilistic Clustering

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10439)

Abstract

Collecting, releasing and sharing microdata about individuals is needed in some domains to support research initiatives that aim to create valuable new knowledge by means of data mining and analysis tools. Individuals' anonymity must therefore be ensured prior to publication in order to guarantee their privacy. k-anonymity by microaggregation is a widely accepted model for data anonymization. It breaks the association between the identity of data subjects, i.e. individuals, and their confidential information. However, this method shows its limits on real datasets, which are characterized by a large number of attributes and the presence of noisy data. Reducing the information loss incurred during the anonymization process is therefore a pressing task. This paper addresses that challenge: we propose a microaggregation algorithm called Micro-PFSOM, based on fuzzy possibilistic clustering. The main thrust of this algorithm lies in applying a hybrid anonymization process.
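To make the notion of k-anonymity by microaggregation concrete, the following is a minimal sketch of generic fixed-size univariate microaggregation. It is not the Micro-PFSOM algorithm, whose details the abstract does not give. The function names `microaggregate` and `information_loss`, the SSE/SST loss measure, and the toy `ages` values are all assumptions made for illustration only.

```python
from statistics import mean


def microaggregate(values, k):
    """Replace each value by the mean of its group of at least k sorted neighbours.

    Generic illustration only (hypothetical helper, not Micro-PFSOM).
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k records")
    # Indices of the records, ordered by value, so neighbours are similar.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    result = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder, so every group has >= k records.
        end = (g + 1) * k if g < n_groups - 1 else len(values)
        idx = order[start:end]
        centroid = mean(values[i] for i in idx)
        for i in idx:
            result[i] = centroid
    return result


def information_loss(original, anonymized):
    """SSE / SST ratio (assumed measure): 0 means no distortion, 1 means total loss."""
    overall = mean(original)
    sse = sum((o - a) ** 2 for o, a in zip(original, anonymized))
    sst = sum((o - overall) ** 2 for o in original)
    return sse / sst if sst else 0.0


# Toy, made-up values used purely for illustration.
ages = [23, 25, 31, 34, 35, 52, 58]
released = microaggregate(ages, k=3)
loss = information_loss(ages, released)
```

With these toy values and k = 3, the records fall into the groups {23, 25, 31} and {34, 35, 52, 58}, so every released value is shared by at least three records. The loss ratio shows the price paid for that: the more dissimilar the records forced into one group, the larger the distortion.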

Keywords

k-anonymity; Hybrid microaggregation; Information loss; Fuzzy and possibilistic clustering


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Faculty of Sciences of Tunis, University of Tunis El Manar, Tunis, Tunisia
