Hybrid microaggregation for privacy preserving data mining

  • Balkis Abidi
  • Sadok Ben Yahia
  • Charith Perera
Original Research


k-Anonymity by microaggregation is one of the most commonly used anonymization techniques. Its success is owed to the worthwhile trade-off it achieves between information loss and identity disclosure risk. However, the method has some drawbacks. On the disclosure limitation side, it lacks protection against attribute disclosure. On the data utility side, dealing with real datasets is challenging. Indeed, such datasets are characterized by a large number of attributes and the presence of noisy data, such as outliers or even records with missing values. Generating anonymized individual data that remain useful for data mining tasks, while decreasing the influence of noisy data, is therefore a demanding task. In this paper, we introduce a new microaggregation method, called HM-pfsom, based on fuzzy possibilistic clustering. The proposed method operates in a hybrid manner: the anonymization process is applied per block of similar data, which helps to decrease the information loss incurred during anonymization. The HM-pfsom approach studies the distribution of confidential attributes within each sub-dataset. Then, according to that distribution, the privacy parameter k is determined in such a way as to preserve the diversity of confidential attributes within the anonymized microdata. This decreases the disclosure risk of confidential information.
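
To make the basic idea of k-anonymity by microaggregation concrete, the following is a minimal sketch of a generic fixed-size microaggregation step. It is not the HM-pfsom method: it uses no fuzzy possibilistic clustering, no per-block processing, and no adaptive choice of k. The function name `microaggregate`, the lexicographic ordering of records, and the rule that the last group absorbs leftover records are all assumptions made for illustration only.

```python
from statistics import mean


def microaggregate(records, k):
    """Replace each numeric record by the centroid of a group of >= k records.

    Illustrative assumptions (not taken from the paper):
    - records are equal-length tuples of numbers;
    - similarity is approximated by a plain lexicographic sort;
    - the last group absorbs the remainder, so it holds k to 2k-1 records.
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")

    n_dims = len(records[0])
    # Indices of the records in sorted order, so output keeps the input order.
    order = sorted(range(len(records)), key=lambda i: records[i])
    n_groups = len(records) // k
    result = [None] * len(records)

    for g in range(n_groups):
        start = g * k
        # The final group takes every remaining record.
        end = (g + 1) * k if g < n_groups - 1 else len(records)
        members = order[start:end]
        # Centroid: attribute-wise mean over the group members.
        centroid = tuple(
            mean(records[i][d] for i in members) for d in range(n_dims)
        )
        for i in members:
            result[i] = centroid

    return result
```

Calling `microaggregate(data, 3)` on a list of numeric tuples returns a list of the same length in which every distinct output record is shared by at least three input records, which is the k-anonymity property on the aggregated attributes. Information loss grows with the spread inside each group, which is why grouping similar records matters.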


Hybrid microaggregation · Information loss · Identity disclosure risk · Attribute disclosure risk · Fuzzy and possibilistic clustering




Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. LIPAH, Faculty of Sciences of Tunis, University of El-Manar, Tunis, Tunisia
  2. School of Computing Science, Newcastle University, Newcastle, UK
