Abstract
k-Anonymity by microaggregation is one of the most commonly used anonymization techniques. Its success is owed to the worthwhile trade-off it achieves between information loss and identity disclosure risk. However, the method has some drawbacks. On the disclosure limitation side, it lacks protection against attribute disclosure. On the data utility side, dealing with real datasets is challenging, because they are characterized by a large number of attributes and by noisy data, such as outliers or even records with missing values. Generating anonymized individual data that remain useful for data mining tasks, while decreasing the influence of noisy data, is therefore difficult. In this paper, we introduce a new microaggregation method, called HM-pfsom, based on fuzzy possibilistic clustering. The proposed method operates in a hybrid manner: the anonymization process is applied per block of similar data, which helps to decrease the information loss incurred during anonymization. The HM-pfsom approach studies the distribution of confidential attributes within each sub-dataset. Then, according to this distribution, the privacy parameter k is determined in such a way as to preserve the diversity of confidential attributes within the anonymized microdata. This decreases the disclosure risk of confidential information.
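To make the idea of block-wise microaggregation concrete, the following is a minimal sketch in Python. It is not HM-pfsom: the blocks are supplied by hand rather than found by fuzzy possibilistic clustering, the per-block value of k is passed in by the caller rather than derived from the distribution of confidential attributes, and records are grouped by a simple ordering on the sum of their attributes. The function names (`microaggregate`, `microaggregate_by_block`, `is_k_anonymous`) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def microaggregate(records, k):
    """Replace each record by the centroid of a group of at least k records.

    Assumption: records are equal-length tuples of numbers. Grouping uses a
    simple one-dimensional ordering, not the clustering used by HM-pfsom.
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    dim = len(records[0])
    # Order record indices by a simple score (sum of attribute values).
    order = sorted(range(len(records)), key=lambda i: sum(records[i]))
    # Cut the ordering into consecutive groups of size k; the last group
    # absorbs the remainder, so it holds between k and 2k - 1 records.
    n_groups = len(records) // k
    out = [None] * len(records)
    for g in range(n_groups):
        start = g * k
        end = (g + 1) * k if g < n_groups - 1 else len(records)
        group = order[start:end]
        centroid = tuple(
            sum(records[i][d] for i in group) / len(group) for d in range(dim)
        )
        for i in group:
            out[i] = centroid
    return out


def microaggregate_by_block(blocks, k_per_block):
    """Apply microaggregation separately to each block, each with its own k."""
    return [microaggregate(b, k) for b, k in zip(blocks, k_per_block)]


def is_k_anonymous(records, k):
    """Check that every distinct record value occurs at least k times."""
    return all(count >= k for count in Counter(records).values())


# Two hand-made blocks of similar records, anonymized with different k.
blocks = [
    [(1.0, 2.0), (2.0, 3.0), (10.0, 11.0), (11.0, 12.0), (12.0, 13.0)],
    [(100.0, 1.0), (101.0, 1.0), (102.0, 1.0)],
]
anonymized = microaggregate_by_block(blocks, [2, 3])
```

Because each block is aggregated on its own, centroids are computed only over records that are already similar, which is the intuition behind the reduced information loss claimed for the hybrid approach.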
Cite this article
Abidi, B., Ben Yahia, S. & Perera, C. Hybrid microaggregation for privacy preserving data mining. J Ambient Intell Human Comput 11, 23–38 (2020). https://doi.org/10.1007/s12652-018-1122-7
DOI: https://doi.org/10.1007/s12652-018-1122-7