Hybrid microaggregation for privacy preserving data mining

Abidi, Balkis; Ben Yahia, Sadok; Perera, Charith

doi:10.1007/s12652-018-1122-7

Original Research
Published: 26 November 2018

Volume 11, pages 23–38, (2020)

Journal of Ambient Intelligence and Humanized Computing

494 Accesses
8 Citations

Abstract

k-Anonymity by microaggregation is one of the most commonly used anonymization techniques. Its success is owed to the worthwhile trade-off it achieves between information loss and identity disclosure risk. However, the method has drawbacks. On the disclosure limitation side, it lacks protection against attribute disclosure. On the data utility side, handling real datasets is challenging, because they are characterized by a large number of attributes and by noisy data, such as outliers or even records with missing values. Generating anonymized individual data that remain useful for data mining tasks, while decreasing the influence of noisy data, is therefore a difficult task. In this paper, we introduce a new microaggregation method, called HM-pfsom, based on fuzzy possibilistic clustering. The proposed method operates in a hybrid manner: the anonymization process is applied per block of similar data, which helps decrease the information loss incurred during anonymization. The HM-pfsom approach studies the distribution of confidential attributes within each sub-dataset. According to that distribution, the privacy parameter k is then determined so as to preserve the diversity of confidential attributes within the anonymized microdata. This decreases the disclosure risk of confidential information.
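
To make the basic idea of k-anonymity by microaggregation concrete, the sketch below shows a generic univariate, fixed-size variant: sort the values, cut them into groups of at least k records, and replace each value with its group mean. This is not HM-pfsom, which relies on fuzzy possibilistic clustering and chooses k per block. The function names `microaggregate` and `information_loss`, the SSE/SST loss measure, and the sample ages are assumptions made purely for illustration.

```python
from statistics import mean


def microaggregate(values, k):
    """Generic univariate fixed-size microaggregation (illustrative only).

    Every returned value is shared by at least k records.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k records")
    # Indices of the records, ordered by attribute value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    result = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder, so it holds k to 2k-1 records.
        end = (g + 1) * k if g < n_groups - 1 else len(values)
        members = order[start:end]
        centroid = mean(values[i] for i in members)
        for i in members:
            result[i] = centroid
    return result


def information_loss(original, masked):
    """Ratio SSE / SST: 0 means no loss, 1 means all variance was removed."""
    overall = mean(original)
    sse = sum((o - m) ** 2 for o, m in zip(original, masked))
    sst = sum((o - overall) ** 2 for o in original)
    return sse / sst if sst else 0.0


# Hypothetical sample data: one quasi-identifier (age) for eight records.
ages = [23, 25, 31, 34, 35, 47, 52, 58]
masked = microaggregate(ages, k=3)
loss = information_loss(ages, masked)
```

A larger k lowers identity disclosure risk but raises the loss ratio, which is the trade-off the abstract refers to. Note that this sketch does nothing about attribute disclosure, since it never looks at the confidential attributes inside each group.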

Author information

Authors and Affiliations

LIPAH, Faculty of Sciences of Tunis, University of El-Manar, Tunis, Tunisia
Balkis Abidi & Sadok Ben Yahia
School of Computing Science, Newcastle University, Newcastle, UK
Charith Perera

Corresponding author

Correspondence to Balkis Abidi.

About this article

Cite this article

Abidi, B., Ben Yahia, S. & Perera, C. Hybrid microaggregation for privacy preserving data mining. J Ambient Intell Human Comput 11, 23–38 (2020). https://doi.org/10.1007/s12652-018-1122-7

Received: 11 June 2018
Accepted: 01 November 2018
Published: 26 November 2018
Issue Date: January 2020
DOI: https://doi.org/10.1007/s12652-018-1122-7
