Hybrid microaggregation for privacy preserving data mining

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

k-Anonymity by microaggregation is one of the most commonly used anonymization techniques. Its success is owed to the worthwhile trade-off it achieves between information loss and identity disclosure risk. However, the method has drawbacks. On the disclosure limitation side, it lacks protection against attribute disclosure. On the data utility side, dealing with real datasets is challenging: they are characterized by a large number of attributes and by the presence of noisy data, such as outliers or even records with missing values. Generating anonymized individual data that remain useful for data mining tasks, while decreasing the influence of noisy data, is therefore a demanding task. In this paper, we introduce a new microaggregation method, called HM-pfsom, based on fuzzy possibilistic clustering. The proposed method operates in a hybrid manner: the anonymization process is applied per block of similar data, which helps to decrease the information loss incurred during anonymization. The HM-pfsom approach studies the distribution of confidential attributes within each sub-dataset. Then, according to that distribution, the privacy parameter k is determined so as to preserve the diversity of confidential attributes within the anonymized microdata. This decreases the disclosure risk of confidential information.
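To make the general idea of k-anonymity by microaggregation concrete, the following is a minimal sketch of plain univariate fixed-size microaggregation. It is not the HM-pfsom method, whose details are not given in the abstract. The function names `microaggregate` and `sse`, and the sample values, are hypothetical and chosen for illustration only. Records are sorted, partitioned into groups of at least k, and each value is replaced by its group mean; the sum of squared errors is one common way to quantify the resulting information loss.

```python
from statistics import mean


def microaggregate(values, k):
    """Illustrative univariate fixed-size microaggregation (not HM-pfsom).

    Sorts the records, forms consecutive groups of k records (the last
    group absorbs any remainder, so every group has at least k members),
    and replaces each value by the mean of its group.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k records")
    # Indices of the records in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    result = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        # The final group takes all remaining records.
        end = len(values) if g == n_groups - 1 else start + k
        group = order[start:end]
        centroid = mean(values[i] for i in group)
        for i in group:
            result[i] = centroid
    return result


def sse(original, anonymized):
    """Sum of squared errors between original and anonymized values."""
    return sum((o - a) ** 2 for o, a in zip(original, anonymized))


if __name__ == "__main__":
    data = [10, 12, 11, 50, 52, 51, 90]  # made-up sample values
    masked = microaggregate(data, k=3)
    print(masked)
    print(sse(data, masked))
```

In this sketch the outlier 90 is absorbed into the second group and pulls its mean upward, which illustrates why noisy data can inflate information loss under a fixed k.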

Author information

Corresponding author

Correspondence to Balkis Abidi.

About this article

Cite this article

Abidi, B., Ben Yahia, S. & Perera, C. Hybrid microaggregation for privacy preserving data mining. J Ambient Intell Human Comput 11, 23–38 (2020). https://doi.org/10.1007/s12652-018-1122-7

  • DOI: https://doi.org/10.1007/s12652-018-1122-7
