Skip to main content
Log in

Hiding sensitive frequent itemsets by item removal via two-level multi-objective optimization

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

Privacy Preserving Data Mining (PPDM) is an important research area in data mining, which aims at protecting the privacy during the data mining process so that personal data and sensitive information is not revealed to unauthorized persons. PPDM is a critical task as data often contain sensitive information about individuals that can easily compromise their privacy such as their financial status, political beliefs or medical history. In particular, many algorithms were proposed to hide sensitive frequent itemsets (values that frequently co-occur) in a transaction database. However, a major problem is that those algorithms are based on removing entire transactions (records) instead of single items (values). Hence, many changes are made to databases, which lead to losing useful information. To avoid making too many changes to a database while still preserving privacy, this paper proposes a novel PPDM algorithm called NSGAII4ID for hiding sensitive frequent itemsets by only removing some items from transactions rather than removing whole transactions. To our best knowledge, this is the first paper exploring this approach. Sanitization is treated as a multi-objective optimization problem where the goal is to reduce four side effects, namely hiding failure, missing cost, artificial cost, and database dissimilarity. However, we observe that only three side effects matter in our case (the artificial cost is always zero). Besides, another key difference is that NSGAII4ID sanitizes a database by performing optimization at two levels. At the transaction level, a multi-objective optimization algorithm is applied to find a subset of candidate transactions to be modified. Then, at the item level, NSGAII4ID searches for an optimal subset of items to be removed from each candidate transaction. We show that this problem is exactly the Set cover Problem (SCP) which we solve by using a fast greedy polynomial algorithm. We have conducted extensive experiments on four datasets to compare NSGAII4DT with state-of-the-art PPDM algorithms in terms of runtime, memory cost and total number of removed items during sanitization. The obtained results show that NSGAII4DT achieves a very good balance in minimizing side effects compared with the state-of-the art algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

Notes

  1. For transactions having TID = 2 and TID = 5, the GSCA could have returned another minimal set of victim items which is {i3}.

  2. the source code of our algorithm is available on the Github at the following address: https://github.com/mira34/NSGAII4ID-algorithm

  3. The generator may be donwloaded at : https://sourceforge.net/projects/ibmquestdatagen/

References

  1. AbdelAziz A, Alarabi L, Basalamah S, Hendawi A (2021) A multi-objective optimization method for hospital admission problem—a case study on covid-19 patients. Algorithms 14(2):38

    Article  MathSciNet  Google Scholar 

  2. Abdollahzadeh B, Gharehchopogh F (2021) A multi-objective optimization algorithm for feature selection problems. Engineering with Computers

  3. Aggarwal C, Pei J, Zhang B (2006) On privacy preservation against adversarial data mining. In: Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 510–516

  4. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proc. 20th int. conf. very large data bases (VLDB), pp 487–499

  5. Ahmed U, Lin J, Srivastava G, Djenouri Y (2021a) A deep q-learning sanitization approach for privacy preserving data mining. In: International Conference on Distributed Computing and Networking, pp 43–48

  6. Ahmed U, Srivastava G, Lin J (2021b) A machine learning model for data sanitization. Comput Netw 189:107914

    Article  Google Scholar 

  7. Aminifar A, Rabbi F, Lamo KPY (2021) Privacy preserving distributed extremely randomized trees. In: Proc. 36th Annual ACM Symposium on Applied Computing, pp 1102–1105

  8. Bilal N, Galinier p, Guibault F (2013) A new formulation of the set covering problem for metaheuristic approaches. International Scholarly Research Notices

  9. Coleto-Alcudia V, Vega-Rodríguez M (2022) A multi-objective optimization approach for the identification of cancer biomarkers from rna-seq data. Expert Syst Appl 193:116480

    Article  Google Scholar 

  10. Cormen T, Leinserson C, Rivest R, Stein C (2009) Introduction to Algorithms, third edition. The MIT Press

  11. Deb K, Pratap A, Agrawal S, Meyarivan T (2002) A fast and elitist multi-objective genetic algorithm: Nsga-ii. IEEE Trans Evol Comput 6(2):182–197

    Article  Google Scholar 

  12. Ewees A, Elaziz M A, Oliva D (2021) A new multi-objective optimization algorithm combined with opposition-based learning. Expert Syst Appl 165:113844

    Article  Google Scholar 

  13. Fournier-Viger P, Lin J, Gomariz A, Gueniche T, Soltani A, Deng Z (2016) The spmf open-source data mining library version 2. In: Proc. Joint European conference on machine learning and knowledge discovery in databases, pp 36–40

  14. Fournier-Viger P, Lin J, Vo B, Chi T, Zhang J, Le H (2017) A survey of itemset mining. Wiley Interdiscip Rev: Data Min Knowl Discov 7(4):e1207

  15. Goyal S, Bedi P, Rajawat A, Shaw RN, Ghosh A (2022) Multi-objective fuzzy-swarm optimizer for data partitioning. In: Proceedings of Advanced Computing and Intelligent Technologies, pp 307–318

  16. Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87

    Article  MathSciNet  Google Scholar 

  17. Hong T, Lin C, Yang K, Wang S (2013) Using tf-idf to hide sensitive itemsets. Appl Intell 38(4):502–510

    Article  Google Scholar 

  18. Kalia P, Bansal D, Sofat S (2021) A hybrid approach for preserving privacy for real estate data. Int J Inf Comput Secur 15(4):400–410

    Google Scholar 

  19. Kousika N, Premalatha K (2021) An improved privacy-preserving data mining technique using singular value decomposition with three-dimensional rotation data perturbation. The Journal of Supercomputing

  20. Lee J, Jun S (2021) Privacy-preserving data mining for open government data from heterogeneous sources. Gov Inf Q 38(1):101544

    Article  MathSciNet  Google Scholar 

  21. Lin C, Hong T, Chang C, Wang S (2013) A greedy-based approach for hiding sensitive itemsets by transaction insertion. J Inf Hiding Multim Signal Process 4(4):201–214

    Google Scholar 

  22. Lin C, Zhang B, Yang K, TP TH (2014) Efficiently hiding sensitive itemsets with transaction deletion based on genetic algorithms. The Scientific World Journal, Hindawi

  23. Lin C, Hong T, Yang K, Wang S (2015) The ga-based algorithms for optimizing hiding sensitive itemsets through transaction deletion. Appl Intell 42(2):210–230

    Article  Google Scholar 

  24. Lin C, Liu Q, Fournier-Viger P (2016) A sanitization approach for hiding sensitive itemsets based on particle swarm optimization. Eng Appl Artif Intell 53:1–18

    Article  Google Scholar 

  25. Lin J, Wu J, Fournier-Viger P, Djenouri Y, Chen C, Zhang Y (2019a) A sanitization approach to secure shared data in an iot environment. IEEE Access 7:25359–25368

    Article  Google Scholar 

  26. Lin J, Zhang B, Fournier-Viger P, Djenouri Y (2019b) Hiding sensitive itemsets with multiple objective optimization. Soft Computing 23(23):12779–12797

    Article  Google Scholar 

  27. Lin J, Srivastava G, Zhang Y, Djenouri Y, Aloqaily M (2020) Privacy-preserving multiobjective sanitization model in 6g iot environments. IEEE Internet Things J 8(7):5340–5349

    Article  Google Scholar 

  28. Merani M, Croce D, Tinnirello I (2021) Rings for privacy: an architecture for large scale privacy-preserving data mining. IEEE Trans Parallel Distrib Syst 32(6):1340–1352

    Article  Google Scholar 

  29. Nasiri N, Keyvanpour M (2020) Classification and evaluation of privacy preserving data mining methods. In: Proc. 11th IEEE International Conference on Information and Knowledge Technology (IKT), pp 17–22

  30. Oliveira S, Zaiane O (2003) Protecting sensitive knowledge by data sanitization. In: Proc. Third IEEE International conference on data mining, pp 613–616

  31. Pika A, Wynn M, Budiono S, Hofstede AT, der Aalst WV, Reijers H (2020) Privacy-preserving process mining in healthcare. Int J Environ Res 17(5):1612

    Google Scholar 

  32. Shastri M, Pandit A (2021) Remodeling: improved privacy preserving data mining (ppdm). Int J Inf Technol 13(1):131–137

    Google Scholar 

  33. Virupaksha S, Dondeti V (2021) Anonymized noise addition in subspaces for privacy preserved data mining in high dimensional continuous data. Peer Peer Netw Appl 14(3):1608–1628

    Article  Google Scholar 

  34. Wang J, Zhang J, Bao W, Zhu X, Cao B, Yu P (2018a) Not just privacy: Improving performance of private deep learning in mobile cloud. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining, pp 2407–2416

  35. Wang J, Wu L, Zeadally S, Khan M, He D (2021) Privacy-preserving data aggregation against malicious data mining attack for iot-enabled smart grid. ACM Transactions on Sensor Networks (TOSN) 17 (3):1–25

    Google Scholar 

  36. Wang R, Lai S, Wu G, Xing L, Wang L, Ishibuchi H (2018b) Multi-clustering via evolutionary multi-objective optimization. Inf Sci 450:128–140

    Article  MathSciNet  MATH  Google Scholar 

  37. Wu J, Zhan J, Lin J (2017) Ant colony system sanitization approach to hiding sensitive itemsets. IEEE Access 5:10024–10039

    Article  Google Scholar 

  38. Wu J, Lin J, FV P, Djenouri Y, Chen C, Li Z (2019a) The density-based clustering method for privacy-preserving data mining. Math Biosci Eng 16(3):1718–1728

    Article  MathSciNet  MATH  Google Scholar 

  39. Wu J, Srivastava G, Jolfaei A, Fournier-Viger P, JCW JL (2021a) Hiding sensitive information in ehealth datasets. Futur Gener Comput Syst 117:169–180

    Article  Google Scholar 

  40. Wu J, Srivastava G, Tayeb S, Lin J (2021b) A pso-based sanitization process with multi-thresholds model. In: Proceedings of international conference on pattern recognition, pp 439–446

  41. Wu J, Srivastava G, Yun U, Tayeb S, Lin J (2021c) An evolutionary computation-based privacy-preserving data mining model under a multi threshold constraint. Trans Trans Emerg Telecommun Technol 32 (3):1–17

    Google Scholar 

  42. Wu T, Lin J, Zhang Y, Chen C (2019b) A grid-based swarm intelligence algorithm for privacy-preserving data mining. Appl Sci 9(4):774

    Article  Google Scholar 

  43. Zhang X, Jiang S, Liu Y, Jiang T, Zhou Y (2021) Privacy-preserving scheme with account-mapping and noise-adding for energy trading based on consortium blockchain. IEEE Transactions on Network and Service Management. https://doi.org/10.1109/TNSM.2021.3110980

  44. Zorarpacıimath E, Özel S (2021a) Privacy preserving classification over differentially private data. Wiley Interdiscip Rev: Data Min Knowl Discov 11(3):e1399

    Google Scholar 

  45. Zorarpacıimath E, Özel S (2021b) Privacy preserving rule-based classifier using modified artificial bee colony algorithm. Expert Syst Appl 183:115437

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Philippe Fournier-Viger.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Lefkir, M., Nouioua, F. & Fournier-Viger, P. Hiding sensitive frequent itemsets by item removal via two-level multi-objective optimization. Appl Intell 53, 10027–10052 (2023). https://doi.org/10.1007/s10489-022-03808-6

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-022-03808-6

Keywords

Navigation