Hiding sensitive itemsets with multiple objective optimization

Abstract

Privacy-preserving data mining (PPDM) has become an important research topic because it can hide sensitive information while ensuring that useful information can still be extracted for decision making. During the sanitization process that hides the sensitive information, three side effects (hiding failure, missing cost, and artificial cost) occur at the same time. Several evolutionary algorithms have been introduced to minimize these three side effects of PPDM using a single-objective function, which generates one solution for sanitization. This paper presents a multiobjective algorithm (NSGA2DT), based on the NSGA-II framework, with two strategies for hiding sensitive information through transaction deletion. To obtain a better balance of side effects, the designed NSGA2DT takes database dissimilarity (Dis) as one more factor, so that performance is assessed in terms of four side effects. Moreover, instead of a single sanitization solution, NSGA2DT provides multiple solutions, unlike single-objective evolutionary algorithms, which gives the flexibility to select the most appropriate transactions for deletion according to the user's preference. A Fast SoRting strategy (FSR) and the pre-large concept are used, respectively, to find the optimized transactions for deletion and to speed up the iterative process. With the developed NSGA2DT, a set of Pareto solutions can be easily discovered, thus avoiding the local-optimum problem of single-objective approaches. In addition, NSGA2DT does not require initial weights to be set for evaluating the side effects, and thus the results are not seriously influenced by predefined weights. Experimental results show that the proposed NSGA2DT provides satisfactory results with reduced side effects compared to previous single-objective evolutionary approaches.
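To make the idea of a Pareto set over four side effects concrete, the following minimal Python sketch may help. It is not the NSGA2DT implementation. It assumes the definitions commonly used in the PPDM literature (hiding failure: sensitive itemsets that remain frequent; missing cost: non-sensitive frequent itemsets that are lost; artificial cost: itemsets that newly become frequent), and it approximates Dis by the number of deleted transactions. The function names `side_effects`, `dominates`, and `pareto_front` are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]
# (hiding failure, missing cost, artificial cost, Dis); all are minimized.
Objectives = Tuple[int, int, int, int]


def side_effects(
    sensitive: Set[Itemset],
    freq_before: Set[Itemset],
    freq_after: Set[Itemset],
    deleted_count: int,
) -> Objectives:
    """Score one candidate sanitization (assumed, commonly used definitions)."""
    # Sensitive itemsets that are still frequent after sanitization.
    hiding_failure = len(sensitive & freq_after)
    # Non-sensitive itemsets that were frequent but are no longer frequent.
    missing_cost = len((freq_before - sensitive) - freq_after)
    # Itemsets that were not frequent before but are frequent afterwards.
    artificial_cost = len(freq_after - freq_before)
    # Assumption: Dis is approximated by the number of deleted transactions.
    return (hiding_failure, missing_cost, artificial_cost, deleted_count)


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: List[Objectives]) -> List[Objectives]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


if __name__ == "__main__":
    ab, a, b, c = (frozenset(s) for s in ("ab", "a", "b", "c"))
    score = side_effects({ab}, {ab, a, b, c}, {a, c}, deleted_count=2)
    print(score)
    print(pareto_front([score, (1, 0, 0, 1), (1, 1, 0, 2), (0, 2, 0, 3)]))
```

No weights are needed to combine the four values: a candidate is discarded only when another candidate is at least as good on every side effect, so the user can choose among the remaining trade-offs.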

Acknowledgements

This research was partially supported by the Shenzhen Technical Project under JCYJ20170307151733005 and KQJSCX20170726103424709.

Author information

Corresponding author

Correspondence to Jerry Chun-Wei Lin.

Ethics declarations

Conflict of interest

The authors declare that there are no conflicts of interest in this paper.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by V. Loia.

About this article

Cite this article

Lin, J.C., Zhang, Y., Zhang, B. et al. Hiding sensitive itemsets with multiple objective optimization. Soft Comput 23, 12779–12797 (2019). https://doi.org/10.1007/s00500-019-03829-3

Keywords

  • PPDM
  • Sanitization
  • Evolutionary computation
  • Pre-large concept
  • Pareto solutions