Hiding sensitive itemsets with multiple objective optimization

  • Jerry Chun-Wei Lin
  • Yuyu Zhang
  • Binbin Zhang
  • Philippe Fournier-Viger
  • Youcef Djenouri
Methodologies and Application

Abstract

Privacy-preserving data mining (PPDM) has become an important research topic because it hides sensitive information while ensuring that useful information can still be extracted for decision making. During the sanitization process that hides the sensitive information, three side effects occur at the same time: hiding failure, missing cost, and artificial cost. Several evolutionary algorithms have been introduced to minimize these three side effects using a single-objective function that generates one sanitization solution. This paper presents a multiobjective algorithm (NSGA2DT), based on the NSGA-II framework, with two strategies for hiding sensitive information through transaction deletion. To better balance the side effects, NSGA2DT takes database dissimilarity (Dis) as an additional factor, so that four side effects are considered. Moreover, instead of the single solution produced by single-objective evolutionary algorithms, NSGA2DT provides multiple solutions, which gives the flexibility to select the most appropriate transactions for deletion according to the user's preference. A Fast SoRting strategy (FSR) and the pre-large concept are used, respectively, to find the optimized transactions for deletion and to speed up the iterative process. With NSGA2DT, a set of Pareto solutions can be easily discovered, which avoids the local-optimization problem of single-objective approaches. In addition, NSGA2DT does not require initial weights to be set for evaluating the side effects, so the results are not seriously influenced by predefined weights. Experimental results show that the proposed NSGA2DT provides satisfactory results with reduced side effects compared to previous single-objective evolutionary approaches.
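
The four side effects and the idea of keeping several Pareto solutions can be illustrated on a toy database. The sketch below is not the authors' NSGA2DT: it has no NSGA-II loop, no FSR, and no pre-large concept. It assumes the commonly used PPDM definitions (hiding failure = sensitive itemsets still frequent after sanitization, missing cost = non-sensitive frequent itemsets lost, artificial cost = newly frequent itemsets) and assumes Dis is simply the number of deleted transactions. All function names, the database, and the threshold are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(db, min_ratio):
    """Brute-force frequent itemsets; only suitable for a toy database."""
    if not db:
        return set()
    items = sorted({i for t in db for i in t})
    found = set()
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            s = frozenset(cand)
            support = sum(1 for t in db if s <= t)
            if support >= min_ratio * len(db):
                found.add(s)
    return found

def side_effects(db, deleted, sensitive, min_ratio):
    """Return (hiding failure, missing cost, artificial cost, Dis).

    The definitions are assumptions stated in the lead-in, not taken
    from the abstract. `deleted` is a set of transaction indices.
    """
    before = frequent_itemsets(db, min_ratio)
    kept = [t for idx, t in enumerate(db) if idx not in deleted]
    after = frequent_itemsets(kept, min_ratio)
    hiding_failure = len(sensitive & after)
    missing_cost = len((before - sensitive) - after)
    artificial_cost = len(after - before)
    dis = len(deleted)  # assumed measure of database dissimilarity
    return (hiding_failure, missing_cost, artificial_cost, dis)

def dominates(a, b):
    """a dominates b if it is no worse everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(scores):
    """Keep the candidates whose objective vector no other vector dominates."""
    return {name: v for name, v in scores.items()
            if not any(dominates(o, v) for o in scores.values())}

# Hypothetical toy database; {a, b} is the sensitive itemset.
db = [frozenset("abc"), frozenset("ab"), frozenset("ab"),
      frozenset("bc"), frozenset("c"), frozenset("ac")]
sensitive = {frozenset("ab")}
candidates = {"none": set(), "T1": {1}, "T4": {4}, "T1+T2": {1, 2}}

scores = {name: side_effects(db, d, sensitive, 0.5)
          for name, d in candidates.items()}
front = pareto_front(scores)
for name, vec in sorted(front.items()):
    print(name, vec)
```

In this toy run the front keeps two candidates: deleting nothing (no dissimilarity, but the sensitive itemset stays visible) and deleting one supporting transaction (the itemset is hidden at a cost of one deletion). No weights are needed to compare them; the choice between the two is left to the user, which is the flexibility the abstract refers to.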

Keywords

PPDM, Sanitization, Evolutionary computation, Pre-large concept, Pareto solutions

Notes

Acknowledgements

This research was partially supported by the Shenzhen Technical Project under JCYJ20170307151733005 and KQJSCX20170726103424709.

Compliance with ethical standards

Conflict of interest

The authors declare that there are no conflicts of interest in this paper.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Jerry Chun-Wei Lin (1, 2), email author
  • Yuyu Zhang (1)
  • Binbin Zhang (3, 4)
  • Philippe Fournier-Viger (5)
  • Youcef Djenouri (6)

  1. School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
  2. Department of Computing, Mathematics, and Physics, Western Norway University of Applied Sciences, Bergen, Norway
  3. Department of Biochemistry and Molecular Biology, Shenzhen University Health Science Center, Shenzhen, China
  4. Center for Anti-Aging and Regenerative Medicine, Shenzhen University Health Science Center, Shenzhen, China
  5. School of Natural Sciences and Humanities, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
  6. IMADA, Southern Denmark University, Odense, Denmark
