Hiding sensitive itemsets with multiple objective optimization
Abstract
Privacy-preserving data mining (PPDM) has become an important research topic because it can hide sensitive information while ensuring that useful information can still be extracted for decision making. During the sanitization process that hides the sensitive information, three side effects (hiding failure, missing cost, and artificial cost) arise at the same time. Several evolutionary algorithms have been introduced to minimize these three side effects using a single-objective function, which generates only one solution for sanitization. This paper presents a multiobjective algorithm (NSGA2DT), based on the NSGA-II framework, with two strategies for hiding sensitive information through transaction deletion. To obtain a better balance of side effects, the designed NSGA2DT takes database dissimilarity (Dis) as an additional factor, so that performance is evaluated in terms of four side effects. Moreover, instead of the single sanitization solution produced by single-objective evolutionary algorithms, the designed NSGA2DT provides a set of solutions, which gives the flexibility to select the most appropriate transactions for deletion according to the user's preference. A Fast SoRting strategy (FSR) and the pre-large concept are used, respectively, to find the optimized transactions for deletion and to speed up the iterative process. With the developed NSGA2DT, a set of Pareto solutions can be easily discovered, which avoids the local-optimum problem of single-objective approaches. In addition, the designed NSGA2DT does not require initial weights to be set for evaluating the side effects, so the results are not strongly influenced by predefined weights. Experimental results show that the proposed NSGA2DT provides satisfactory results with reduced side effects compared to previous evolutionary approaches with a single-objective function.
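To make the four side effects and the notion of Pareto solutions concrete, the following is a minimal sketch on a toy database. It is not NSGA2DT itself: there are no genetic operators, no FSR, and no pre-large concept, only brute-force scoring of a few hand-picked deletion candidates. The metric definitions are assumptions following common PPDM usage, since the abstract does not define them: hiding failure counts sensitive itemsets that remain frequent, missing cost counts non-sensitive frequent itemsets that are lost, artificial cost counts itemsets that become frequent only after sanitization, and Dis is taken as the number of deleted transactions. All names (`DB`, `SENSITIVE`, `side_effects`, `pareto_front`) are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(db, min_ratio):
    """Brute-force frequent itemsets; only suitable for a toy database."""
    if not db:
        return set()
    items = sorted(set().union(*db))
    result = set()
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            itemset = frozenset(cand)
            support = sum(1 for t in db if itemset <= t)
            if support >= min_ratio * len(db):
                result.add(itemset)
    return result

def side_effects(db, deleted, sensitive, min_ratio):
    """Return (hiding failure, missing cost, artificial cost, Dis).

    The four definitions are assumptions, as stated in the lead-in.
    """
    before = frequent_itemsets(db, min_ratio)
    kept = [t for i, t in enumerate(db) if i not in deleted]
    after = frequent_itemsets(kept, min_ratio)
    hiding_failure = len(sensitive & after)
    missing_cost = len((before - sensitive) - after)
    artificial_cost = len(after - before)
    dis = len(deleted)
    return (hiding_failure, missing_cost, artificial_cost, dis)

def dominates(a, b):
    """a dominates b if it is no worse everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(scored):
    """Keep the candidates whose objective vectors no other vector dominates."""
    return {
        cand: vec
        for cand, vec in scored.items()
        if not any(dominates(other, vec) for other in scored.values())
    }

# Toy data: six transactions, one sensitive itemset {a, b}.
DB = [
    frozenset("ab"), frozenset("ab"), frozenset("abc"),
    frozenset("ac"), frozenset("c"), frozenset("bc"),
]
SENSITIVE = {frozenset("ab")}
MIN_RATIO = 0.5

# Each candidate is a tuple of transaction indices to delete.
candidates = [(), (0,), (2,), (4,), (0, 1)]
scored = {c: side_effects(DB, c, SENSITIVE, MIN_RATIO) for c in candidates}
front = pareto_front(scored)

for cand, vec in sorted(front.items()):
    print(cand, vec)
```

In this sketch, deleting nothing leaves the sensitive itemset exposed but costs nothing else, while deleting a single supporting transaction hides it at the price of one unit of dissimilarity. Neither vector dominates the other, so both stay on the front, and the choice between them is left to the user rather than fixed by predefined weights.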
Keywords
PPDM; Sanitization; Evolutionary computation; Pre-large concept; Pareto solutions
Notes
Acknowledgements
This research was partially supported by the Shenzhen Technical Project under JCYJ20170307151733005 and KQJSCX20170726103424709.
Compliance with ethical standards
Conflict of interest
The authors declare that there are no conflicts of interest in this paper.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.