Hiding sensitive itemsets with multiple objective optimization

  • Methodologies and Application
  • Soft Computing

Abstract

Privacy-preserving data mining (PPDM) has become an important research topic because it can hide sensitive information while ensuring that useful information can still be extracted for decision making. During the sanitization process that hides the sensitive information, three side effects (hiding failure, missing cost, and artificial cost) occur at the same time. Several evolutionary algorithms have been introduced to minimize these three side effects of PPDM using a single-objective function, which generates only one sanitization solution. This paper presents a multiobjective algorithm (NSGA2DT), based on the NSGA-II framework, with two strategies for hiding sensitive information through transaction deletion. To obtain a better balance among side effects, NSGA2DT takes database dissimilarity (Dis) as one more factor to achieve better performance across four side effects. Moreover, instead of a single sanitization solution, NSGA2DT provides multiple solutions, which gives users the flexibility to select the most appropriate transactions for deletion according to their preferences. A Fast SoRting strategy (FSR) and the pre-large concept are used, respectively, to find the optimized transactions for deletion and to speed up the iterative process. With NSGA2DT, a set of Pareto solutions can be easily discovered, which avoids the local-optimum problem of single-objective approaches. In addition, NSGA2DT does not require initial weights to be set for evaluating the side effects, so the results are not seriously influenced by predefined weights. Experimental results show that the proposed NSGA2DT provides satisfactory results with reduced side effects compared with previous single-objective evolutionary approaches.
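To make the four objectives and the notion of a Pareto set concrete, the following is a minimal sketch on a toy database. It is not the NSGA2DT procedure: the evolutionary search, FSR, and the pre-large concept are replaced by brute-force enumeration, which is only feasible for tiny inputs. The side-effect definitions are the ones commonly used in the PPDM literature and are assumed here, since the abstract does not define them: hiding failure counts sensitive itemsets that remain frequent, missing cost counts non-sensitive frequent itemsets that are lost, artificial cost counts itemsets that become frequent only after sanitization, and Dis is taken as the number of deleted transactions. All function names, the toy data, and the threshold are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(db, min_ratio):
    """Brute-force frequent itemset enumeration (toy data only)."""
    if not db:
        return set()
    items = sorted(set().union(*db))
    min_count = min_ratio * len(db)  # threshold scales with database size
    found = set()
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            support = sum(1 for t in db if set(cand) <= t)
            if support >= min_count:
                found.add(frozenset(cand))
    return found


def side_effects(db, sensitive, deleted, min_ratio):
    """Assumed definitions: (hiding failure, missing cost, artificial cost, Dis)."""
    before = frequent_itemsets(db, min_ratio)
    sanitized = [t for i, t in enumerate(db) if i not in deleted]
    after = frequent_itemsets(sanitized, min_ratio)
    hiding_failure = len(sensitive & after)            # sensitive, still frequent
    missing_cost = len((before - sensitive) - after)   # non-sensitive, lost
    artificial_cost = len(after - before)              # newly frequent
    dis = len(deleted)                                 # deleted transactions
    return (hiding_failure, missing_cost, artificial_cost, dis)


def dominates(a, b):
    """a dominates b if it is no worse everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scored):
    """Keep candidates whose objective vector no other candidate dominates."""
    return {c: s for c, s in scored.items()
            if not any(dominates(o, s) for o in scored.values())}


# Hypothetical toy database: each transaction is a set of items.
db = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "d"},
    {"c", "d"},
]
sensitive = {frozenset({"a", "b"})}
min_ratio = 0.5

# Only transactions that contain a sensitive itemset are deletion candidates.
supporters = [i for i, t in enumerate(db) if any(s <= t for s in sensitive)]
scored = {
    frozenset(c): side_effects(db, sensitive, set(c), min_ratio)
    for k in range(len(supporters) + 1)
    for c in combinations(supporters, k)
}
front = pareto_front(scored)

for deleted, score in sorted(front.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(deleted), score)
```

Because the minimum support count shrinks as transactions are removed, deleting too many transactions can make a previously infrequent itemset frequent, which is how artificial cost arises under deletion in this sketch. The resulting front contains several non-dominated deletion sets rather than a single answer, and no weights are needed to compare the four objectives.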

Acknowledgements

This research was partially supported by the Shenzhen Technical Project under Grants JCYJ20170307151733005 and KQJSCX20170726103424709.

Author information

Corresponding author

Correspondence to Jerry Chun-Wei Lin.

Ethics declarations

Conflict of interest

The authors declare that there are no conflicts of interest in this paper.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lin, J.CW., Zhang, Y., Zhang, B. et al. Hiding sensitive itemsets with multiple objective optimization. Soft Comput 23, 12779–12797 (2019). https://doi.org/10.1007/s00500-019-03829-3

  • DOI: https://doi.org/10.1007/s00500-019-03829-3
