Abstract
Privacy-preserving utility mining (PPUM) aims to prevent the leakage of sensitive information in utility pattern mining. In recent years, researchers have proposed a number of algorithms for this problem. However, these algorithms suffer from severe side effects, long sanitization times, and high computational complexity. Although the FPUTT algorithm reduces the number of database scans, its tree construction and traversal are still time-consuming. This paper proposes a fast utility-list dictionary algorithm (FULD). The utility-list dictionary holds all sensitive items, so sensitive items can be located and modified through dictionary lookup. In addition, the novel concepts of SINS and tns are proposed to reduce the side effects of the algorithm. Experiments show that FULD performs well in terms of both running time and side effects: its running time is 15–20 times shorter than that of the FPUTT algorithm, and it performs well on both sparse and dense datasets.
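The dictionary-lookup idea described in the abstract can be illustrated with a minimal Python sketch. This is not the FULD algorithm itself: the function names, the `(tid, quantity, utility)` entry layout, and the victim-selection rule are all assumptions made for illustration, and the SINS and tns concepts are not modeled because the abstract does not define them. The sketch only shows how a dictionary keyed by sensitive item, built in one database scan, lets a sanitizer jump directly to the transactions it needs to modify.

```python
from collections import defaultdict


def build_utility_list_dict(db, profit, sensitive_items):
    """One scan: map each sensitive item to [(tid, quantity, utility), ...]."""
    uld = defaultdict(list)
    for tid, tx in db.items():
        for item, qty in tx.items():
            if item in sensitive_items:
                uld[item].append((tid, qty, qty * profit[item]))
    return dict(uld)


def itemset_utility(db, profit, itemset):
    """Total utility of `itemset` over the transactions that contain all of it."""
    return sum(
        sum(tx[i] * profit[i] for i in itemset)
        for tx in db.values()
        if all(i in tx for i in itemset)
    )


def hide_itemset(db, profit, itemset, min_util):
    """Lower quantities of one item until the itemset's utility < min_util."""
    uld = build_utility_list_dict(db, profit, set(itemset))
    # Hypothetical victim choice (an assumption, not the paper's rule):
    # the item in the itemset that contributes the most utility overall.
    victim = max(itemset, key=lambda i: sum(u for _, _, u in uld.get(i, [])))
    for tid, _, _ in uld.get(victim, []):
        tx = db[tid]
        if not all(i in tx for i in itemset):
            continue  # this transaction does not support the itemset
        while victim in tx and itemset_utility(db, profit, itemset) >= min_util:
            tx[victim] -= 1
            if tx[victim] == 0:
                del tx[victim]  # item removed from the transaction entirely
        if itemset_utility(db, profit, itemset) < min_util:
            break
    return db


# Toy database: transaction id -> {item: purchased quantity}
db = {1: {"a": 2, "b": 1}, 2: {"a": 1, "b": 3, "c": 1}, 3: {"b": 2, "c": 4}}
profit = {"a": 5, "b": 2, "c": 1}

hide_itemset(db, profit, ("a", "b"), min_util=15)
print(itemset_utility(db, profit, ("a", "b")))  # 11
```

In this toy run the itemset `("a", "b")` starts with utility 23, and the sketch removes item `a` from transaction 1 to bring it down to 11, below the threshold of 15. Recomputing the itemset utility after every decrement keeps the sketch short but is deliberately naive; a practical implementation would update the stored utilities incrementally.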
Acknowledgments
The authors are sincerely grateful to the editor and anonymous reviewers for their insightful and constructive comments, which helped us improve this work greatly.
Funding
This research was funded by the National Natural Science Foundation of China (61772282). It was also supported by the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX21 1008).
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Chunyong Yin and Ying Li. The first draft of the manuscript was written by Ying Li and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interest or personal relationship that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Yin, C., Li, Y. Fast privacy-preserving utility mining algorithm based on utility-list dictionary. Appl Intell 53, 29363–29377 (2023). https://doi.org/10.1007/s10489-023-04791-2
DOI: https://doi.org/10.1007/s10489-023-04791-2