Abstract
High utility pattern mining has received considerable research attention because of its wide range of application scenarios, and mining high utility patterns efficiently over data streams has become an important problem in data mining. Existing approaches have two shortcomings. First, the traditional utility list structure requires many join operations, each of which is inefficient, which leads to poor time and space efficiency. Second, the sliding window model repeatedly generates the same result set. To address these problems, a new algorithm for high utility pattern mining over data streams, named HUPM_Stream, is proposed. A location-indexed list structure, Ext-list, is designed to reduce the time complexity of the utility list join operation. An improved remaining utility pruning strategy, IRS, is proposed to reduce the number of utility list join operations. A hash-table-based result set maintenance strategy, HRS, is designed to reduce the search space of the algorithm and to avoid regenerating the same result set as the window slides. Extensive experimental results show that the proposed algorithm performs better on dense datasets.
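For readers unfamiliar with the join operation the abstract refers to, the following is a minimal sketch of the traditional utility list join for two single items, in the style of the utility lists of Liu and Qu (2012). It is not the Ext-list structure, IRS, or HRS proposed in the paper. The names `Entry`, `join`, `utility`, and `upper_bound`, and the toy transactions, are assumptions made for illustration only.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    tid: int    # transaction identifier
    iutil: int  # utility of the itemset in this transaction
    rutil: int  # remaining utility: items after the itemset in the processing order


def join(ul_x: List[Entry], ul_y: List[Entry]) -> List[Entry]:
    """Join the utility lists of single items x and y (x precedes y in the order).

    Both lists are assumed to be sorted by tid, so a two-pointer merge finds
    the transactions that contain both items.
    """
    out: List[Entry] = []
    i = j = 0
    while i < len(ul_x) and j < len(ul_y):
        ex, ey = ul_x[i], ul_y[j]
        if ex.tid == ey.tid:
            # Utility of {x, y} adds up; what remains is what follows y.
            out.append(Entry(ex.tid, ex.iutil + ey.iutil, ey.rutil))
            i += 1
            j += 1
        elif ex.tid < ey.tid:
            i += 1
        else:
            j += 1
    return out


def utility(ul: List[Entry]) -> int:
    """Exact utility of the itemset over all listed transactions."""
    return sum(e.iutil for e in ul)


def upper_bound(ul: List[Entry]) -> int:
    """Remaining utility bound: no extension of the itemset can exceed it."""
    return sum(e.iutil + e.rutil for e in ul)


# Toy data (hypothetical), item order a < b < c:
#   T1: a=5, b=2, c=1    T2: a=3, c=4    T3: b=6, c=2
ul_a = [Entry(1, 5, 3), Entry(2, 3, 4)]
ul_b = [Entry(1, 2, 1), Entry(3, 6, 2)]
ul_ab = join(ul_a, ul_b)
```

In this sketch every candidate pattern costs one pass over two lists, and a pattern is extended only if `upper_bound` reaches the minimum utility threshold. This is the kind of per-join cost and join count that the abstract says Ext-list and IRS aim to reduce.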
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (62062004), the Natural Science Foundation of Ningxia Province (2022AAC03279), and the Graduate Innovation Project of North Minzu University (YCX22195). We would also like to thank Dr. Bijay Prasad Jaysawal for providing the executable file of the SOHUPDS algorithm.
About this article
Cite this article
Han, M., Li, M., Chen, Z. et al. High utility pattern mining algorithm over data streams using ext-list. Appl Intell 53, 27072–27095 (2023). https://doi.org/10.1007/s10489-023-04925-6
DOI: https://doi.org/10.1007/s10489-023-04925-6