Abstract
High average-utility itemset mining is a subfield of data mining with extensive practical applications. However, users find it difficult to choose a suitable minimum threshold because they cannot accurately predict how many patterns will be mined at a given threshold. To address this issue, top-k high average-utility itemset mining has been proposed, where k is the number of high average-utility itemsets to be mined. In this paper, we design an effective algorithm, named ETAUIM, for finding the top-k high average-utility itemsets. ETAUIM employs a breadth-first search strategy to explore the search space efficiently, and it uses a tighter upper bound in place of the average-utility upper bound to prune the search space. Additionally, ETAUIM removes irrelevant items during the mining process and uses an early abandoning strategy to terminate unnecessary join operations in advance. To evaluate the proposed algorithm, extensive experiments were conducted on six sparse datasets and two dense datasets, with four state-of-the-art algorithms used for comparison. The experimental results show that ETAUIM has excellent performance and scalability. Moreover, ETAUIM always performs better on sparse datasets.
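To make the problem setting concrete, the following is a minimal Python sketch of top-k high average-utility itemset mining with a level-wise (breadth-first) search. It is not ETAUIM: it prunes with the commonly used average-utility upper bound (the sum of transaction-maximum utilities), which is assumed here to be the bound the abstract refers to, and it omits the tighter upper bound, irrelevant-item removal, and early abandoning. The toy database `DB` and the names `average_utility`, `auub`, and `top_k_haui` are hypothetical and chosen only for illustration.

```python
import heapq

# Toy database (made-up values): each transaction maps item -> utility.
DB = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 4},
    {"b": 4, "c": 3, "d": 9},
    {"a": 5, "d": 2},
]


def average_utility(itemset, db):
    """Sum of the itemset's utility over supporting transactions, divided by its length."""
    total = 0
    for t in db:
        if all(i in t for i in itemset):
            total += sum(t[i] for i in itemset)
    return total / len(itemset)


def auub(itemset, db):
    """Upper bound: sum of the maximum item utility of each supporting transaction."""
    return sum(max(t.values()) for t in db if all(i in t for i in itemset))


def top_k_haui(db, k):
    """Level-wise search that keeps the k best itemsets in a min-heap."""
    items = sorted({i for t in db for i in t})
    heap = []  # min-heap of (average utility, itemset), at most k entries
    level = [(i,) for i in items]
    while level:
        survivors = []
        for x in level:
            bound = auub(x, db)
            if bound == 0:  # itemset occurs in no transaction
                continue
            au = average_utility(x, db)
            if len(heap) < k:
                heapq.heappush(heap, (au, x))
            elif au > heap[0][0]:
                heapq.heapreplace(heap, (au, x))
            survivors.append((x, bound))
        # The k-th best value found so far acts as the internal threshold.
        threshold = heap[0][0] if len(heap) == k else 0
        next_level = []
        for x, bound in survivors:
            if bound < threshold:  # no superset of x can enter the top-k
                continue
            # Extend only with items after the last one to avoid duplicates.
            next_level.extend(x + (i,) for i in items if i > x[-1])
        level = next_level
    return sorted(heap, reverse=True)
```

On this toy database, `top_k_haui(DB, 3)` returns the itemsets {a}, {d}, and {a, c} with average utilities 20, 11, and 10. The internal threshold rises as better itemsets are found, which is what removes the need for a user-specified minimum threshold.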
Data availability
The data utilized in this study are available from the SPMF Open-Source Data Mining Library.
Acknowledgements
This work was supported by the Natural Science Foundation of Zhejiang Province (LQ21F030010), the Natural Science Foundation of Ningbo (202003N4306), the Public Welfare Foundation of Ningbo (2021S108), the Key Technology R&D Program of Ningbo (2022Z149), and the Ningbo Science and Technology Special Innovation Projects (2021Z079, 2022Z235).
Author information
Contributions
Xuan Liu: Conceptualization, Methodology, Writing—original draft. Genlang Chen: Supervision, Project administration. Fangyu Wu: Data curation, Software. Shiting Wen: Validation, Writing—Review & Editing. Wanli Zuo: Writing—Review & Editing.
Ethics declarations
Competing Interests
The authors declare that they have no competing interests related to this research.
Ethical and informed consent for data used
The data used in this study were obtained through publicly available sources, and no ethical or informed consent considerations were required.
About this article
Cite this article
Liu, X., Chen, G., Wu, F. et al. Mining top-k high average-utility itemsets based on breadth-first search. Appl Intell 53, 29319–29337 (2023). https://doi.org/10.1007/s10489-023-05076-4
DOI: https://doi.org/10.1007/s10489-023-05076-4