Mining top-k high average-utility itemsets based on breadth-first search

Liu, Xuan; Chen, Genlang; Wu, Fangyu; Wen, Shiting; Zuo, Wanli

doi:10.1007/s10489-023-05076-4

Published: 27 October 2023

Applied Intelligence, Volume 53, pages 29319–29337 (2023)

Xuan Liu¹, Genlang Chen¹, Fangyu Wu² (ORCID: orcid.org/0000-0001-9618-8965), Shiting Wen¹ & Wanli Zuo³

Abstract

High average-utility itemset mining is a subfield of data mining that has extensive practical applications. However, it is difficult for users to determine a proper minimum threshold because they cannot accurately predict the number of patterns mined at a given threshold. To address this issue, top-k high average-utility itemset mining has been proposed, where k is the number of high average-utility itemsets to be mined. In this paper, we design an effective algorithm (named ETAUIM) for finding top-k high average-utility itemsets. ETAUIM employs a breadth-first search strategy to efficiently explore the search space, and it utilizes a tighter upper bound instead of the average-utility upper bound to limit the search space. Additionally, ETAUIM removes irrelevant items during the mining process and utilizes an early abandoning strategy to terminate unnecessary join operations in advance. To evaluate the proposed algorithm, extensive experiments were conducted on six sparse datasets and two dense datasets. Four state-of-the-art algorithms were used for comparison. The experimental results show that ETAUIM has excellent performance and scalability. Moreover, ETAUIM always performs better on sparse datasets.
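
To make the problem setting concrete, the following is a minimal Python sketch of top-k high average-utility itemset mining with a level-wise (breadth-first) search. It is not ETAUIM: it prunes with the plain average-utility upper bound (the sum of the maximum item utility of each transaction containing the itemset), which is the looser bound the abstract says ETAUIM replaces, and it omits irrelevant-item removal and early abandoning. The toy database `DB` and the names `covering` and `top_k_bfs` are hypothetical, and utilities are assumed to be positive.

```python
import heapq
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its utility
# in that transaction (positive utilities assumed).
DB = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 4, "c": 3},
    {"b": 6, "c": 2, "d": 8},
    {"a": 1, "d": 4},
]


def covering(itemset, db):
    """Return the transactions that contain every item of the itemset."""
    return [t for t in db if all(i in t for i in itemset)]


def top_k_bfs(db, k):
    """Level-wise search for the k itemsets with the highest average utility."""
    heap = []        # min-heap of (average utility, itemset); root is the k-th best
    threshold = 0.0  # raised to the k-th best average utility once k results exist
    level = [(i,) for i in sorted({i for t in db for i in t})]
    while level:
        survivors = []
        for x in level:
            tx = covering(x, db)
            if not tx:
                continue  # itemset never occurs in the database
            # Average-utility upper bound: anti-monotone, so an itemset whose
            # bound is below the threshold cannot have a qualifying superset.
            bound = sum(max(t.values()) for t in tx)
            if bound < threshold:
                continue
            survivors.append(x)
            # Average utility: total utility of the itemset divided by its length.
            au = sum(t[i] for t in tx for i in x) / len(x)
            if len(heap) < k:
                heapq.heappush(heap, (au, x))
            elif au > heap[0][0]:
                heapq.heapreplace(heap, (au, x))
            if len(heap) == k:
                threshold = heap[0][0]
        # Join surviving itemsets that share all but their last item.
        next_level = set()
        for p, q in combinations(survivors, 2):
            if p[:-1] == q[:-1]:
                next_level.add(p + (q[-1],))
        level = sorted(next_level)
    return sorted(heap, key=lambda e: (-e[0], e[1]))


result = top_k_bfs(DB, 3)
```

On this toy database the three best itemsets are the single items d, a, and b, with average utilities 12, 10, and 8. The user supplies only k; the threshold is raised internally as better itemsets are found, which is the motivation the abstract gives for the top-k formulation.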

Data availability

The data utilized in this study are available from the SPMF Open-Source Data Mining Library.

Acknowledgements

This work is supported by the Natural Science Foundation of Zhejiang Province (LQ21F030010), the Natural Science Foundation of Ningbo (202003N4306), the Public Welfare Foundation of Ningbo (2021S108), the Key Technology R&D Program of Ningbo (2022Z149), and the Ningbo Science and Technology Special Innovation Projects (2021Z079, 2022Z235).

Author information

Authors and Affiliations

School of Computer and Data Engineering, NingboTech University, Ningbo, 315100, Zhejiang, China
Xuan Liu, Genlang Chen & Shiting Wen
School of Advanced Technology, Xi'an Jiaotong-Liverpool University, Suzhou, 215028, Jiangsu, China
Fangyu Wu
School of Mechanical Engineering and Mechanics, Ningbo University, Ningbo, 315211, Zhejiang, China
Wanli Zuo

Contributions

Xuan Liu: Conceptualization, Methodology, Writing—original draft. Genlang Chen: Supervision, Project administration. Fangyu Wu: Data curation, Software. Shiting Wen: Validation, Writing—Review & Editing. Wanli Zuo: Writing—Review & Editing.

Corresponding author

Correspondence to Fangyu Wu.

Ethics declarations

Competing Interests

The authors declare that they have no competing interests related to this research.

Ethical and informed consent for data used

The data used in this study were obtained through publicly available sources, and no ethical or informed consent considerations were required.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, X., Chen, G., Wu, F. et al. Mining top-k high average-utility itemsets based on breadth-first search. Appl Intell 53, 29319–29337 (2023). https://doi.org/10.1007/s10489-023-05076-4

Accepted: 02 October 2023
Published: 27 October 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s10489-023-05076-4
