Abstract
Closed frequent high average utility itemsets (CFHAUIs) and generators of frequent high average utility itemsets (GFHAUIs) are important because they concisely represent the set of frequent high average utility itemsets (FHAUIs). These representations can be much smaller than the full set of FHAUIs while still summarizing it. In addition, they allow the generation of non-redundant high average utility association rules, which are valuable to decision-makers. However, discovering these representations is difficult, primarily because the average utility function is neither monotonic nor anti-monotonic within each equivalence class, that is, among itemsets that occur in the same set of transactions. To tackle this challenge, this paper proposes a method for efficiently extracting CFHAUIs and GFHAUIs. The approach introduces novel bounds on the average utility, including a weak lower bound called \(wlbau\) and a lower bound named \(auvlb\). Pruning strategies based on the \(wlbau\) and \(auvlb\) bounds are designed to eliminate non-closed and/or non-generator FHAUIs early, which reduces runtime and memory consumption. The paper also introduces a novel algorithm, CG-FHAUI, that discovers GFHAUIs and CFHAUIs concurrently. Empirical results show that the proposed algorithm outperforms a baseline algorithm in terms of runtime, memory usage, and scalability.
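To make the difficulty described above concrete, the following is a minimal brute-force sketch, not the CG-FHAUI algorithm and not an implementation of the \(wlbau\) or \(auvlb\) bounds, which the abstract does not define. It uses the standard definition of average utility (total utility of an itemset over the transactions containing it, divided by the itemset's length). The toy database, the thresholds, and the function names are hypothetical. It also assumes one plausible reading of the two representations, namely that closed itemsets and generators are the maximal and minimal FHAUIs within each equivalence class; the paper's formal definitions may differ.

```python
from itertools import combinations

# Hypothetical toy database: one dict per transaction, mapping item -> utility.
DB = [
    {"a": 1, "b": 9, "c": 2},
    {"a": 1, "b": 9, "c": 2},
    {"b": 9},
    {"c": 2},
]

def tidset(itemset):
    """Indices of the transactions that contain every item of `itemset`."""
    return frozenset(i for i, t in enumerate(DB) if all(x in t for x in itemset))

def average_utility(itemset):
    """Total utility of `itemset` over its transactions, divided by its length."""
    total = sum(DB[i][x] for i in tidset(itemset) for x in itemset)
    return total / len(itemset)

def succinct_sets(min_sup, min_au):
    """Brute force: group frequent itemsets by tidset (equivalence class), then
    keep the maximal and minimal FHAUIs of each class. Treating these as the
    CFHAUIs and GFHAUIs is an assumption made for illustration only."""
    items = sorted({x for t in DB for x in t})
    classes = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            tids = tidset(combo)
            if len(tids) >= min_sup:
                classes.setdefault(tids, []).append(frozenset(combo))
    closed, generators = set(), set()
    for members in classes.values():
        fhauis = [s for s in members if average_utility(s) >= min_au]
        for s in fhauis:
            if not any(s < other for other in fhauis):  # no FHAUI proper superset
                closed.add(s)
            if not any(other < s for other in fhauis):  # no FHAUI proper subset
                generators.add(s)
    return closed, generators

# Within the class of transactions {0, 1}: au(a) = 2, au(ab) = 10, au(abc) = 8,
# so the average utility rises and then falls along the chain a, ab, abc.
closed, generators = succinct_sets(min_sup=2, min_au=5)
```

In this toy class, the itemset {a} is a minimal member of its equivalence class but falls below the average utility threshold, while its superset {a, b} exceeds it. This is why frequency-based reasoning alone cannot decide which itemsets to prune. The enumeration here is exponential in the number of items and is meant only as an illustration.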
Data availability
The manuscript has associated data in a data repository. [Author’s comment: The datasets used to evaluate the algorithms and the synthetic dataset generator used in this study are public and were obtained from [43]. The generated synthetic datasets will be shared on request.]
References
Luna JM, Fournier-Viger P, Ventura S (2019) Frequent itemset mining: a 25 years review. Wiley Interdiscip Rev Data Min Knowl Discov 9:1–31
Liu M, Qu J (2012) Mining high utility itemsets without candidate generation. In: Proceedings of ACM international conference on information and knowledge management. pp 55–64
Shie BE, Yu PS, Tseng VS (2013) Mining interesting user behavior patterns in mobile commerce environments. Appl Intell 38:418–435
Tseng VS, Shie BE, Wu CW, Yu PS (2013) Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans Knowl Data Eng 25:1772–1786
Fournier-Viger P, Wu CW, Zida S, Tseng VS (2014) FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). pp 83–92
Zida S, Fournier-Viger P, Lin JC-W et al (2015) EFIM: a highly efficient algorithm for high-utility itemset mining. In: Proceedings of Mexican international conference on artificial intelligence (MICAI 2015). pp 530–546
Krishnamoorthy S (2017) HMiner: efficiently mining high utility itemsets. Expert Syst Appl 90:168–183
Nguyen LTT, Nguyen P, Nguyen TDD et al (2019) Mining high-utility itemsets in dynamic profit databases. Knowl Based Syst 175:130–144
Wu P, Niu X, Fournier-Viger P et al (2022) UBP-miner: an efficient bit based high utility itemset mining algorithm. Knowl Based Syst 248:108865
Qu J-F, Fournier-Viger P, Liu M et al (2023) Mining high utility itemsets using prefix trees and utility vectors. IEEE Trans Knowl Data Eng 35:10224–10236
Duong H, Hoang T, Tran T et al (2022) Efficient algorithms for mining closed and maximal high utility itemsets. Knowl Based Syst 257:109921
Wu C-W, Fournier-Viger P, Gu J, Tseng VS (2019) Mining compact high utility itemsets without candidate generation. In: Fournier-Viger P, Lin JW, Nkambou R, Vo B, Tseng V (eds) High-utility pattern mining studies in big data. Springer, Berlin, pp 283–307
Nguyen LTT, Vu VV, Lam MTH et al (2019) An efficient method for mining high utility closed itemsets. Inf Sci (N Y) 495:78–99
Tseng VS, Wu C, Fournier-Viger P, Yu PS (2015) Efficient algorithms for mining the concise and lossless representation of high utility itemsets. IEEE Trans Knowl Data Eng 27:726–739
Fournier-Viger P, Wu C-W, Tseng VS (2014) Novel concise representations of high utility itemsets using generator patterns. In: Proceedings of international conference on advanced data mining and applications. pp 30–43
Sahoo J, Kumar A, Goswami A (2015) An efficient approach for mining association rules from high utility itemsets. Expert Syst Appl 42:5754–5778
Mai T, Nguyen LTT, Vo B et al (2020) Efficient algorithm for mining non-redundant high-utility association rules. Sensors (Switzerland) 20:1–17
Lin JC-W, Hong T-P, Lu WH (2010) Efficiently mining high average utility itemsets with a tree structure. In: Lecture notes in computer science. pp 131–139
Hong T-P, Lee CH, Wang SL (2009) Mining high average-utility itemsets. In: Proceedings of IEEE international conference on systems, man and cybernetics. pp 2526–2530
Ryang H, Yun U (2016) High utility pattern mining over data streams with sliding window technique. Expert Syst Appl 57:214–231
Yun U, Kim D, Ryang H et al (2016) Mining recent high average utility patterns based on sliding window from stream data. J Intell Fuzzy Syst 30:3605–3617
Truong T, Duong H, Le B, Fournier-Viger P (2018) Efficient vertical mining of high average-utility itemsets based on novel upper-bounds. IEEE Trans Knowl Data Eng 31:301–314
Truong T, Duong H, Le B et al (2019) Efficient high average-utility itemset mining using novel vertical weak upper-bounds. Knowl Based Syst 183:104847
Kim H, Yun U, Baek Y et al (2021) Efficient list based mining of high average utility patterns with maximum average pruning strategies. Inf Sci (N Y) 543:85–105
Li G, Shang T, Zhang Y (2023) Efficient mining high average-utility itemsets with effective pruning strategies and novel list structure. Appl Intell 53:6099–6118
Li J, Li H, Wong L et al (2006) Minimum description length principle: generators are preferable to closed patterns. In: Proceedings of the 21st national conference on artificial intelligence, AAAI ’06. pp 409–414
Grunwald P, Myung IJ, Pitt M (2005) Advances in minimum description length: theory and applications. The MIT Press, Cambridge
Lan GC, Hong T-P, Tseng VS (2012) Efficiently mining high average-utility itemsets with an improved upper-bound strategy. Int J Inf Technol Decis Mak 11:1009–1030
Lu T, Vo B, Nguyen HT, Hong T-P (2015) A new method for mining high average utility itemsets. In: Proceedings of international conference on computer information systems and industrial management. pp 33–42
Lin JC-W, Li T, Fournier-Viger P et al (2016) An efficient algorithm to mine high average-utility itemsets. Adv Eng Inform 30:233–243
Yun U, Kim D (2017) Mining of high average-utility itemsets using novel list structure and pruning strategy. Future Gener Comput Syst 68:346–360
Lin JC-W, Ren S, Fournier-Viger P (2018) MEMU: more efficient algorithm to mine high average-utility patterns with multiple minimum average-utility thresholds. IEEE Access 6:7593–7609
Kim D, Yun U (2017) Efficient algorithm for mining high average-utility itemsets in incremental transaction databases. Appl Intell 47:114–131
Yun U, Kim D, Yoon E, Fujita H (2018) Damped window based high average utility pattern mining over data streams. Knowl Based Syst 144:188–205
Kim J, Yun U, Yoon E et al (2020) One scan based high average-utility pattern mining in static and dynamic databases. Future Gener Comput Syst 111:143–158
Song W, Liu L, Huang C (2021) Generalized maximal utility for mining high average-utility itemsets. Knowl Inf Syst 63:2947–2967
Tran T, Duong H, Truong T, Le B (2023) Efficient mining of concise and informative representations of frequent high utility itemsets. Eng Appl Artif Intell 126:107111
Bui H, Vo B, Nguyen-Hoang TA, Yun U (2021) Mining frequent weighted closed itemsets using the WN-list structure and an early pruning strategy. Appl Intell 51:1439–1459
Merugula S, Rao MVPCS (2020) An integrated approach for mining closed and generator high utility itemsets. Knowl Based Intell Eng Syst 24:27–35
Fournier-Viger P, Zida S, Lin JC-W et al (2016) EFIM-closed: fast and memory efficient discovery of closed high-utility itemsets. In: Proceedings of international conference on machine learning and data mining in pattern recognition. pp 199–213
Tran A, Truong T, Le B (2014) Simultaneous mining of frequent closed itemsets and their generators: Foundation and algorithm. Eng Appl Artif Intell 36:64–80
Tran A, Duong H, Truong T, Le B (2012) Mining frequent itemsets with dualistic constraints. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). pp 807–813
Fournier-Viger P, Gomariz A, Soltani A et al (2014) SPMF: a java open-source pattern mining library. J Mach Learn Res 15:3569–3573
Liu Y, Liao W, Choudhary A (2005) A fast high utility itemsets mining algorithm. In: Proceedings of the 1st international workshop on utility-based data mining. pp 90–99
Acknowledgements
Hai Duong was funded by the Postdoctoral Scholarship Programme of Vingroup Innovation Foundation (VINIF), code VINIF.2023.STS.52.
Author information
Contributions
HD, TT and BL contributed equally to the study conception and design. Material preparation, data collection and analysis were performed by HD, TT, and BL. The first draft of the manuscript was written by HD, TT, BL and PF-V. All authors read and approved the content of the manuscript.
Ethics declarations
Conflict of interest
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Ethical approval
The authors have no conflict of interest. This research was carried out using public data. No experiments were conducted with humans or animals.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Duong, H., Truong, T., Le, B. et al. CG-FHAUI: an efficient algorithm for simultaneously mining succinct pattern sets of frequent high average utility itemsets. Knowl Inf Syst (2024). https://doi.org/10.1007/s10115-024-02121-7
DOI: https://doi.org/10.1007/s10115-024-02121-7