CG-FHAUI: an efficient algorithm for simultaneously mining succinct pattern sets of frequent high average utility itemsets

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Closed frequent high average utility itemsets (CFHAUIs) and generators of frequent high average utility itemsets (GFHAUIs) are important because they concisely represent the set of frequent high average utility itemsets (FHAUIs). These representations can be much smaller than the full set of FHAUIs while preserving its essential information. In addition, they allow the generation of non-redundant high average utility association rules, which are valuable to decision-makers. However, discovering these representations is difficult, primarily because the average utility function is neither monotonic nor anti-monotonic within each equivalence class, that is, among itemsets that appear in the same set of transactions. To tackle this challenge, this paper proposes a method for efficiently extracting CFHAUIs and GFHAUIs. The approach introduces novel bounds on the average utility, including a weak lower bound called \(wlbau\) and a lower bound named \(auvlb\). Pruning strategies based on the \(wlbau\) and \(auvlb\) bounds eliminate non-closed and/or non-generator FHAUIs early, which reduces runtime and memory consumption. The paper also introduces a novel algorithm, CG-FHAUI, that discovers GFHAUIs and CFHAUIs simultaneously. Empirical results show that the proposed algorithm outperforms a baseline algorithm in terms of runtime, memory usage, and scalability.
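To make the terminology concrete, the following is a minimal brute-force sketch, not the CG-FHAUI algorithm. It assumes the standard definition of average utility (the utility of an itemset summed over its supporting transactions, divided by the itemset's length) and the frequency-based notions of closed itemsets and generators (the maximal and minimal members of an equivalence class). The toy database and all function names are hypothetical, and the thresholds and the \(wlbau\) and \(auvlb\) bounds are not modelled because the abstract does not define them.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its utility.
DB = [
    {"a": 4, "b": 1, "c": 6},
    {"a": 2, "b": 1, "c": 9},
    {"a": 5, "d": 2},
    {"b": 3, "d": 2},
]
ITEMS = sorted({item for transaction in DB for item in transaction})


def tidset(itemset):
    """Indices of the transactions that contain every item of `itemset`."""
    return frozenset(
        i for i, t in enumerate(DB) if all(item in t for item in itemset)
    )


def avg_utility(itemset):
    """Utility summed over supporting transactions, divided by the itemset length."""
    total = sum(DB[i][item] for i in tidset(itemset) for item in itemset)
    return total / len(itemset)


def equivalence_classes():
    """Group every itemset with non-empty support by its set of transactions."""
    classes = defaultdict(list)
    for size in range(1, len(ITEMS) + 1):
        for combo in combinations(ITEMS, size):
            itemset = frozenset(combo)
            tids = tidset(itemset)
            if tids:
                classes[tids].append(itemset)
    return classes


def closed_and_generators(members):
    """Closed = no proper superset in the class; generator = no proper subset."""
    closed = [s for s in members if not any(s < other for other in members)]
    generators = [s for s in members if not any(other < s for other in members)]
    return closed, generators


if __name__ == "__main__":
    for tids, members in equivalence_classes().items():
        closed, generators = closed_and_generators(members)
        print(sorted(tids), "closed:", [sorted(s) for s in closed],
              "generators:", [sorted(s) for s in generators])
        for s in members:
            print("   ", sorted(s), "au =", round(avg_utility(s), 2))
```

In the class of itemsets supported by the first two transactions, the average utility drops from 15 for {c} to 10.5 for {a, c}, but rises from 4 for {a, b} to about 7.67 for {a, b, c}. The measure is therefore neither monotonic nor anti-monotonic inside a single equivalence class, which is the difficulty the abstract describes.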

Data availability

The manuscript has associated data in a data repository. [Author’s comment: The datasets used to evaluate the algorithms and the synthetic dataset generator used in this study are public and were obtained from [43]. The generated synthetic datasets will be shared on request.]

References

  1. Luna JM, Fournier-Viger P, Ventura S (2019) Frequent itemset mining: a 25 years review. Wiley Interdiscip Rev Data Min Knowl Discov 9:1–31

  2. Liu M, Qu J (2012) Mining high utility itemsets without candidate generation. In: Proceedings of ACM international conference on information and knowledge management. pp 55–64

  3. Shie BE, Yu PS, Tseng VS (2013) Mining interesting user behavior patterns in mobile commerce environments. Appl Intell 38:418–435

  4. Tseng VS, Shie BE, Wu CW, Yu PS (2013) Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans Knowl Data Eng 25:1772–1786

  5. Fournier-Viger P, Wu CW, Zida S, Tseng VS (2014) FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). pp 83–92

  6. Zida S, Fournier-Viger P, Lin JC-W, et al (2015) EFIM: a highly efficient algorithm for high-utility itemset mining. In: Proceedings of Mexican international conference on artificial intelligence (MICAI 2015). pp 530–546

  7. Krishnamoorthy S (2017) HMiner: efficiently mining high utility itemsets. Expert Syst Appl 90:168–183

  8. Nguyen LTT, Nguyen P, Nguyen TDD et al (2019) Mining high-utility itemsets in dynamic profit databases. Knowl Based Syst 175:130–144

  9. Wu P, Niu X, Fournier-Viger P et al (2022) UBP-miner: an efficient bit based high utility itemset mining algorithm. Knowl Based Syst 248:108865

  10. Qu J-F, Fournier-Viger P, Liu M et al (2023) Mining high utility itemsets using prefix trees and utility vectors. IEEE Trans Knowl Data Eng 35:10224–10236

  11. Duong H, Hoang T, Tran T et al (2022) Efficient algorithms for mining closed and maximal high utility itemsets. Knowl Based Syst 257:109921

  12. Wu C-W, Fournier-Viger P, Gu J, Tseng VS (2019) Mining compact high utility itemsets without candidate generation. In: Fournier-Viger P, Lin JW, Nkambou R, Vo B, Tseng V (eds) High-utility pattern mining studies in big data. Springer, Berlin, pp 283–307

  13. Nguyen LTT, Vu VV, Lam MTH et al (2019) An efficient method for mining high utility closed itemsets. Inf Sci (N Y) 495:78–99

  14. Tseng VS, Wu C, Fournier-Viger P, Yu PS (2015) Efficient algorithms for mining the concise and lossless representation of high utility itemsets. IEEE Trans Knowl Data Eng 27:726–739

  15. Fournier-Viger P, Wu C-W, Tseng VS (2014) Novel Concise representations of high utility itemsets using generator patterns. In: Proceedings of international conference on advanced data mining and applications. pp 30–43

  16. Sahoo J, Kumar A, Goswami A (2015) An efficient approach for mining association rules from high utility itemsets. Expert Syst Appl 42:5754–5778

  17. Mai T, Nguyen LTT, Vo B et al (2020) Efficient algorithm for mining non-redundant high-utility association rules. Sensors (Switzerland) 20:1–17

  18. Lin JC-W, Hong T-P, Lu WH (2010) Efficiently mining high average utility itemsets with a tree structure. In: Lecture notes in computer science. pp 131–139

  19. Hong T-P, Lee CH, Wang SL (2009) Mining high average-utility itemsets. In: Proceedings of IEEE international conference on systems, man and cybernetics. pp 2526–2530

  20. Ryang H, Yun U (2016) High utility pattern mining over data streams with sliding window technique. Expert Syst Appl 57:214–231

  21. Yun U, Kim D, Ryang H et al (2016) Mining recent high average utility patterns based on sliding window from stream data. J Intell Fuzzy Syst 30:3605–3617

  22. Truong T, Duong H, Le B, Fournier-Viger P (2018) Efficient vertical mining of high average-utility itemsets based on novel upper-bounds. IEEE Trans Knowl Data Eng 31:301–314

  23. Truong T, Duong H, Le B et al (2019) Efficient high average-utility itemset mining using novel vertical weak upper-bounds. Knowl Based Syst 183:104847

  24. Kim H, Yun U, Baek Y et al (2021) Efficient list based mining of high average utility patterns with maximum average pruning strategies. Inf Sci (N Y) 543:85–105

  25. Li G, Shang T, Zhang Y (2023) Efficient mining high average-utility itemsets with effective pruning strategies and novel list structure. Appl Intell 53:6099–6118

  26. Li J, Li H, Wong L, et al (2006) Minimum description length principle: Generators are preferable to closed patterns. In: Proceedings of the 21st national conference on artificial intelligence, AAAI ’06. pp 409–414

  27. Grunwald P, Myung IJ, Pitt M (2005) Advances in minimum description length: theory and applications. The MIT Press, Cambridge

  28. Lan GC, Hong T-P, Tseng VS (2012) Efficiently mining high average-utility itemsets with an improved upper-bound strategy. Int J Inf Technol Decis Mak 11:1009–1030

  29. Lu T, Vo B, Nguyen HT, Hong T-P (2015) A new method for mining high average utility itemsets. In: Proceedings of international conference on computer information systems and industrial management. pp 33–42

  30. Lin JC-W, Li T, Fournier-Viger P et al (2016) An efficient algorithm to mine high average-utility itemsets. Adv Eng Inform 30:233–243

  31. Yun U, Kim D (2017) Mining of high average-utility itemsets using novel list structure and pruning strategy. Future Gener Comput Syst 68:346–360

  32. Lin JC-W, Ren S, Fournier-Viger P (2018) MEMU: more efficient algorithm to mine high average-utility patterns with multiple minimum average-utility thresholds. IEEE Access 6:7593–7609

  33. Kim D, Yun U (2017) Efficient algorithm for mining high average-utility itemsets in incremental transaction databases. Appl Intell 47:114–131

  34. Yun U, Kim D, Yoon E, Fujita H (2018) Damped window based high average utility pattern mining over data streams. Knowl Based Syst 144:188–205

  35. Kim J, Yun U, Yoon E et al (2020) One scan based high average-utility pattern mining in static and dynamic databases. Future Gener Comput Syst 111:143–158

  36. Song W, Liu L, Huang C (2021) Generalized maximal utility for mining high average-utility itemsets. Knowl Inf Syst 63:2947–2967

  37. Tran T, Duong H, Truong T, Le B (2023) Efficient mining of concise and informative representations of frequent high utility itemsets. Eng Appl Artif Intell 126:107111

  38. Bui H, Vo B, Nguyen-Hoang TA, Yun U (2021) Mining frequent weighted closed itemsets using the WN-list structure and an early pruning strategy. Appl Intell 51:1439–1459

  39. Merugula S, Rao MVPCS (2020) An integrated approach for mining closed and generator high utility itemsets. Knowl Based Intell Eng Syst 24:27–35

  40. Fournier-Viger P, Zida S, Lin JC-W, et al (2016) EFIM-closed: fast and memory efficient discovery of closed high-utility itemsets. In: proceedings of international conference on machine learning and data mining in pattern recognition. pp 199–213

  41. Tran A, Truong T, Le B (2014) Simultaneous mining of frequent closed itemsets and their generators: Foundation and algorithm. Eng Appl Artif Intell 36:64–80

  42. Tran A, Duong H, Truong T, Le B (2012) Mining frequent itemsets with dualistic constraints. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). pp 807–813

  43. Fournier-Viger P, Gomariz A, Soltani A et al (2014) SPMF: a java open-source pattern mining library. J Mach Learn Res 15:3569–3573

  44. Liu Y, Liao W, Choudhary A (2005) A fast high utility itemsets mining algorithm. In: Proceedings of the 1st international workshop on Utility-based data mining. pp 90–99

Acknowledgements

Hai Duong was funded by the Postdoctoral Scholarship Programme of Vingroup Innovation Foundation (VINIF), code VINIF.2023.STS.52.

Author information

Contributions

HD, TT and BL contributed equally to the study conception and design. Material preparation, data collection and analysis were performed by HD, TT, and BL. The first draft of the manuscript was written by HD, TT, BL and PF-V. All authors read and approved the content of the manuscript.

Corresponding author

Correspondence to Hai Duong.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Ethical approval

The authors have no conflict of interest. This research was carried out using public data. No experiments were conducted with humans or animals.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Duong, H., Truong, T., Le, B. et al. CG-FHAUI: an efficient algorithm for simultaneously mining succinct pattern sets of frequent high average utility itemsets. Knowl Inf Syst (2024). https://doi.org/10.1007/s10115-024-02121-7

  • DOI: https://doi.org/10.1007/s10115-024-02121-7
