Efficient high utility itemset mining using buffered utility-lists

Abstract

Discovering high utility itemsets in transaction databases is a key task for studying customer behavior. It consists of finding groups of items bought together that yield a high profit. Several algorithms have been proposed to mine high utility itemsets using various approaches and data structures of varying complexity. Among existing algorithms, one-phase algorithms employing the utility-list structure have been shown to be the most efficient. In recent years, the simplicity of the utility-list structure has led to the development of numerous utility-list based algorithms for various tasks related to utility mining. However, a major limitation of utility-list based algorithms is that creating and maintaining utility-lists is time-consuming and can consume a huge amount of memory. This is because numerous utility-lists are built and because the utility-list intersection (join) operation used to construct each utility-list is costly. This paper addresses this issue by proposing an improved utility-list structure, called the utility-list buffer, to reduce memory consumption and speed up the join operation. This structure is integrated into a novel algorithm named ULB-Miner (Utility-List Buffer for high utility itemset Miner), which introduces several new ideas to discover high utility itemsets more efficiently. ULB-Miner uses the utility-list buffer to efficiently store and retrieve utility-lists and to reuse memory during the mining process. The paper also introduces a linear-time method for constructing utility-list segments in a utility-list buffer. An extensive experimental study on various datasets shows that the proposed algorithm is highly efficient in terms of both execution time and memory consumption: ULB-Miner is up to 10 times faster than the FHM and HUI-Miner algorithms and consumes up to 6 times less memory. Moreover, it performs well on both dense and sparse datasets.
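To make the utility-list structure and its join operation concrete, here is a minimal Python sketch of the classic one-phase approach used by HUI-Miner (Liu and Qu, 2012) and FHM. It is not ULB-Miner. Each utility-list is stored as a separate Python list, so there is no utility-list buffer and no memory reuse. The two-pointer merge in `join` is an assumed linear-time procedure and may differ from the paper's segment construction. All names (`UtilityList`, `single_item_lists`, `join`, `mine`, `brute_force`, `PROFIT`, `DB`) and the five-transaction database are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical toy data: each transaction maps item -> purchase quantity;
# PROFIT is each item's unit profit (external utility).
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}
DB = [
    {"a": 1, "c": 1, "d": 1},
    {"a": 2, "c": 6, "e": 2},
    {"b": 2, "c": 2, "e": 1},
    {"a": 1, "b": 1, "c": 1, "d": 1, "e": 1},
    {"b": 4, "c": 3, "e": 1},
]


@dataclass
class UtilityList:
    itemset: tuple
    # Entries (tid, iutil, rutil), sorted by transaction id.
    entries: list = field(default_factory=list)

    def sum_iutil(self) -> int:
        return sum(e[1] for e in self.entries)

    def sum_rutil(self) -> int:
        return sum(e[2] for e in self.entries)


def single_item_lists(db, profit):
    """Build utility-lists of single items, ordered by ascending TWU."""
    twu = {}
    for t in db:
        tu = sum(q * profit[i] for i, q in t.items())
        for i in t:
            twu[i] = twu.get(i, 0) + tu
    order = sorted(twu, key=lambda i: (twu[i], i))
    rank = {i: r for r, i in enumerate(order)}
    lists = {i: UtilityList((i,)) for i in order}
    for tid, t in enumerate(db):
        items = sorted(t, key=rank.get)
        rest = sum(t[i] * profit[i] for i in items)
        for i in items:
            u = t[i] * profit[i]
            rest -= u  # rutil: utility of the items after i in this transaction
            lists[i].entries.append((tid, u, rest))
    return [lists[i] for i in order]


def join(p, px, py):
    """Build the utility-list of Pxy from those of P, Px and Py.

    A two-pointer merge over tid-sorted entries; p is None when P is empty.
    """
    out = UtilityList(px.itemset + py.itemset[-1:])
    p_iutil = {tid: iu for tid, iu, _ in p.entries} if p else {}
    ex, ey = px.entries, py.entries
    i = j = 0
    while i < len(ex) and j < len(ey):
        if ex[i][0] < ey[j][0]:
            i += 1
        elif ex[i][0] > ey[j][0]:
            j += 1
        else:
            tid = ex[i][0]
            # u(Pxy, T) = u(Px, T) + u(Py, T) - u(P, T)
            iutil = ex[i][1] + ey[j][1] - p_iutil.get(tid, 0)
            out.entries.append((tid, iutil, ey[j][2]))
            i += 1
            j += 1
    return out


def mine(prefix, ext_lists, minutil, out):
    """Depth-first search over extensions of prefix, HUI-Miner style."""
    for k, px in enumerate(ext_lists):
        if px.sum_iutil() >= minutil:
            out[px.itemset] = px.sum_iutil()
        # iutil + rutil bounds the utility of every extension of px.
        if px.sum_iutil() + px.sum_rutil() >= minutil:
            joined = (join(prefix, px, py) for py in ext_lists[k + 1:])
            mine(px, [ul for ul in joined if ul.entries], minutil, out)
    return out


def brute_force(db, profit, minutil):
    """Reference check: enumerate every itemset (toy data only)."""
    items = sorted({i for t in db for i in t})
    found = {}
    for mask in range(1, 1 << len(items)):
        s = [items[b] for b in range(len(items)) if mask >> b & 1]
        u = sum(sum(t[i] * profit[i] for i in s)
                for t in db if all(i in t for i in s))
        if u >= minutil:
            found[frozenset(s)] = u
    return found


if __name__ == "__main__":
    huis = mine(None, single_item_lists(DB, PROFIT), 25, {})
    for itemset, u in sorted(huis.items(), key=lambda kv: -kv[1]):
        print(itemset, u)
```

With `minutil = 25`, the sketch reports, for example, `('a', 'c')` with utility 28 and `('a', 'e', 'c')` with utility 31. Items appear in ascending-TWU order. `('a', 'e')`, with utility 24, is not reported but is still extended, because its iutil plus rutil (31) reaches the threshold. Every extension step calls `join` and allocates a new list. That repeated construction is the time and memory cost the abstract attributes to utility-list based algorithms.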

Figs. 1–10

Notes

  1. http://cucis.ece.northwestern.edu/projects/DMS/MineBenchDownload.html
  2. https://www.microsoft.com/en-us/download/details.aspx?id=51958
  3. http://fimi.cs.helsinki.fi/data/
  4. http://fimi.cs.helsinki.fi/data/
  5. http://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php

References

  1. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large data bases (VLDB 1994). Morgan Kaufmann, pp 487–499
  2. Agrawal R, Srikant R (1994) Quest synthetic data generator. Available at http://www.almaden.ibm.com/cs/quest/syndata.html
  3. Ahmed C, Tanbeer S, Jeong BS, Lee YK (2009) Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans Knowl Data Eng 21(12):1708–1721
  4. Ahmed CF, Tanbeer SK, Jeong BS, Lee YK (2009) Efficient mining of utility-based web path traversal patterns. In: Proceedings of the 11th international conference on advanced communication technology, vol 3, ICACT’09, pp 2215–2218
  5. Chan R, Yang Q, Shen YD (2003) Mining high utility itemsets. In: Proceedings of the 3rd IEEE international conference on data mining, pp 19–26
  6. Dam TL, Li K, Fournier-Viger P, Duong QH (2016) An efficient algorithm for mining top-rank-k frequent patterns. Appl Intell 45(1):96–111
  7. Dam TL, Li K, Fournier-Viger P, Duong QH (2017) CLS-Miner: efficient and effective closed high utility itemset mining. Frontiers of Computer Science, pp 1–27
  8. Dam TL, Li K, Fournier-Viger P, Duong QH (2017) An efficient algorithm for mining top-k on-shelf high utility itemsets. Knowl Inf Syst 52(3):621–655
  9. Duong QH, Liao B, Fournier-Viger P, Dam TL (2016) An efficient algorithm for mining the top-k high utility itemsets, using novel threshold raising and pruning strategies. Knowl-Based Syst 104:106–122
  10. Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu CW, Tseng V (2014) SPMF: a Java open-source pattern mining library. J Mach Learn Res 15:3569–3573
  11. Fournier-Viger P, Lin JC, Duong Q, Dam T (2016) FHM+: faster high-utility itemset mining using length upper-bound reduction. In: Proceedings of the 29th international conference on industrial engineering and other applications of applied intelligent systems, pp 115–127
  12. Fournier-Viger P, Lin JCW, Duong QH, Dam TL (2016) PHM: mining periodic high-utility itemsets. In: Proceedings of the 16th industrial conference on data mining. Springer, pp 64–79
  13. Fournier-Viger P, Wu CW, Zida S, Tseng V (2014) FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning. In: Proceedings of the 21st international symposium on methodologies for intelligent systems, pp 83–92
  14. Fournier-Viger P, Zida S (2015) FOSHU: faster on-shelf high utility itemset mining – with or without negative unit profit. In: Proceedings of the 30th annual ACM symposium on applied computing, SAC ’15, pp 857–864
  15. Grahne G, Zhu J (2005) Fast algorithms for frequent itemset mining using FP-trees. IEEE Trans Knowl Data Eng 17(10):1347–1362
  16. Han J, Wang J, Lu Y, Tzvetkov P (2002) Mining top-k frequent closed patterns without minimum support. In: Proceedings of the IEEE international conference on data mining, pp 211–218
  17. Han JW, Pei J, Yin YW (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Disc 8(1):53–87
  18. Joshi M, Bhalodia D (2016) Mining high utility itemset using graphics processor. In: Proceedings of the international symposium on intelligent systems technologies and applications, pp 665–674
  19. Krishnamoorthy S (2015) Pruning strategies for mining high utility itemsets. Expert Syst Appl 42(5):2371–2381
  20. Lan GC, Hong TP, Tseng V (2014) An efficient projection-based indexing approach for mining high utility itemsets. Knowl Inf Syst 38(1):85–107
  21. Lee S, Park JS (2016) Top-k high utility itemset mining based on utility-list structures. In: Proceedings of the international conference on big data and smart computing, pp 101–108
  22. Lin JCW, Gan W, Fournier-Viger P, Hong TP, Tseng V (2016) Efficient algorithms for mining high-utility itemsets in uncertain databases. Knowl-Based Syst 96:171–187
  23. Liu M, Qu J (2012) Mining high utility itemsets without candidate generation. In: Proceedings of the 21st ACM international conference on information and knowledge management, CIKM ’12, pp 55–64
  24. Liu Y, Liao WK, Choudhary A (2005) A two-phase algorithm for fast discovery of high utility itemsets. In: Proceedings of the 9th Pacific-Asia conference on advances in knowledge discovery and data mining, PAKDD’05, pp 689–695
  25. Sahoo J, Das AK, Goswami A (2016) An efficient fast algorithm for discovering closed+ high utility itemsets. Appl Intell 45(1):44–74
  26. Song W, Liu Y, Li J (2014) BAHUI: fast and memory efficient mining of high utility itemsets based on bitmap. Int J Data Warehouse Min 10(1):1–15
  27. Song W, Liu Y, Li J (2014) Mining high utility itemsets by dynamically pruning the tree structure. Appl Intell 40(1):29–43
  28. Song W, Zhang Z, Li J (2016) A high utility itemset mining algorithm based on subsume index. Knowl Inf Syst 49(1):315–340
  29. Thilagu M, Nadarajan R (2012) Efficiently mining of effective web traversal patterns with average utility. Procedia Technol 6:444–451
  30. Tseng V, Shie BE, Wu CW, Yu P (2013) Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans Knowl Data Eng 25(8):1772–1786
  31. Tseng V, Wu CW, Fournier-Viger P, Yu P (2016) Efficient algorithms for mining top-k high utility itemsets. IEEE Trans Knowl Data Eng 28(1):54–67
  32. Wang JZ, Huang JL, Chen YC (2016) On efficiently mining high utility sequential patterns. Knowl Inf Syst 49(2):597–627
  33. Wu CW, Fournier-Viger P, Gu JY, Tseng V (2015) Mining closed+ high utility itemsets without candidate generation. In: 2015 conference on technologies and applications of artificial intelligence (TAAI), pp 187–194
  34. Wu CW, Shie BE, Tseng V, Yu PS (2012) Mining top-k high utility itemsets. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’12, pp 78–86
  35. Liu Y-C, Cheng C-P, Tseng V (2013) Mining differential top-k co-expression patterns from time course comparative gene expression datasets. BMC Bioinformatics 14:230
  36. Yun U, Ryang H, Lee G, Fujita H (2017) An efficient algorithm for mining high utility patterns from incremental databases with one database scan. Knowl-Based Syst 124:188–206
  37. Yun U, Ryang H, Ryu KH (2014) High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates. Expert Syst Appl 41(8):3861–3878
  38. Zaki MJ, Gouda K (2003) Fast vertical mining using diffsets. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining, pp 326–335

Acknowledgments

This research was partly funded by the Norwegian University of Science and Technology (NTNU) through the MUSED project and partly supported by the Youth 1000 funding of Prof. Philippe Fournier-Viger. The work of Mrs. Dam was carried out during the tenure of an ERCIM “Alain Bensoussan” Fellowship Programme.

Author information

Correspondence to Philippe Fournier-Viger or Heri Ramampiaro.

About this article

Cite this article

Duong, Q., Fournier-Viger, P., Ramampiaro, H. et al. Efficient high utility itemset mining using buffered utility-lists. Appl Intell 48, 1859–1877 (2018). https://doi.org/10.1007/s10489-017-1057-2

Keywords

  • Pattern mining
  • Itemset mining
  • Utility mining
  • Utility list
  • Utility list buffer