An efficient algorithm for mining top-k on-shelf high utility itemsets

Dam, Thu-Lan; Li, Kenli; Fournier-Viger, Philippe; Duong, Quang-Huy

doi:10.1007/s10115-016-1020-2

Regular Paper
Published: 10 January 2017

Knowledge and Information Systems, Volume 52, pages 621–655 (2017)

Abstract

High on-shelf utility itemset (HOU) mining is an emerging data mining task that consists of discovering sets of items that generate a high profit in transaction databases. HOU mining is more difficult than traditional high utility itemset (HUI) mining because it also considers the shelf time of items and items having negative unit profits. As a result, HOU mining can be used to discover more useful and interesting patterns in real-life applications than traditional HUI mining. Several algorithms have been proposed for this task. However, a major drawback of these algorithms is that it is difficult for users to find a suitable value for the minimum utility threshold parameter. If the threshold is set too high, too few patterns are found; if it is set too low, too many patterns are found and the algorithm may consume an excessive amount of time and memory. To address this issue, we introduce the problem of top-k on-shelf high utility itemset mining, where the user directly specifies k, the desired number of patterns to output, instead of a minimum utility threshold. An efficient algorithm named KOSHU (fast top-K on-shelf high utility itemset miner) is proposed to mine the top-k HOUs while considering the on-shelf time periods of items and items having positive and/or negative unit profits. KOSHU introduces three novel strategies to reduce the search space, named efficient estimated co-occurrence maximum period rate pruning, period utility pruning, and concurrence existing of a pair 2-itemset pruning. KOSHU also incorporates several novel optimizations and a faster method for constructing utility-lists. An extensive performance study on real-life and synthetic datasets shows that the proposed algorithm is efficient in terms of both runtime and memory consumption and has excellent scalability.
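To make the problem statement concrete, the following is a minimal brute-force sketch of top-k on-shelf utility mining. It is not KOSHU: it enumerates every itemset of a tiny hypothetical database and keeps the k best in a min-heap whose smallest entry acts as an internal threshold that rises as better itemsets are found, so the user only supplies k. The database `DB`, the unit profits `PROFIT`, and all function names are hypothetical. The relative-utility definition used here (the utility of X divided by the total utility of the time periods in which X appears, with transaction utility counting only positive item utilities) is an assumption modeled on earlier on-shelf utility work, not necessarily the paper's exact formulation. None of KOSHU's pruning strategies or utility-lists are reproduced.

```python
from fractions import Fraction
from itertools import combinations
import heapq

# Hypothetical toy database: each transaction is (time period, {item: quantity}).
DB = [
    (1, {"a": 1, "b": 2, "c": 1}),
    (1, {"a": 2, "c": 3}),
    (2, {"b": 1, "c": 2, "d": 1}),
    (2, {"a": 1, "d": 3}),
]
# Hypothetical unit profits; item "c" has a negative unit profit.
PROFIT = {"a": 5, "b": 2, "c": -1, "d": 3}


def tu(items):
    """Transaction utility, counting positive item utilities only (assumption)."""
    return sum(max(q * PROFIT[i], 0) for i, q in items.items())


def period_utilities(db):
    """TU(h): total transaction utility of each time period h."""
    totals = {}
    for h, items in db:
        totals[h] = totals.get(h, 0) + tu(items)
    return totals


def relative_utility(itemset, db, period_tu):
    """Assumed ru(X): utility of X over the TU of the periods where X appears."""
    utility, periods = 0, set()
    for h, items in db:
        if all(i in items for i in itemset):
            utility += sum(items[i] * PROFIT[i] for i in itemset)
            periods.add(h)
    denominator = sum(period_tu[h] for h in periods)
    if denominator == 0:
        return None  # X never occurs, or its periods have no positive utility
    return Fraction(utility, denominator)


def top_k_on_shelf(db, k):
    """Brute-force baseline: score every itemset and keep the k best."""
    period_tu = period_utilities(db)
    items = sorted({i for _, t in db for i in t})
    heap = []  # min-heap of (ru, itemset); heap[0][0] is the internal threshold
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            ru = relative_utility(itemset, db, period_tu)
            if ru is None:
                continue
            if len(heap) < k:
                heapq.heappush(heap, (ru, itemset))
            elif ru > heap[0][0]:
                # Evict the current k-th best; the threshold rises.
                heapq.heapreplace(heap, (ru, itemset))
    return sorted(heap, reverse=True)
```

With k = 3, `top_k_on_shelf(DB, 3)` returns {a, d} (14/19), {d} (12/19) and {a, c} (11/19). Item d outranks item a even though a has the larger raw utility (20 versus 12), because under the assumed definition d is only measured against period 2, when it was on the shelf. Itemset {a, c} still ranks despite containing the negative-profit item c.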

Notes

It should be noted that this tree representation is used for illustration purposes in this article. The proposed algorithm does not create an explicit tree structure in memory to explore the search space.
http://www.almaden.ibm.com/cs/quest/syndata.html.
http://cucis.ece.northwestern.edu/projects/DMS/MineBenchDownload.html.
https://www.microsoft.com/en-us/download/details.aspx?id=51958.
http://www.kdd.org/kdd-cup/view/kdd-cup-2000.
http://fimi.cs.helsinki.fi/data/.
http://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php.

Acknowledgements

This study was funded by the National Natural Science Foundation of China (Grant Nos. 61133005, 61432005, 61370095, 61472124, 61202109, 61472126), and the International Science and Technology Cooperation Program of China (Grant Nos. 2015DFA11240, 2014DFBS0010). T-L. Dam was also partially supported by science research fund of Hanoi University of Industry, Hanoi, Vietnam.

Author information

Authors and Affiliations

College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
Thu-Lan Dam & Kenli Li
Faculty of Information Technology, Hanoi University of Industry, Hanoi, Vietnam
Thu-Lan Dam
CIC of HPC, National University of Defense Technology, Changsha, 410073, China
Kenli Li
National Supercomputing Center in Changsha, Changsha, 410082, Hunan, China
Kenli Li
School of Natural Sciences and Humanities, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, 518055, Guangdong, China
Philippe Fournier-Viger
Faculty of Information Technology, Mathematics and Electrical Engineering, Norwegian University of Science and Technology, Trondheim, Norway
Quang-Huy Duong

Corresponding author

Correspondence to Kenli Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Dam, TL., Li, K., Fournier-Viger, P. et al. An efficient algorithm for mining top-k on-shelf high utility itemsets. Knowl Inf Syst 52, 621–655 (2017). https://doi.org/10.1007/s10115-016-1020-2

Received: 09 March 2016
Accepted: 28 December 2016
Published: 10 January 2017
Issue Date: September 2017
DOI: https://doi.org/10.1007/s10115-016-1020-2
