A fast algorithm for mining high average-utility itemsets

Lin, Jerry Chun-Wei; Ren, Shifeng; Fournier-Viger, Philippe; Hong, Tzung-Pei; Su, Ja-Hwung; Vo, Bay

doi:10.1007/s10489-017-0896-1

A fast algorithm for mining high average-utility itemsets

Published: 11 March 2017

Volume 47, pages 331–346, (2017)
Cite this article

Applied Intelligence Aims and scope Submit manuscript

Jerry Chun-Wei Lin¹,
Shifeng Ren¹,
Philippe Fournier-Viger²,
Tzung-Pei Hong^3,4,
Ja-Hwung Su⁵ &
…
Bay Vo^6,7

573 Accesses
36 Citations
Explore all metrics

Abstract

Mining high-utility itemsets (HUIs) in transactional databases has become a very popular research topic in recent years. A popular variation of the problem of HUI mining is to discover high average-utility itemsets (HAUIs), where an alternative measure called the average-utility is used to evaluate the utility of itemsets by considering their lengths. Albeit, HAUI mining has been studied extensively, current algorithms often consume a large amount of memory and have long execution times, due to the large search space and the usage of loose upper bounds to estimate the average-utilities of itemsets. In this paper, we present a more efficient algorithm for HAUI mining, which includes three pruning strategies to provide a tighter upper bound on the average-utilities of itemsets, and thus reduce the search space more effectively to decrease the runtime. The first pruning strategy utilizes relationships between item pairs to reduce the search space for itemsets containing three or more items. The second pruning strategy provides a tighter upper bound on the average-utilities of itemsets to prune unpromising candidates early. The third strategy reduces the time for constructing the average-utility-list structures for itemsets, which is used to calculate their upper bounds. Substantial experiments conducted on both real-life and synthetic datasets show that the proposed algorithm with three pruning strategies can efficiently and effectively reduce the search space for mining HAUIs, when compared to the state-of-the-art algorithms, in terms of runtime, number of candidates, memory usage, performance of the pruning strategies and scalability.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Generalized maximal utility for mining high average-utility itemsets

Article 21 October 2021

A fast high average-utility itemset mining with efficient tighter upper bounds and novel list structure

Article 18 March 2020

Efficient Mining of High Average-Utility Itemsets with Multiple Minimum Thresholds

References

Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: International conference on very large data bases, pp 487–499
Agrawal R, Srikant R (1994) Quest synthetic data generator. http://www.Almaden.ibm.com/cs/quest/syndata.html
Ahmed CF, Tanbeer SK, Jeong BS, Lee YK (2009) Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans Knowl Data Eng 21(12):1708–1721
Article Google Scholar
Eltabakh MY, Ouzzani M, Khalil MA, Aref WG, Elmagarmid AK (2008) Incremental mining for frequent patterns in evolving time series datatabases. Computer Science Technical Reports 1707
Erwin A, Gopalan RP, Achuthan NR (2008) Efficient mining of high utility itemsets from large datasets. In: The Pacific-Asia conference on advances in knowledge discovery and data mining, pp 554–561
Chen MS, Park JS, Yu PS (1998) Efficient data mining for path traversal patterns. IEEE Trans Knowl Data Eng 10(2):209–221
Article Google Scholar
Fournier-Viger P, Wu CW, Zida S, Tseng VS (2014) FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning. In: International symposium on methodologies for intelligent systems, pp 83–92
Fournier-Viger P, Lin JCW, Gomariz A, Gueniche T, Soltani A, Deng Z, Lam HT (2016) The SPMF open-source data mining library version 2. Lect Notes Comput Sci 9853:36–40
Article Google Scholar
Grahne G, Zhu J (2003) Efficiently using prefix-trees in mining frequent itemsets. In: IEEE ICDM workshop on frequent itemset mining
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Disc 8:53–87
Article MathSciNet Google Scholar
Hong TP, Lee CH, Wang SL (2011) Effective utility mining with the measure of average utility. Expert Syst Appl 38(7):8259–8265
Article Google Scholar
Lan GC, Hong TP, Tseng VS (2012) Efficiently mining high average-utility itemsets with an improved upper-bound. Int J Inf Technol Decis Mak 11(5):1009–1030
Article Google Scholar
Li YC, Yeh JS, Chang CC (2008) Isolated items discarding strategy for discovering high utility itemsets. Data Knowl Eng 64(1):198–217
Article Google Scholar
Liu Y, Liao W, Choudhary A (2005) A two-phase algorithm for fast discovery of high utility itemsets. In: The Pacific Asia knowledge discovery and data mining, pp 689–695
Lin CW, Hong TP, Lu WH (2010) Efficiently mining high average utility itemsets with a tree structure. In: The Asian conference on intelligent information and database systems, pp 131–139
Lin MY, Tu TF, Hsueh SC (2012) High utility pattern mining using the maximal itemset property and lexicographic tree structures. Inf Sci 215:1–14
Article Google Scholar
Lin YF, Wu CW, Huang CF, Tseng VS (2015) Discovering utility-based episode rules in complex event sequences. Expert Syst Appl 42(12):5303–5314
Article Google Scholar
Lin JCW, Li T, Fournier-Viger P, Hong TP, Zhan J, Voznak M (2016) An efficient algorithm to mine high average-utility itemsets. Adv Eng Inform 30(2):233–243
Article Google Scholar
Liu Y, Liao WK, Choudhary A (2005) A fast high utility itemsets mining algorithm. In: The international workshop on utility-based data mining, pp 90–99
Liu M, Qu J (2012) Mining high utility itemsets without candidate generation. In: ACM international conference on information and knowledge management, pp 55–64
Liu J, Wang K, Fung BCM (2016) Mining high utility patterns in one phase without generating candidates. IEEE Trans Knowl Data Eng 25(5):1245–1257
Article Google Scholar
Lu T, Vo B, Nguyen HT, Hong TP (2014) A new method for mining high average utility itemsets. Lect Notes Comput Sci:33–42
Lucchese C, Orlando S, Perego R (2006) Fast and memory efficient mining of frequent closed itemsets. IEEE Trans Knowl Data Eng 18(1):21–36
Article Google Scholar
Pei J, Han J, Mao R (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: ACM SIGMOD workshop on research issues in data mining and knowledge discovery, pp 21– 30
Tanbeer SK, Ahmed CF, Jeong BS, Lee YK (2008) Efficient frequent pattern mining over data streams. In: ACM conference on information and knowledge management, pp 1447– 1448
Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: SIAM international conference on data mining, pp 482–486
Yao H, Hamilton HJ, Geng L (2006) A unified framework for utility based measures for mining itemsets. In: The international workshop on utility-based data mining, pp 27–28
Yen SJ, Lee YS (2007) Mining high utility quantitative association rules. International Conference on Big Data Analytics and Knowledge Discovery, pp 283–292

Download references

Acknowledgment

This research was partially supported by the National Natural Science Foundation of China (NSFC) under grant No.61503092 and by the CCF-Tencent IAGR20160115.

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
Jerry Chun-Wei Lin & Shifeng Ren
School of Natural Sciences and Humanities, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
Philippe Fournier-Viger
Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan
Tzung-Pei Hong
Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan
Tzung-Pei Hong
Department of Information Management, Cheng Shiu University, Kaohsiung, Taiwan
Ja-Hwung Su
Faculty of Information Technology, Ho Chi Minh City University of Technology, Ho Chi Minh, Vietnam
Bay Vo
College of Electronics and Information Engineering, Sejong University, Seoul, Republic of Korea
Bay Vo

Authors

Jerry Chun-Wei Lin
View author publications
You can also search for this author in PubMed Google Scholar
Shifeng Ren
View author publications
You can also search for this author in PubMed Google Scholar
Philippe Fournier-Viger
View author publications
You can also search for this author in PubMed Google Scholar
Tzung-Pei Hong
View author publications
You can also search for this author in PubMed Google Scholar
Ja-Hwung Su
View author publications
You can also search for this author in PubMed Google Scholar
Bay Vo
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jerry Chun-Wei Lin.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Lin, J.CW., Ren, S., Fournier-Viger, P. et al. A fast algorithm for mining high average-utility itemsets. Appl Intell 47, 331–346 (2017). https://doi.org/10.1007/s10489-017-0896-1

Download citation

Published: 11 March 2017
Issue Date: September 2017
DOI: https://doi.org/10.1007/s10489-017-0896-1

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A fast algorithm for mining high average-utility itemsets

Abstract

Access this article

Similar content being viewed by others

Generalized maximal utility for mining high average-utility itemsets

A fast high average-utility itemset mining with efficient tighter upper bounds and novel list structure

Efficient Mining of High Average-Utility Itemsets with Multiple Minimum Thresholds

References

Acknowledgment

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

A fast algorithm for mining high average-utility itemsets

Abstract

Access this article

Similar content being viewed by others

Generalized maximal utility for mining high average-utility itemsets

A fast high average-utility itemset mining with efficient tighter upper bounds and novel list structure

Efficient Mining of High Average-Utility Itemsets with Multiple Minimum Thresholds

References

Acknowledgment

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation