Abstract
In recent years, high-utility itemset mining has emerged as an important data mining task. However, it remains computationally expensive both in terms of runtime and memory consumption. It is thus an important challenge to design more efficient algorithms for this task. In this paper, we address this issue by proposing a novel algorithm named EFIM (EFficient high-utility Itemset Mining), which introduces several new ideas to more efficiently discover high-utility itemsets. EFIM relies on two new upper bounds named revised sub-tree utility and local utility to more effectively prune the search space. It also introduces a novel array-based utility counting technique named Fast Utility Counting to calculate these upper bounds in linear time and space. Moreover, to reduce the cost of database scans, EFIM proposes efficient database projection and transaction merging techniques named High-utility Database Projection and High-utility Transaction Merging (HTM), also performed in linear time. An extensive experimental study on various datasets shows that EFIM is in general two to three orders of magnitude faster than the state-of-art algorithms \(\hbox {d}^2\)HUP, HUI-Miner, HUP-Miner, FHM and UP-Growth+ on dense datasets and performs quite well on sparse datasets. Moreover, a key advantage of EFIM is its low memory consumption.
Similar content being viewed by others
References
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large databases, Morgan Kaufmann, Santiago de Chile, Chile, September 1994, pp 487–499
Ahmed CF, Tanbeer SK, Jeong BS, Lee YK (2009) Efficient tree structures for high-utility pattern mining in incremental databases. IEEE Trans Knowl Data Eng 21(12):1708–1721
Ahmed CF, Tanbeer SK, Jeong B (2010) Mining high utility web access sequences in dynamic web log data. In: Proceedings of the international conference on software engineering artificial intelligence networking and parallel/distributed computing, IEEE, London, UK, June 2010, pp 76–81
Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu CW, Tseng VS (2014) SPMF: a Java open-source pattern mining library. J Mach Learn Res 15:3389–3393
Fournier-Viger P, Wu CW, Tseng VS (2014) Novel concise representations of high utility itemsets using generator patterns. In: Proceedings of the 10th international conference on advanced data mining and applications, Guilin, China, December 2014. Lecture Notes in Artificial Intelligence, vol 8933. Springer, Berlin, pp 30–43
Fournier-Viger P, Wu CW, Zida S, Tseng VS (2014) FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning. in: Proceedings of the 21st international symposium on methodologies for intelligent systems, Roskilde, Denmark, June 2014. Lecture Notes in Artificial Intelligence, vol 9384. Springer, Berlin, pp 83–92
Fournier-Viger P, Lin JCW, Duong QH, Dam TL (2016) PHM: mining periodic high-utility itemsets. In: Proceedings of the 16th industrial conference on data mining, New York, USA, July 2016. Lecture Notes in Artificial Intelligence, vol 9728, Springer, Berlin, pp 64–79
Fournier-Viger P, Zida S (2015) FOSHU: faster on-shelf high utility itemset mining with or without negative unit profit. In: Proceedings of the 30th symposium on applied computing, ACM, Salamanca, Spain, April 2015, pp 857–864
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87
Pei J, Han J, Lu H, Nishio S, Tang S, Yang D (2001) H-Mine: hyper-structure mining of frequent patterns in large databases. In: Proceedings of the 2001 IEEE international conference on data mining, IEEE, San Jose, CA, November 2001, pp 441–448
Krishnamoorthy S (2015) Pruning strategies for mining high utility itemsets. Expert Syst Appl 42(5):2371–2381
Lan GC, Hong TP, Tseng VS (2014) An efficient projection-based indexing approach for mining high utility itemsets. Knowl Inf Syst 38(1):85–107
Lin JCW, Hong TP, Lan GC, Wong JW, Lin WY (2015) Efficient updating of discovered high-utility itemsets for transaction deletion in dynamic databases. Adv Eng Inform 29(1):16–27
Lin YC, Wu CW, Tseng VS (2015) Mining high utility itemsets in big data. In: Proceedings of the 9th Pacific-Asia conference on knowledge discovery and data mining, Ho Chi Minh City, Vietnam, May 2015, Lecture Notes in Artificial Intelligence, vol 9077. Springer, Berlin, pp 649–661
Liu J, Wang K, Fung B (2012) Direct discovery of high utility itemsets without candidate generation. In: Proceedings of the 12th IEEE international conference on data mining, IEEE, Brussels, Belgium, December 2012, pp 984–989
Liu M, Qu J (2012) Mining high utility itemsets without candidate generation. In: Proceedings of the 22nd ACM international conference on information and knowledge management, ACM, Maui, HI, October 2012, pp 55–64
Liu Y, Cheng C, Tseng VS (2013) Mining differential top-k co-expression patterns from time course comparative gene expression datasets. BMC Bioinform 14(230)
Liu Y, Liao W, Choudhary A (2005) A two-phase algorithm for fast discovery of high utility itemsets. In: Proceedings of the 9th Pacific-Asia conference on knowledge discovery and data mining, Hanoi, Vietnam, May 2005, Lecture Notes in Artificial Intelligence, vol 3518. Springer, Berlin, pp 689–695
Lu T, Liu Y, Wang L (2014) An algorithm of top-k high utility itemsets mining over data stream. J Softw 9(9):2342–2347
Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu M (2004) Mining sequential patterns by pattern-growth: the PrefixSpan approach. IEEE Trans Knowl Data Eng 16(11):1424–1440
Ryang H, Yun U (2015) Top-k high utility pattern mining with effective threshold raising strategies. Knowl Based Syst 76:109–126
Rymon R (1992) Search through systematic set enumeration. In: Proceedings of the third international conference on principles of knowledge representation and reasoning, Morgan Kaufmann, Cambridge, MA, October 1992, pp 539–50
Sahoo J, Das AK, Goswami A (2015) An efficient approach for mining association rules from high utility itemsets. Expert Syst Appl 42(13):5754–5778
Song W, Liu Y, Li J (2014) BAHUI: fast and memory efficient mining of high utility itemsets based on bitmap. Proc Int J Data Wareh Min 10(1):1–15
Thilagu M, Nadarajan R (2012) Efficiently mining of effective web traversal patterns with average utility. In: Proceedings of the international conference on communication, computing, and security. CRC Press, Gurgaon, India, September 2016, pp 444–451
Tseng VS, Shie BE, Wu CW, Yu PS (2013) Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans Knowl Data Eng 25(8):1772–1786
Tseng VS, Wu CW, Fournier-Viger P, Yu P (2016) Efficient algorithms for mining top-k high utility itemsets. IEEE Trans Knowl Data Eng 28(1):54–67
Uno T, Kiyomi M, Arimura H (2004) LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets. In: Proceedings of the ICDM’04 Workshop on Frequent Itemset Mining Implementations, CEUR, Brighton, UK, November 2014
Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: Proceedings of the 3rd SIAM international conference on data mining, SIAM, Lake Buena Vista, FL, USA, April 2004, pp 482–486
Yin J, Zheng Z, Cao L, Song Y, Wei, W (2012) An efficient algorithm for mining high utility sequential patterns.In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, Beijing, China, August 2012, pp 660–668
Yin J, Zheng Z, Cao L, Song Y, Wei W (2013) Efficiently mining top-k high utility sequential patterns. In: Proceedings of the 13th international conference on data mining, IEEE, Dallas, TX, USA, December 2013, pp 1259–1264
Yun U, Ryang H, Ryu KH (2014) High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates. Expert Syst Appl 41(8):3861–3878
Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390
Zida S, Fournier-Viger P, Wu CW, Lin JCW, Tseng VS (2015) Efficient mining of high utility sequential rules. In: Proceedings of the 11th international conference on machine learning and data mining, Hamburg, Germany, July 2015, Lecture Notes in Artificial Intelligence vol 9166. Springer, Berlin, pp 1–15
Zida S, Fournier-Viger P, Lin JCW, Wu CW, Tseng VS (2015) EFIM: a highly efficient algorithm for high-utility itemset mining. In: Proceedings of the 14th Mexican international conference on artificial intelligence, Cuernavaca, Mexico, October 2015. Lecture Notes in Artificial Intelligence, vol 9413. Springer, Berlin, pp 530–546
Acknowledgments
This project was supported by a NSERC Discovery grant from the Government of Canada and an initiating fund provided to the second author by Harbin Institute of Technology (Shenzhen).
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Zida, S., Fournier-Viger, P., Lin, J.CW. et al. EFIM: a fast and memory efficient algorithm for high-utility itemset mining. Knowl Inf Syst 51, 595–625 (2017). https://doi.org/10.1007/s10115-016-0986-0
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-016-0986-0