EFIM: a fast and memory efficient algorithm for high-utility itemset mining

Zida, Souleymane; Fournier-Viger, Philippe; Lin, Jerry Chun-Wei; Wu, Cheng-Wei; Tseng, Vincent S.

doi:10.1007/s10115-016-0986-0

EFIM: a fast and memory efficient algorithm for high-utility itemset mining

Regular Paper
Published: 02 September 2016

Volume 51, pages 595–625, (2017)
Cite this article

Knowledge and Information Systems Aims and scope Submit manuscript

Souleymane Zida¹,
Philippe Fournier-Viger ORCID: orcid.org/0000-0002-7680-9899²,
Jerry Chun-Wei Lin³,
Cheng-Wei Wu⁴ &
…
Vincent S. Tseng²

1286 Accesses
172 Citations
Explore all metrics

Abstract

In recent years, high-utility itemset mining has emerged as an important data mining task. However, it remains computationally expensive both in terms of runtime and memory consumption. It is thus an important challenge to design more efficient algorithms for this task. In this paper, we address this issue by proposing a novel algorithm named EFIM (EFficient high-utility Itemset Mining), which introduces several new ideas to more efficiently discover high-utility itemsets. EFIM relies on two new upper bounds named revised sub-tree utility and local utility to more effectively prune the search space. It also introduces a novel array-based utility counting technique named Fast Utility Counting to calculate these upper bounds in linear time and space. Moreover, to reduce the cost of database scans, EFIM proposes efficient database projection and transaction merging techniques named High-utility Database Projection and High-utility Transaction Merging (HTM), also performed in linear time. An extensive experimental study on various datasets shows that EFIM is in general two to three orders of magnitude faster than the state-of-art algorithms \(\hbox {d}^2\)HUP, HUI-Miner, HUP-Miner, FHM and UP-Growth+ on dense datasets and performs quite well on sparse datasets. Moreover, a key advantage of EFIM is its low memory consumption.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

Article 12 April 2024

Privacy-preserving data (stream) mining techniques and their impact on data mining accuracy: a systematic literature review

Article Open access 22 February 2023

A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration

Article Open access 15 January 2021

References

Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large databases, Morgan Kaufmann, Santiago de Chile, Chile, September 1994, pp 487–499
Ahmed CF, Tanbeer SK, Jeong BS, Lee YK (2009) Efficient tree structures for high-utility pattern mining in incremental databases. IEEE Trans Knowl Data Eng 21(12):1708–1721
Article Google Scholar
Ahmed CF, Tanbeer SK, Jeong B (2010) Mining high utility web access sequences in dynamic web log data. In: Proceedings of the international conference on software engineering artificial intelligence networking and parallel/distributed computing, IEEE, London, UK, June 2010, pp 76–81
Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu CW, Tseng VS (2014) SPMF: a Java open-source pattern mining library. J Mach Learn Res 15:3389–3393
MATH Google Scholar
Fournier-Viger P, Wu CW, Tseng VS (2014) Novel concise representations of high utility itemsets using generator patterns. In: Proceedings of the 10th international conference on advanced data mining and applications, Guilin, China, December 2014. Lecture Notes in Artificial Intelligence, vol 8933. Springer, Berlin, pp 30–43
Fournier-Viger P, Wu CW, Zida S, Tseng VS (2014) FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning. in: Proceedings of the 21st international symposium on methodologies for intelligent systems, Roskilde, Denmark, June 2014. Lecture Notes in Artificial Intelligence, vol 9384. Springer, Berlin, pp 83–92
Fournier-Viger P, Lin JCW, Duong QH, Dam TL (2016) PHM: mining periodic high-utility itemsets. In: Proceedings of the 16th industrial conference on data mining, New York, USA, July 2016. Lecture Notes in Artificial Intelligence, vol 9728, Springer, Berlin, pp 64–79
Fournier-Viger P, Zida S (2015) FOSHU: faster on-shelf high utility itemset mining with or without negative unit profit. In: Proceedings of the 30th symposium on applied computing, ACM, Salamanca, Spain, April 2015, pp 857–864
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87
Article MathSciNet Google Scholar
Pei J, Han J, Lu H, Nishio S, Tang S, Yang D (2001) H-Mine: hyper-structure mining of frequent patterns in large databases. In: Proceedings of the 2001 IEEE international conference on data mining, IEEE, San Jose, CA, November 2001, pp 441–448
Krishnamoorthy S (2015) Pruning strategies for mining high utility itemsets. Expert Syst Appl 42(5):2371–2381
Article Google Scholar
Lan GC, Hong TP, Tseng VS (2014) An efficient projection-based indexing approach for mining high utility itemsets. Knowl Inf Syst 38(1):85–107
Article Google Scholar
Lin JCW, Hong TP, Lan GC, Wong JW, Lin WY (2015) Efficient updating of discovered high-utility itemsets for transaction deletion in dynamic databases. Adv Eng Inform 29(1):16–27
Article Google Scholar
Lin YC, Wu CW, Tseng VS (2015) Mining high utility itemsets in big data. In: Proceedings of the 9th Pacific-Asia conference on knowledge discovery and data mining, Ho Chi Minh City, Vietnam, May 2015, Lecture Notes in Artificial Intelligence, vol 9077. Springer, Berlin, pp 649–661
Liu J, Wang K, Fung B (2012) Direct discovery of high utility itemsets without candidate generation. In: Proceedings of the 12th IEEE international conference on data mining, IEEE, Brussels, Belgium, December 2012, pp 984–989
Liu M, Qu J (2012) Mining high utility itemsets without candidate generation. In: Proceedings of the 22nd ACM international conference on information and knowledge management, ACM, Maui, HI, October 2012, pp 55–64
Liu Y, Cheng C, Tseng VS (2013) Mining differential top-k co-expression patterns from time course comparative gene expression datasets. BMC Bioinform 14(230)
Liu Y, Liao W, Choudhary A (2005) A two-phase algorithm for fast discovery of high utility itemsets. In: Proceedings of the 9th Pacific-Asia conference on knowledge discovery and data mining, Hanoi, Vietnam, May 2005, Lecture Notes in Artificial Intelligence, vol 3518. Springer, Berlin, pp 689–695
Lu T, Liu Y, Wang L (2014) An algorithm of top-k high utility itemsets mining over data stream. J Softw 9(9):2342–2347
Article Google Scholar
Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu M (2004) Mining sequential patterns by pattern-growth: the PrefixSpan approach. IEEE Trans Knowl Data Eng 16(11):1424–1440
Article Google Scholar
Ryang H, Yun U (2015) Top-k high utility pattern mining with effective threshold raising strategies. Knowl Based Syst 76:109–126
Article Google Scholar
Rymon R (1992) Search through systematic set enumeration. In: Proceedings of the third international conference on principles of knowledge representation and reasoning, Morgan Kaufmann, Cambridge, MA, October 1992, pp 539–50
Sahoo J, Das AK, Goswami A (2015) An efficient approach for mining association rules from high utility itemsets. Expert Syst Appl 42(13):5754–5778
Article Google Scholar
Song W, Liu Y, Li J (2014) BAHUI: fast and memory efficient mining of high utility itemsets based on bitmap. Proc Int J Data Wareh Min 10(1):1–15
Article Google Scholar
Thilagu M, Nadarajan R (2012) Efficiently mining of effective web traversal patterns with average utility. In: Proceedings of the international conference on communication, computing, and security. CRC Press, Gurgaon, India, September 2016, pp 444–451
Tseng VS, Shie BE, Wu CW, Yu PS (2013) Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans Knowl Data Eng 25(8):1772–1786
Article Google Scholar
Tseng VS, Wu CW, Fournier-Viger P, Yu P (2016) Efficient algorithms for mining top-k high utility itemsets. IEEE Trans Knowl Data Eng 28(1):54–67
Article Google Scholar
Uno T, Kiyomi M, Arimura H (2004) LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets. In: Proceedings of the ICDM’04 Workshop on Frequent Itemset Mining Implementations, CEUR, Brighton, UK, November 2014
Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: Proceedings of the 3rd SIAM international conference on data mining, SIAM, Lake Buena Vista, FL, USA, April 2004, pp 482–486
Yin J, Zheng Z, Cao L, Song Y, Wei, W (2012) An efficient algorithm for mining high utility sequential patterns.In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, Beijing, China, August 2012, pp 660–668
Yin J, Zheng Z, Cao L, Song Y, Wei W (2013) Efficiently mining top-k high utility sequential patterns. In: Proceedings of the 13th international conference on data mining, IEEE, Dallas, TX, USA, December 2013, pp 1259–1264
Yun U, Ryang H, Ryu KH (2014) High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates. Expert Syst Appl 41(8):3861–3878
Article Google Scholar
Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390
Article Google Scholar
Zida S, Fournier-Viger P, Wu CW, Lin JCW, Tseng VS (2015) Efficient mining of high utility sequential rules. In: Proceedings of the 11th international conference on machine learning and data mining, Hamburg, Germany, July 2015, Lecture Notes in Artificial Intelligence vol 9166. Springer, Berlin, pp 1–15
Zida S, Fournier-Viger P, Lin JCW, Wu CW, Tseng VS (2015) EFIM: a highly efficient algorithm for high-utility itemset mining. In: Proceedings of the 14th Mexican international conference on artificial intelligence, Cuernavaca, Mexico, October 2015. Lecture Notes in Artificial Intelligence, vol 9413. Springer, Berlin, pp 530–546

Download references

Acknowledgments

This project was supported by a NSERC Discovery grant from the Government of Canada and an initiating fund provided to the second author by Harbin Institute of Technology (Shenzhen).

Author information

Authors and Affiliations

Department of Computer Science, University of Moncton, Moncton, NB, Canada
Souleymane Zida
School of Natural Sciences and Humanities, Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, GD, China
Philippe Fournier-Viger & Vincent S. Tseng
School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, GD, China
Jerry Chun-Wei Lin
Department of Computer Science, National Chiao Tung University, Hsinchu City, Taiwan
Cheng-Wei Wu

Authors

Souleymane Zida
View author publications
You can also search for this author in PubMed Google Scholar
Philippe Fournier-Viger
View author publications
You can also search for this author in PubMed Google Scholar
Jerry Chun-Wei Lin
View author publications
You can also search for this author in PubMed Google Scholar
Cheng-Wei Wu
View author publications
You can also search for this author in PubMed Google Scholar
Vincent S. Tseng
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Philippe Fournier-Viger.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zida, S., Fournier-Viger, P., Lin, J.CW. et al. EFIM: a fast and memory efficient algorithm for high-utility itemset mining. Knowl Inf Syst 51, 595–625 (2017). https://doi.org/10.1007/s10115-016-0986-0

Download citation

Received: 28 January 2016
Revised: 23 July 2016
Accepted: 25 August 2016
Published: 02 September 2016
Issue Date: May 2017
DOI: https://doi.org/10.1007/s10115-016-0986-0

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

EFIM: a fast and memory efficient algorithm for high-utility itemset mining

Abstract

Access this article

Similar content being viewed by others

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

Privacy-preserving data (stream) mining techniques and their impact on data mining accuracy: a systematic literature review

A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

EFIM: a fast and memory efficient algorithm for high-utility itemset mining

Abstract

Access this article

Similar content being viewed by others

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

Privacy-preserving data (stream) mining techniques and their impact on data mining accuracy: a systematic literature review

A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation