Mexican International Conference on Artificial Intelligence

Advances in Artificial Intelligence and Soft Computing, pp 530-546

EFIM: A Highly Efficient Algorithm for High-Utility Itemset Mining

  • Souleymane Zida
  • Philippe Fournier-Viger
  • Jerry Chun-Wei Lin
  • Cheng-Wei Wu
  • Vincent S. Tseng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9413)

Abstract

High-utility itemset mining (HUIM) is an important data mining task with wide applications. In this paper, we propose a novel algorithm named EFIM (EFficient high-utility Itemset Mining), which introduces several new ideas to discover high-utility itemsets more efficiently, in terms of both execution time and memory. EFIM relies on two upper bounds, named sub-tree utility and local utility, to prune the search space more effectively. It also introduces a novel array-based utility counting technique, named Fast Utility Counting, to calculate these upper bounds in linear time and space. Moreover, to reduce the cost of database scans, EFIM proposes efficient database projection and transaction merging techniques. An extensive experimental study on various datasets shows that EFIM is in general two to three orders of magnitude faster and consumes up to eight times less memory than the state-of-the-art algorithms d²HUP, HUI-Miner, HUP-Miner, FHM and UP-Growth+.
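For readers new to the task, the following is a minimal sketch of what "high-utility itemset" means under the standard HUIM definition: the utility of an itemset is the sum, over every transaction containing all of its items, of purchased quantity times unit profit. It is a brute-force enumeration, not EFIM, and it uses none of the pruning, projection or merging techniques described above. The toy transactions, the unit profits and the names `utility` and `brute_force_huis` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 2},
]
# Hypothetical unit profit per item.
unit_profit = {"a": 5, "b": 2, "c": 1}


def utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over every transaction containing the whole itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def brute_force_huis(transactions, unit_profit, min_util):
    """Enumerate every itemset and keep those whose utility reaches min_util.

    Exponential in the number of items; shown only to make the definition concrete.
    """
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, transactions, unit_profit)
            if u >= min_util:
                result[itemset] = u
    return result


print(brute_force_huis(transactions, unit_profit, min_util=15))
```

On this toy data, {a} has utility 15, its superset {a, c} has utility 19, and its superset {a, b} has utility 9. Utility is therefore neither monotone nor anti-monotone, which is why HUIM algorithms rely on upper bounds rather than on the utility itself to prune the search space.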

Keywords

  • High-utility mining
  • Itemset mining
  • Pattern mining

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Souleymane Zida (1)
  • Philippe Fournier-Viger (1)
  • Jerry Chun-Wei Lin (2)
  • Cheng-Wei Wu (3)
  • Vincent S. Tseng (3)

  1. Department of Computer Science, University of Moncton, Moncton, Canada
  2. School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
  3. Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan