EFIM-Closed: Fast and Memory Efficient Discovery of Closed High-Utility Itemsets

  • Philippe Fournier-Viger
  • Souleymane Zida
  • Jerry Chun-Wei Lin
  • Cheng-Wei Wu
  • Vincent S. Tseng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9729)

Abstract

Discovering high-utility itemsets in transaction databases is a popular data mining task. A limitation of traditional algorithms is that a huge number of high-utility itemsets may be presented to the user. To provide a concise and lossless representation of results to the user, the concept of closed high-utility itemsets was proposed. However, mining closed high-utility itemsets is computationally expensive. To address this issue, we present a novel algorithm for discovering closed high-utility itemsets, named EFIM-Closed. This algorithm includes novel pruning strategies named closure jumping, forward closure checking and backward closure checking to prune non-closed high-utility itemsets. Furthermore, it introduces novel utility upper-bounds and a transaction merging mechanism. Experimental results show that EFIM-Closed can be more than an order of magnitude faster and consume more than an order of magnitude less memory than the previous state-of-the-art CHUD algorithm.
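To make the notion of a closed high-utility itemset concrete, the following is a minimal brute-force sketch in Python. It is not EFIM-Closed and uses none of its pruning strategies, upper-bounds or transaction merging; it only enumerates every itemset and applies the definitions commonly used in the high-utility mining literature, which the abstract does not spell out and which are assumed here: the utility of an itemset is the sum, over the transactions containing it, of quantity times unit profit for its items, and an itemset is closed if no proper superset has the same support. The toy database, the unit profits and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its purchase quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"a": 1, "b": 1, "c": 2},
    {"b": 4},
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1}


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if all(i in t for i in itemset))


def utility(itemset, transactions, unit_profit):
    """Sum of quantity * unit profit over the transactions containing the itemset."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * unit_profit[i] for i in itemset)
    return total


def closed_high_utility_itemsets(transactions, unit_profit, min_util):
    """Brute-force enumeration (exponential in the number of items)."""
    items = sorted(unit_profit)
    candidates = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
    ]
    sup = {x: support(x, transactions) for x in candidates}
    result = {}
    for x in candidates:
        if sup[x] == 0:
            continue  # itemset never occurs in the database
        # Closed: no proper superset appears in exactly the same number of transactions.
        is_closed = not any(x < y and sup[y] == sup[x] for y in candidates)
        u = utility(x, transactions, unit_profit)
        if is_closed and u >= min_util:
            result[x] = u
    return result


chuis = closed_high_utility_itemsets(transactions, unit_profit, min_util=15)
for itemset, u in sorted(chuis.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), u)
```

On this toy data with a minimum utility of 15, the sketch keeps {a, c} (utility 26) and {a, b, c} (utility 19). The itemset {a} is also high-utility (utility 20) but is left out because {a, c} occurs in exactly the same transactions, which illustrates how the closed representation shortens the output.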

Keywords

Pattern mining, High-utility itemset, Closed itemset

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Philippe Fournier-Viger (1)
  • Souleymane Zida (2)
  • Jerry Chun-Wei Lin (3)
  • Cheng-Wei Wu (4)
  • Vincent S. Tseng (4)

  1. School of Natural Sciences and Humanities, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
  2. Department of Computer Science, University of Moncton, Moncton, Canada
  3. School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
  4. Department of Computer Science, National Chiao Tung University, Hsinchu, People’s Republic of China