Mining Compact High Utility Itemsets Without Candidate Generation

  • Cheng-Wei Wu (email author)
  • Philippe Fournier-Viger
  • Jia-Yuan Gu
  • Vincent S. Tseng
Chapter
Part of the Studies in Big Data book series (SBD, volume 51)

Abstract

Although high utility itemset (HUI) mining has received extensive attention in recent years, current algorithms suffer from a crucial problem: they tend to produce too many HUIs. This seriously degrades the performance of HUI mining in terms of execution time and memory usage. Moreover, it is very hard for users to discover meaningful information in a huge number of HUIs. In this paper, we address this issue by proposing a framework with a novel algorithm named CHUI (Compact High Utility Itemset)-Mine to discover closed\(^{+}\) HUIs and maximal HUIs, which are compact representations of HUIs. The main merits of CHUI-Mine lie in two aspects. First, in terms of efficiency, unlike existing algorithms that tend to produce a large number of candidates during the mining process, CHUI-Mine computes the utility of itemsets directly without generating candidates. Second, in terms of losslessness, unlike current algorithms that provide incomplete results, CHUI-Mine discovers the complete set of closed\(^{+}\) or maximal HUIs without missing any. A comprehensive investigation is also presented to compare the relative advantages of the different compact representations in terms of computational cost and compactness. To the best of our knowledge, this is the first work to address the mining of compact high utility itemsets, in the form of closed\(^{+}\) and maximal HUIs, without candidate generation. Experimental results show that CHUI-Mine achieves a massive reduction in the number of HUIs and is several orders of magnitude faster than benchmark algorithms.
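To make the notions of compactness above concrete, the following minimal Python sketch enumerates HUIs, closed HUIs, and maximal HUIs on a toy database. It is a brute-force illustration only, not CHUI-Mine: it enumerates every itemset, which is exactly what an efficient algorithm avoids. The transactions, unit profits, threshold, and all function names are hypothetical. The definitions used are the ones common in the literature and are assumptions here, since the abstract does not state them: the utility of an itemset is the sum of quantity times unit profit over the transactions that contain it, an itemset is closed if no proper superset has the same support, and an HUI is maximal if no proper superset is an HUI. The sketch checks closedness only and omits any additional utility information that the closed\(^{+}\) representation carries.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 2, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 2},
]
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 3}  # hypothetical profit per unit
MIN_UTIL = 15                                   # hypothetical utility threshold

items = sorted(unit_profit)


def utility(itemset):
    """Sum of quantity * unit profit over the transactions containing itemset."""
    return sum(
        sum(t[i] * unit_profit[i] for i in itemset)
        for t in transactions
        if itemset.issubset(t)
    )


def support(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset.issubset(t))


def is_closed(itemset):
    """Closed: adding any single extra item strictly lowers the support."""
    s = support(itemset)
    return all(support(itemset | {i}) < s for i in items if i not in itemset)


# Brute force: enumerate every non-empty itemset (exponential in len(items)).
all_itemsets = [
    frozenset(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
]

huis = {x: utility(x) for x in all_itemsets if utility(x) >= MIN_UTIL}
closed_huis = {x for x in huis if is_closed(x)}
maximal_huis = {x for x in huis if not any(x < y for y in huis)}

for name, group in [("HUIs", huis), ("closed HUIs", closed_huis),
                    ("maximal HUIs", maximal_huis)]:
    print(name, sorted("".join(sorted(x)) for x in group))
```

On this toy data the five HUIs reduce to three closed HUIs and two maximal HUIs, which illustrates the reduction in output size that compact representations aim for.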

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Cheng-Wei Wu (1, email author)
  • Philippe Fournier-Viger (2)
  • Jia-Yuan Gu (3)
  • Vincent S. Tseng (4)
  1. National Ilan University, Ilan, Taiwan
  2. Harbin Institute of Technology (Shenzhen), Shenzhen, China
  3. Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
  4. National Chiao Tung University, Hsinchu, Taiwan
