A Comparative Study of Top-K High Utility Itemset Mining Methods

  • Srikumar Krishnamoorthy
Chapter
Part of the Studies in Big Data book series (SBD, volume 51)

Abstract

The High Utility Itemset (HUI) mining problem is one of the important problems in the data mining literature. It gives a decision maker the flexibility to incorporate her/his notion of utility into the pattern mining process. The problem, however, requires the decision maker to choose a minimum utility threshold value for discovering interesting patterns. This is quite challenging because itemset characteristics and their utility distributions vary widely. To address this issue, the Top-K High Utility Itemset (THUI) mining problem was introduced in the literature. The THUI mining problem is a variant of the HUI mining problem that allows a decision maker to specify the desired number of HUIs rather than the minimum utility threshold value. Several algorithms have been introduced in the literature to efficiently mine top-k HUIs. This paper systematically analyses the top-k HUI mining methods in the literature, describes the methods, and performs a comparative analysis. The data structures, threshold raising strategies, and pruning strategies adopted for efficient top-k HUI mining are also presented and analysed. Furthermore, the paper reviews several extensions of the top-k HUI mining problem, such as data stream mining, sequential pattern mining, and on-shelf utility mining. The paper is likely to be useful for researchers who wish to examine the key methods in top-k HUI mining, evaluate the gaps in the literature, explore new research opportunities, and enhance the state of the art in high utility pattern mining.
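
To make the problem statement concrete, the following is a minimal brute-force sketch of top-k HUI mining, not one of the algorithms surveyed in the chapter. It assumes the usual definition of itemset utility (purchased quantity times unit profit, summed over the transactions that contain the whole itemset). The toy transactions, the profit table, and the function names `utility` and `top_k_hui` are hypothetical and chosen only for illustration. The smallest utility held in the min-heap plays the role of an internal minimum utility threshold that rises as better itemsets are found, which is the basic idea behind threshold raising. The sketch enumerates every itemset and applies no pruning, so it is only practical for very small inputs.

```python
import heapq
from itertools import combinations

# Hypothetical toy data: each transaction maps an item to its purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 2, "d": 1},
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 10}


def utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over the transactions containing every item."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def top_k_hui(transactions, unit_profit, k):
    """Return the k itemsets with the highest utility, best first (no pruning)."""
    items = sorted({item for t in transactions for item in t})
    heap = []  # min-heap of (utility, itemset); heap[0] is the current k-th best
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, transactions, unit_profit)
            if len(heap) < k:
                heapq.heappush(heap, (u, itemset))
            elif u > heap[0][0]:
                # The candidate beats the internal threshold, so the threshold rises.
                heapq.heapreplace(heap, (u, itemset))
    return sorted(heap, key=lambda pair: (-pair[0], pair[1]))


result = top_k_hui(transactions, unit_profit, k=3)
for u, itemset in result:
    print(itemset, u)
```

On this toy input the decision maker only supplies k = 3; no minimum utility threshold has to be guessed in advance.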

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Indian Institute of Management Ahmedabad, Gujarat, India