Abstract
Data mining has been studied extensively. Among its techniques, frequent pattern mining reveals patterns that occur frequently in the data. High-utility pattern mining extends this idea to extract patterns that are beneficial to an organization. To improve the quality of these patterns, recent research has introduced occupancy as an interestingness measure. The resulting approach, high-utility occupancy pattern mining (HUOPM), identifies patterns that contribute significantly to their supporting transactions, as determined by a utility occupancy metric. A high-utility occupancy pattern is a pattern whose utility occupancy exceeds a user-specified threshold. In transactional databases, the conventional utility occupancy of an itemset is obtained by adding up its utility occupancies in each of its supporting transactions. A key drawback of traditional HUOPM is that it does not account for the length of an itemset. Consequently, some items within an itemset may have higher utility occupancies than others, and the resulting value may not accurately reflect the relative importance of each item within the itemset. To overcome this limitation, this paper proposes a high average utility occupancy pattern mining (HAUOPM) algorithm, which uses a novel measure called average utility occupancy to mine high average utility occupancy patterns. Because average utility occupancy is not anti-monotonic, a novel average utility occupancy upper bound is used to prune unpromising itemsets. When two itemsets are compared, the utility of one may be greater than that of the other even though its average utility occupancy is lower. Therefore, some high-quality patterns may go undiscovered.
To overcome this limitation, we introduce a novel measure called transaction utility occupancy to discover high-quality patterns. A novel average utility occupancy tree structure is used to enumerate the search space, and a modified average utility occupancy list structure stores information about items and itemsets. Extensive experiments have been performed on real and synthetic datasets to test the effectiveness of the proposed HAUOPM algorithm.
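The measures named in the abstract can be illustrated with a small sketch. The abstract does not give formulas, so the definitions below are assumptions: the occupancy of an itemset in a transaction is taken as its utility divided by the transaction's total utility, the utility occupancy of an itemset is the mean of those shares over its supporting transactions, and the average utility occupancy divides that value by the itemset length. The function names and the toy database are hypothetical, and the sketch does not cover the upper bound, the tree, or the list structure.

```python
from typing import Dict, FrozenSet, List

# A transaction maps each item to its utility in that transaction.
# All values here are made up; utilities are assumed to be positive.
Transaction = Dict[str, float]


def supporting(itemset: FrozenSet[str], db: List[Transaction]) -> List[Transaction]:
    """Transactions that contain every item of the itemset."""
    return [t for t in db if itemset.issubset(t.keys())]


def utility(itemset: FrozenSet[str], db: List[Transaction]) -> float:
    """Total utility of the itemset over its supporting transactions."""
    return sum(t[i] for t in supporting(itemset, db) for i in itemset)


def utility_occupancy(itemset: FrozenSet[str], db: List[Transaction]) -> float:
    """Assumed definition: mean share of transaction utility taken by the itemset."""
    sup = supporting(itemset, db)
    if not sup:
        return 0.0
    shares = [sum(t[i] for i in itemset) / sum(t.values()) for t in sup]
    return sum(shares) / len(sup)


def average_utility_occupancy(itemset: FrozenSet[str], db: List[Transaction]) -> float:
    """Assumed definition: utility occupancy divided by the itemset length."""
    if not itemset:
        return 0.0
    return utility_occupancy(itemset, db) / len(itemset)


DB: List[Transaction] = [
    {"a": 2.0, "b": 6.0, "c": 2.0},
    {"a": 4.0, "b": 4.0},
    {"b": 5.0, "c": 5.0},
]

if __name__ == "__main__":
    for items in (frozenset({"a", "b"}), frozenset({"b"})):
        print(sorted(items),
              utility(items, DB),
              round(utility_occupancy(items, DB), 3),
              round(average_utility_occupancy(items, DB), 3))
```

On this toy data, the itemset {a, b} has a higher utility and a higher utility occupancy than {b}, yet a lower average utility occupancy once its length is taken into account. This is the kind of ranking reversal the abstract describes.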
Cite this article
Kumar, M.J.K., Rana, D. HAUOPM: High Average Utility Occupancy Pattern Mining. Arab J Sci Eng 49, 3397–3416 (2024). https://doi.org/10.1007/s13369-023-07971-x