Abstract
High utility itemset mining (HUIM) from large databases is a crucial descriptive task in data mining. It considers both the purchase quantity and the unit profit of items to reveal the most profitable itemsets. However, HUIM may discover a vast number of high utility itemsets (HUIs), which are difficult for a user to interpret and which reduce the efficiency of the mining process. A solution to this problem is to mine closed high utility itemsets (CHUIs), a more compact and lossless representation of HUIs. In this paper, a fast nature-inspired meta-heuristic approach, CHUI-AC (closed high utility itemset mining using an ant colony algorithm), is introduced to mine CHUIs. This is the first work on mining CHUIs with a nature-inspired ant colony algorithm. CHUI-AC maps the feasible solution space to a directed graph with quadratic space complexity to guide the search efficiently. Several experiments on real-world datasets show that the proposed algorithm outperforms the state-of-the-art algorithms in terms of execution time and rate of convergence. Moreover, the scalability experiments demonstrate that CHUI-AC scales linearly with respect to the number of transactions and the number of items.
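To make the notions of utility and closedness concrete, the following minimal Python sketch enumerates closed high utility itemsets on a toy database by brute force. It is not the CHUI-AC algorithm: it uses no ant colony search and is exponential in the number of items. The transactions, unit profits, threshold, and all function names are hypothetical and chosen only for illustration. The sketch assumes the usual definition in which an itemset is closed when no proper superset has the same support, and is a CHUI when it is closed and its utility reaches the minimum utility threshold.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "b": 1},
    {"a": 1, "b": 1, "c": 3},
    {"c": 2, "d": 1},
]
# Hypothetical unit profit of each item.
UNIT_PROFIT = {"a": 5, "b": 2, "c": 1, "d": 4}


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset.issubset(t))


def utility(itemset, transactions, unit_profit):
    """Sum of quantity * unit profit over all transactions containing the itemset."""
    return sum(
        sum(t[item] * unit_profit[item] for item in itemset)
        for t in transactions
        if itemset.issubset(t)
    )


def closed_high_utility_itemsets(transactions, unit_profit, min_util):
    """Brute-force enumeration of CHUIs (illustration only, exponential cost)."""
    items = sorted(unit_profit)
    candidates = [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]
    sup = {x: support(x, transactions) for x in candidates}

    result = {}
    for x in candidates:
        if sup[x] == 0:
            continue  # itemset never occurs in the database
        u = utility(x, transactions, unit_profit)
        if u < min_util:
            continue  # not a high utility itemset
        # Closed: every proper superset occurs in strictly fewer transactions.
        if all(sup[y] < sup[x] for y in candidates if x < y):
            result[x] = u
    return result


if __name__ == "__main__":
    chuis = closed_high_utility_itemsets(TRANSACTIONS, UNIT_PROFIT, min_util=15)
    for itemset, u in sorted(chuis.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), u)
```

On this toy data with a threshold of 15, the high utility itemsets are {a}, {a, b}, and {a, b, c}. The itemset {a} is dropped because its superset {a, b} occurs in the same three transactions, so only {a, b} and {a, b, c} remain as closed high utility itemsets. This is the kind of reduction the closed representation offers.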
Notes
The code can be found at https://github.com/chuim-ac/CHUI-AC
The codes for other compared models can be found at https://github.com/chuim-ac/SPMF
The dataset can be found at https://www.kaggle.com/irfanasrullah/groceries
This dataset can be found at https://www.kaggle.com/sulmansarwar/transactions-from-a-bakery
Acknowledgements
This study has been supported by the Indian Institute of Technology Kharagpur, India.
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
About this article
Cite this article
Pramanik, S., Goswami, A. Discovery of closed high utility itemsets using a fast nature-inspired ant colony algorithm. Appl Intell 52, 8839–8855 (2022). https://doi.org/10.1007/s10489-021-02922-1
DOI: https://doi.org/10.1007/s10489-021-02922-1