TKEH: an efficient algorithm for mining top-k high utility itemsets
High utility itemset mining is a subfield of data mining with wide applications. Existing high utility itemset mining algorithms can discover all itemsets that satisfy a given minimum utility threshold, but users often find it difficult to set that threshold properly: a value that is too low may produce a huge number of itemsets, whereas a value that is too high may produce very few. Tuning the threshold is therefore difficult and time-consuming. To address this issue, top-k high utility itemset mining has been defined, where k is the number of high utility itemsets to be found. In this paper, we present an efficient algorithm, named TKEH, for finding the top-k high utility itemsets. TKEH uses transaction merging and dataset projection to reduce the cost of dataset scans; these techniques shrink the dataset as larger itemsets are explored. TKEH employs three strategies for raising the minimum utility threshold and two strategies for pruning the search space efficiently. To calculate the utility of items and upper bounds in linear time, TKEH uses an array-based utility technique. We carried out extensive experiments on real datasets. The results show that TKEH outperforms the state-of-the-art algorithms. Moreover, TKEH always performs better for dense datasets.
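To make the problem statement concrete, the following is a minimal Python sketch of top-k high utility itemset mining with a simple threshold raising step. It is not the TKEH algorithm: it enumerates every itemset exhaustively and omits transaction merging, dataset projection, pruning, and the array-based utility counting described above. The function names, the transaction layout (one dict per transaction, mapping an item to its utility in that transaction), and the sample data are all assumptions made for illustration.

```python
import heapq
from itertools import combinations


def itemset_utility(itemset, transactions):
    """Sum the utility of `itemset` over all transactions that contain every item in it.

    Assumption: each transaction is a dict mapping item -> utility of that item
    in the transaction (for example quantity times unit profit, precomputed).
    """
    total = 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            total += sum(tx[item] for item in itemset)
    return total


def top_k_high_utility(transactions, k):
    """Return the k itemsets with the highest utility and the final raised threshold.

    Brute-force illustration only: every non-empty itemset is evaluated.
    """
    items = sorted({item for tx in transactions for item in tx})
    heap = []      # min-heap of (utility, itemset); holds the current top-k candidates
    min_util = 0   # internal minimum utility threshold, starts at 0 and is raised

    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions)
            if utility <= 0:
                continue  # itemset never occurs (or carries no utility)
            if len(heap) < k:
                heapq.heappush(heap, (utility, itemset))
            elif utility > heap[0][0]:
                heapq.heapreplace(heap, (utility, itemset))
            # Threshold raising: once k candidates are known, the smallest of
            # them is a valid lower bound for any itemset in the final answer.
            if len(heap) == k:
                min_util = heap[0][0]

    ranked = sorted(heap, key=lambda pair: (-pair[0], pair[1]))
    return ranked, min_util


# Hypothetical sample data: three transactions over items a, b, c.
transactions = [
    {"a": 5, "b": 2},
    {"a": 10, "c": 6},
    {"a": 5, "b": 4, "c": 3},
]
top3, threshold = top_k_high_utility(transactions, 3)
```

In this sketch the user supplies only k; the threshold starts at zero and rises as better candidates are found. A practical algorithm would use the raised threshold together with utility upper bounds to skip parts of the search space, which this exhaustive version does not attempt.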
Keywords: High utility itemsets, Utility mining, Itemset mining, Top-k itemset mining, Threshold raising strategies
Compliance with Ethical Standards
The article uses threshold raising and memory reduction techniques to mine the top-k high utility itemsets.
Conflict of interest
The authors declare no conflicts of interest.