A hybrid framework for mining high-utility itemsets in a sparse transaction database
High-utility itemset mining aims to find the itemsets whose utility is no less than a user-defined threshold in a transaction database. It is an emerging research area in the field of data mining and has important applications in inventory management, query recommendation, systems operation research, bio-medical analysis, etc. Currently known algorithms for this problem can be classified as either 1-phase or 2-phase algorithms. The 2-phase algorithms are typically tree-based algorithms that generate candidate high-utility itemsets and verify them later. A tree data structure generates candidate high-utility itemsets quickly by storing an upper-bound utility estimate at each node. The 1-phase algorithms are typically inverted-list based or transaction-projection based algorithms that avoid the generation of candidate high-utility itemsets. The inverted list and transaction projection allow exact utilities to be computed. We propose a novel hybrid framework that combines a tree-based and an inverted-list based algorithm to efficiently mine high-utility itemsets. Algorithms based on the framework can harness the benefits of both types of algorithms. We report experimental results on real and synthetic datasets to demonstrate the effectiveness of our framework.
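The problem definition above can be illustrated with a minimal Python sketch. It is not the hybrid framework proposed in the paper: it is a brute-force enumeration over a toy database, meant only to show what "utility no less than a threshold" means and how an upper-bound estimate differs from an exact utility. The database, the per-unit profits, and all function names are hypothetical. The upper bound shown is the transaction-weighted utility (TWU), a bound commonly used in the utility-mining literature; it is an assumption here that it is representative of the estimates the abstract refers to.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 4},
]
# Hypothetical per-unit profit (external utility) of each item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 3}


def utility(itemset):
    """Exact utility: sum of quantity * profit over transactions containing the whole itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def transaction_utility(t):
    """Utility of a complete transaction."""
    return sum(qty * unit_profit[item] for item, qty in t.items())


def twu(itemset):
    """Transaction-weighted utility: an upper bound on the utility of the itemset and its supersets."""
    return sum(
        transaction_utility(t)
        for t in transactions
        if all(item in t for item in itemset)
    )


def mine_brute_force(min_util):
    """Enumerate every itemset and keep those with utility >= min_util (exponential; illustration only)."""
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset)
            if u >= min_util:
                result[itemset] = u
    return result


if __name__ == "__main__":
    print(mine_brute_force(15))
    print(utility(("b",)), twu(("b",)))
```

In this toy data the itemset `("b",)` has utility 6, below a threshold of 15, yet its superset `("b", "c", "d")` reaches 15. Utility is therefore not anti-monotone, which is why pruning has to rely on an upper bound such as `twu` rather than on the exact utility of a subset.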
Keywords: Data mining, Mining methods and algorithms, Pattern growth mining, Frequent pattern mining, Utility mining
This work was supported in part by the Infosys Centre for Artificial Intelligence, IIIT-Delhi, and the Visvesvaraya Ph.D. scheme for Electronics and IT.
Compliance with Ethical Standards
Conflict of interests
The authors declare that they have no conflict of interest.