Reference itemsets: useful itemsets to approximate the representation of frequent itemsets
Deriving frequent itemsets from databases is an important research issue in data mining. The number of frequent itemsets can be extremely large when a low minimum support threshold is given, so designing a compact representation that compresses and describes them is an interesting topic. Most previous research on compact representations has focused on frequent closed itemsets and frequent maximal itemsets. The former is a lossless representation from which all frequent itemsets and their frequencies can be fully recovered. In contrast, the latter loses some information: it preserves which itemsets are frequent but cannot identify their frequencies. In this paper, we propose a new compact representation that lies between closed itemsets and maximal itemsets. It preserves all frequent itemsets and identifies their approximate frequencies. In addition, an efficient algorithm based on this concept is designed to find the related key information in databases. Finally, a series of experiments is conducted to show the effectiveness of the compact representation and the performance of the proposed algorithm.
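To make the contrast between the two existing representations concrete, the following is a minimal Python sketch on a toy database. It is not the algorithm proposed in the paper and it does not compute reference itemsets; it only illustrates the standard definitions of closed and maximal itemsets. The brute-force enumeration, the function names, and the four sample transactions are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration: map each frequent itemset to its support count."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                result[itemset] = support
    return result


def closed_itemsets(frequent):
    """Keep frequent itemsets that have no proper superset with the same support."""
    return {
        s: sup
        for s, sup in frequent.items()
        if not any(s < other and sup == other_sup
                   for other, other_sup in frequent.items())
    }


def maximal_itemsets(frequent):
    """Keep frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}


def support_from_closed(itemset, closed):
    """Recover a frequent itemset's support as the largest support
    among the closed itemsets that contain it."""
    return max(sup for s, sup in closed.items() if itemset <= s)


# Hypothetical toy database with four transactions.
transactions = [frozenset(t) for t in ({"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a"})]

frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
maximal = maximal_itemsets(frequent)

print(len(frequent), "frequent,", len(closed), "closed,", len(maximal), "maximal")
print("support of {b} recovered from closed itemsets:",
      support_from_closed(frozenset({"b"}), closed))
```

In this toy example the closed itemsets are enough to recover the exact support of every frequent itemset, whereas the single maximal itemset only shows which itemsets are frequent. An approximate representation of the kind the abstract describes would sit between these two extremes.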
Keywords: Data mining, Frequent itemset, Closed itemset, Maximal itemset, Approximate representation, Reference itemset
This research was supported by the Ministry of Science and Technology of the Republic of China under contract number MOST 103-2221-E-390-014-MY2.
Compliance with ethical standards
Conflict of interest
This article does not contain any studies with human participants or animals performed by any of the authors.