Abstract
A complete set of frequent itemsets can become undesirably large because of redundancy when the minimum support threshold is low or the database is dense. Several concise representations have been proposed to eliminate this redundancy. Generator-based representations rely on a negative border to make the representation lossless. However, the number of itemsets on a negative border sometimes exceeds the total number of frequent itemsets. In this paper, we propose using a positive border together with frequent generators to form a lossless representation. A positive border is usually orders of magnitude smaller than its corresponding negative border. A set of frequent generators plus its positive border is never larger than the corresponding complete set of frequent itemsets, so it is a truly concise representation. We also propose a generalized form of this representation. We develop an efficient algorithm, called GrGrowth, to mine generators and positive borders as well as their generalizations. GrGrowth explores the search space with a depth-first search strategy, which is much more efficient than the breadth-first search strategy adopted by most existing generator mining algorithms. Our experimental results show that GrGrowth is significantly faster than level-wise algorithms for mining generator-based representations, and is comparable to state-of-the-art algorithms for mining frequent closed itemsets.
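To make the terms in the abstract concrete, the following is a minimal brute-force sketch on a toy database. It is not GrGrowth and makes no attempt at efficiency. The abstract does not spell out its definitions, so the sketch assumes the usual ones: a generator is an itemset with no proper subset of equal support, and the positive border is the set of frequent non-generators whose proper subsets are all generators. The function name `concise_representation` and the toy data are hypothetical.

```python
from itertools import combinations


def concise_representation(transactions, min_sup):
    """Brute-force toy miner; returns (frequent, generators, positive_border).

    Assumed definitions (not stated in the abstract):
      - generator: no proper subset has the same support
      - positive border: frequent non-generators whose proper
        subsets are all generators
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))

    # Support of every candidate itemset, including the empty itemset.
    support = {}
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            support[itemset] = sum(1 for t in transactions if itemset <= t)

    frequent = {s for s, count in support.items() if count >= min_sup}

    # Support never increases when an item is added, so it is enough to
    # compare an itemset with the subsets obtained by dropping one item.
    generators = {
        s for s in frequent
        if all(support[s - {item}] > support[s] for item in s)
    }

    # Every subset of a generator is a generator, so checking the
    # one-item-smaller subsets is again sufficient.
    positive_border = {
        s for s in frequent - generators
        if all((s - {item}) in generators for item in s)
    }
    return frequent, generators, positive_border


# Hypothetical toy database with four transactions.
db = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a", "d"}]
frequent, generators, border = concise_representation(db, min_sup=2)
```

On this toy input there are 8 frequent itemsets (counting the empty itemset), of which 3 are generators and 2 lie on the positive border, so the two sets together hold 5 itemsets. Because generators and the positive border are disjoint subsets of the frequent itemsets, their combined size can never exceed the full set, which is the size bound the abstract states.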
Cite this article
Liu, G., Li, J. & Wong, L. A new concise representation of frequent itemsets using generators and a positive border. Knowl Inf Syst 17, 35–56 (2008). https://doi.org/10.1007/s10115-007-0111-5
DOI: https://doi.org/10.1007/s10115-007-0111-5