A new concise representation of frequent itemsets using generators and a positive border

Liu, Guimei; Li, Jinyan; Wong, Limsoon

doi:10.1007/s10115-007-0111-5

Regular Paper
Published: 07 November 2007

Volume 17, pages 35–56 (2008)
Knowledge and Information Systems

Abstract

A complete set of frequent itemsets can become undesirably large due to redundancy when the minimum support threshold is low or when the database is dense. Several concise representations have previously been proposed to eliminate this redundancy. Generator-based representations rely on a negative border to make the representation lossless. However, the number of itemsets on a negative border sometimes even exceeds the total number of frequent itemsets. In this paper, we propose to use a positive border together with frequent generators to form a lossless representation. A positive border is usually orders of magnitude smaller than its corresponding negative border. A set of frequent generators plus its positive border is never larger than the corresponding complete set of frequent itemsets; thus it is a true concise representation. We also propose a generalized form of this representation. We develop an efficient algorithm, called GrGrowth, to mine generators and positive borders as well as their generalizations. GrGrowth explores the search space depth-first, which is much more efficient than the breadth-first strategy adopted by most existing generator mining algorithms. Our experimental results show that GrGrowth is significantly faster than level-wise algorithms for mining generator-based representations and is comparable to the state-of-the-art algorithms for mining frequent closed itemsets.
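The objects named in the abstract can be made concrete with a small sketch. The Python below assumes the usual definitions: a generator is an itemset with no proper subset of equal support; the positive border holds the frequent non-generators whose proper subsets are all frequent generators; the negative border drops that frequency requirement. It also assumes a hypothetical five-transaction database, an absolute threshold `MIN_SUP = 2`, and counts the empty itemset as a frequent generator. Frequent generators are found by a plain depth-first search over vertical tidsets, which is not GrGrowth's own data structure, and the supports of all itemsets are then recovered from the generators and positive border alone. The names `mine_generators`, `borders` and `derive_support` are illustrative, not taken from the paper.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of single-letter items.
DB = [frozenset(t) for t in ("abc", "abd", "acd", "abcd", "bc")]
MIN_SUP = 2  # hypothetical absolute minimum support threshold

# Vertical layout: item -> ids of the transactions that contain it.
TIDS = {}
for tid, transaction in enumerate(DB):
    for item in transaction:
        TIDS.setdefault(item, set()).add(tid)


def support(itemset):
    """Number of transactions containing every item of `itemset`."""
    tids = set(range(len(DB)))
    for item in itemset:
        tids &= TIDS.get(item, set())
    return len(tids)


def is_generator(itemset):
    """True if no proper subset has the same support.

    Support is anti-monotone, so checking the immediate subsets suffices."""
    s = support(itemset)
    return all(support(itemset - {i}) > s for i in itemset)


def mine_generators():
    """Depth-first search for frequent generators (empty itemset included).

    Frequent generators are downward closed, so a branch is pruned as soon as
    its itemset is infrequent or is not a generator."""
    items = sorted(TIDS)
    fg = {}

    def dfs(prefix, start):
        fg[prefix] = support(prefix)
        for k in range(start, len(items)):
            cand = prefix | {items[k]}
            if support(cand) >= MIN_SUP and is_generator(cand):
                dfs(cand, k + 1)

    dfs(frozenset(), 0)
    return fg


def borders(fg):
    """Positive border (itemset -> support) and negative border of `fg`."""
    pbd, nbd = {}, set()
    for g in fg:
        for x in TIDS:
            cand = g | {x}
            if x in g or cand in fg or cand in nbd:
                continue
            # FG is downward closed, so checking immediate subsets covers all
            # proper subsets.
            if all(cand - {i} in fg for i in cand):
                nbd.add(cand)
                if support(cand) >= MIN_SUP:  # frequent, yet not a generator
                    pbd[cand] = support(cand)
    return pbd, nbd


def derive_support(itemset, fg, pbd):
    """Recover sup(itemset) from `fg` and `pbd` only; None means infrequent."""
    itemset = frozenset(itemset)
    while itemset not in fg:
        p = next((b for b in pbd if b <= itemset), None)
        if p is None:
            return None  # a frequent itemset would be in fg or contain some p
        # p is not a generator, so some immediate subset p - {item} (itself a
        # frequent generator) has equal support: every transaction containing
        # p - {item} also contains item, so sup(itemset - {item}) == sup(itemset).
        item = next(i for i in p if fg[p - {i}] == pbd[p])
        itemset = itemset - {item}
    return fg[itemset]


if __name__ == "__main__":
    fg = mine_generators()
    pbd, nbd = borders(fg)
    items = sorted(TIDS)
    all_itemsets = [frozenset(c) for k in range(len(items) + 1)
                    for c in combinations(items, k)]
    frequent = [s for s in all_itemsets if support(s) >= MIN_SUP]
    print(len(frequent), len(fg), len(pbd), len(nbd))  # prints: 14 11 1 2
    # Losslessness check against brute force on every itemset.
    for s in all_itemsets:
        expected = support(s) if support(s) >= MIN_SUP else None
        assert derive_support(s, fg, pbd) == expected
```

On this toy data the sketch reports 14 frequent itemsets, 11 frequent generators, one positive-border itemset ({a, d}) and two negative-border itemsets ({a, d} and the infrequent {b, c, d}). Because the generators and the positive border are disjoint and both consist of frequent itemsets, their combined size cannot exceed the number of frequent itemsets.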

Author information

Authors and Affiliations

School of Computing, National University of Singapore, COM1, Law Link, Singapore, 117590, Singapore
Guimei Liu & Limsoon Wong
School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
Jinyan Li

Corresponding author

Correspondence to Guimei Liu.

About this article

Cite this article

Liu, G., Li, J. & Wong, L. A new concise representation of frequent itemsets using generators and a positive border. Knowl Inf Syst 17, 35–56 (2008). https://doi.org/10.1007/s10115-007-0111-5

Received: 05 September 2006
Revised: 17 May 2007
Accepted: 05 September 2007
Published: 07 November 2007
Issue Date: October 2008
DOI: https://doi.org/10.1007/s10115-007-0111-5

Similar content being viewed by others

An Efficient Algorithm for Deriving Frequent Itemsets from Lossless Condensed Representation

An Approach for Mining Concurrently Closed Itemsets and Generators

GrAFCI+ A fast generator-based algorithm for mining frequent closed itemsets
