A new concise representation of frequent itemsets using generators and a positive border

  • Regular Paper
Knowledge and Information Systems

Abstract

A complete set of frequent itemsets can become undesirably large due to redundancy when the minimum support threshold is low or when the database is dense. Several concise representations have been proposed to eliminate this redundancy. Generator-based representations rely on a negative border to make the representation lossless. However, the number of itemsets on a negative border sometimes exceeds the total number of frequent itemsets. In this paper, we propose using a positive border together with frequent generators to form a lossless representation. A positive border is usually orders of magnitude smaller than its corresponding negative border. A set of frequent generators plus its positive border is never larger than the corresponding complete set of frequent itemsets, so it is a truly concise representation. A generalized form of this representation is also proposed. We develop an efficient algorithm, called GrGrowth, to mine generators and positive borders as well as their generalizations. The GrGrowth algorithm uses a depth-first search strategy to explore the search space, which is much more efficient than the breadth-first search strategy adopted by most existing generator mining algorithms. Our experimental results show that the GrGrowth algorithm is significantly faster than level-wise algorithms for mining generator-based representations, and is comparable to state-of-the-art algorithms for mining frequent closed itemsets.
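
To make the size claim concrete, the following is a minimal brute-force sketch in Python, not the GrGrowth algorithm. The abstract does not spell out the definitions, so the sketch assumes the usual ones: a generator is an itemset whose support is strictly smaller than that of every proper subset, and the positive border consists of the frequent non-generators all of whose proper subsets are generators. The empty itemset is treated as a generator here, which is also an assumption. The function name `mine_representation` and the toy database are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def proper_subsets(itemset):
    """Yield every proper subset of `itemset` as a frozenset."""
    for k in range(len(itemset)):
        for combo in combinations(itemset, k):
            yield frozenset(combo)


def mine_representation(transactions, min_sup):
    """Brute-force enumeration; exponential, for illustration only."""
    items = sorted(set().union(*transactions))

    # All frequent itemsets with their supports (empty set included).
    frequent = {}
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            sup = support(s, transactions)
            if sup >= min_sup:
                frequent[s] = sup

    # Checking immediate subsets is enough: support is anti-monotone, so an
    # equal-support proper subset implies an equal-support immediate subset.
    generators = {
        s for s in frequent
        if all(frequent[s - {i}] > frequent[s] for i in s)
    }

    # Assumed definition: frequent non-generators whose proper subsets
    # are all generators.
    border = {
        s for s in frequent
        if s not in generators
        and all(p in generators for p in proper_subsets(s))
    }
    return frequent, generators, border


# Hypothetical toy database of four transactions.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
frequent, generators, border = mine_representation(db, min_sup=1)
print(len(frequent), len(generators), len(border))  # prints: 8 4 1
```

On this toy database the generators and the positive border together hold 5 itemsets against 8 frequent itemsets. Under the assumed definitions the two sets are disjoint subsets of the frequent itemsets, which is why their combined size cannot exceed the size of the complete set.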

Author information

Correspondence to Guimei Liu.

About this article

Cite this article

Liu, G., Li, J. & Wong, L. A new concise representation of frequent itemsets using generators and a positive border. Knowl Inf Syst 17, 35–56 (2008). https://doi.org/10.1007/s10115-007-0111-5

  • DOI: https://doi.org/10.1007/s10115-007-0111-5
