Advertisement

Knowledge and Information Systems

, Volume 6, Issue 5, pp 570–594 | Cite as

Mining Condensed Frequent-Pattern Bases

  • Jian PeiEmail author
  • Guozhu Dong
  • Wei Zou
  • Jiawei Han
Article

Abstract

Frequent-pattern mining has been studied extensively and has many useful applications. However, frequent-pattern mining often generates too many patterns to be truly efficient or effective. In many applications, it is sufficient to generate and examine frequent patterns with a sufficiently good approximation of the support frequency instead of in full precision. Such a compact but “close-enough” frequent-pattern base is called a condensed frequent-pattern base.

In this paper, we propose and examine several alternatives for the design, representation, and implementation of such condensed frequent-pattern bases. Several algorithms for computing such pattern bases are proposed. Their effectiveness at pattern compression and methods for efficiently computing them are investigated. A systematic performance study is conducted on different kinds of databases, and demonstrates the effectiveness and efficiency of our approach in handling frequent-pattern mining in large databases.

Keywords

Frequent patterns Frequent-pattern mining Approximation Compression 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Agarwal R, Aggarwal C, Prasad VVV (2001) A tree projection algorithm for generation of frequent itemsets. J Parallel Distrib Comput 61:350–371CrossRefzbMATHGoogle Scholar
  2. 2.
    Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proc 1994 Int Conf Very Large Data Bases (VLDB’94), Santiago, Chile, pp 487–499Google Scholar
  3. 3.
    Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proc 1995 Int Conf Data Engineering (ICDE’95), Taipei, Taiwan, pp 3–14Google Scholar
  4. 4.
    Bayardo RJ (1998) Efficiently mining long patterns from databases. In: Proc 1998 ACM-SIGMOD Int Conf Management of Data (SIGMOD’98), Seattle, USA, pp 85–93Google Scholar
  5. 5.
    Beil F, Ester M, Xu X (2002) Frequent term-based text clustering. In: Proc 2002 ACM SIGKDD Int Conf Knowledge Discovery in Databases (KDD’02), Edmonton, Canada, pp 436–442Google Scholar
  6. 6.
    Beyer K, Ramakrishnan R (1999) Bottom-up computation of sparse and iceberg cubes. In: Proc 1999 ACM-SIGMOD Int Conf Management of Data (SIGMOD’99), Philadelphia, USA, pp 359–370Google Scholar
  7. 7.
    Boulicaut JF, Bykowski A, Rigotti C (2000) Approximation of frequency queries by means of free-sets. In: Principles of Data Mining and Knowledge Discovery, pp 75–85Google Scholar
  8. 8.
    Brin S, Motwani R, Silverstein C (1997) Beyond market basket: generalizing association rules to correlations. In: Proc 1997 ACM-SIGMOD Int Conf Management of Data (SIGMOD’97), Tucson, USA, pp 265–276Google Scholar
  9. 9.
    Burdick D, Calimlim M, Gehrke J (2001) MAFIA: a maximal frequent itemset algorithm for transactional databases. In: Proc 2001 Int Conf Data Engineering (ICDE’01), Heidelberg, Germany, pp 443–452Google Scholar
  10. 10.
    Dong G, Han J, Lam J, Pei J, Wang K (2001) Mining multi-dimensional constrained gradients in data cubes. In: Proc 2001 Int Conf Very Large Data Bases (VLDB’01), Rome, Italy, pp 321–330Google Scholar
  11. 11.
    Dong G, Li J (1999) Efficient mining of emerging patterns: discovering trends and differences. In: Proc 1999 Int Conf Knowledge Discovery and Data Mining (KDD’99), San Diego, USA, pp 43–52Google Scholar
  12. 12.
    Gouda K, Zaki MJ (2001) Efficiently mining maximal frequent itemsets. In: ICDM, pp 163–170Google Scholar
  13. 13.
    Han J, Dong G, Yin Y (1999) Efficient mining of partial periodic patterns in time series database. In: Proc 1999 Int Conf Data Engineering (ICDE’99), Sydney, Australia, pp 106–115Google Scholar
  14. 14.
    Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proc 2000 ACM-SIGMOD Int Conf Management of Data (SIGMOD’00), Dallas, USA, pp 1–12Google Scholar
  15. 15.
    Imielinski T, Khachiyan L, Abdulghani A (2002) Cubegrades: generalizing association rules. Data Min Knowl Discovery 6:219–258CrossRefGoogle Scholar
  16. 16.
    Kamber M, Han J, Chiang JY (1997) Metarule-guided mining of multi-dimensional association rules using data cubes. In: Proc 1997 Int Conf Knowledge Discovery and Data Mining (KDD’97), Newport Beach, USA, pp 207–210Google Scholar
  17. 17.
    Liu B, Hsu W, Ma Y (1998) Integrating classification and association rule mining. In: Proc 1998 Int Conf Knowledge Discovery and Data Mining (KDD’98), New York, USA, pp 80–86Google Scholar
  18. 18.
    Li W, Han J, Pei J (2001) CMAR: accurate and efficient classification based on multiple class-association rules. In: Proc 2001 Int Conf Data Mining (ICDM’01), San Jose, USA, pp 369–376Google Scholar
  19. 19.
    Lakshmanan LVS, Ng R, Han J, and Pang A (1999) Optimization of constrained frequent set queries with 2-variable constraints. In: Proc 1999 ACM-SIGMOD Int Conf Management of Data (SIGMOD’99), Philadelphia, USA, pp 157–168Google Scholar
  20. 20.
    Lent B, Swami A, Widom J (1997) Clustering association rules. In: Proc 1997 Int Conf Data Engineering (ICDE’97), Birmingham, England, pp 220–231,Google Scholar
  21. 21.
    Mannila H, Toivonen H (1996) Multiple uses of frequent sets and condensed representations (extended abstract). In: Knowledge Discovery and Data Mining, pp 189–194Google Scholar
  22. 22.
    Mannila H, Toivonen H, Verkamo AI (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discovery 1:259–289CrossRefGoogle Scholar
  23. 23.
    Margaritis D, Faloutsos C, Thrun S (2001) Netcube: a scalable tool for fast data mining and compression. In: Proc 2001 Int Conf Very Large Data Bases (VLDB’01), pp 311–320Google Scholar
  24. 24.
    Ng R, Lakshmanan LVS, Han J, Pang A (1998) Exploratory mining and pruning optimizations of constrained associations rules. In: Proc 1998 ACM-SIGMOD Int Conf Management of Data (SIGMOD’98), Seattle, USA, pp 13–24Google Scholar
  25. 25.
    Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Proc 7th Int Conf Database Theory (ICDT’99), Jerusalem, Israel, pp 398–416Google Scholar
  26. 26.
    Pei J, Han J, Lakshmanan LVS (2001) Mining frequent itemsets with convertible constraints. In: Proc 2001 Int Conf Data Engineering (ICDE’01), Heidelberg, Germany, pp 433–332Google Scholar
  27. 27.
    Pei J, Han J, Mao R (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: Proc 2000 ACM-SIGMOD Int Workshop Data Mining and Knowledge Discovery (DMKD’00), Dallas, USA, pp 11–20Google Scholar
  28. 28.
    Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M-C (2001) PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: Proc 2001 Int Conf Data Engineering (ICDE’01), Heidelberg, Germany, pp 215–224Google Scholar
  29. 29.
    Silverstein C, Brin S, Motwani R, Ullman J (1998) Scalable techniques for mining causal structures. In: Proc 1998 Int Conf Very Large Data Bases (VLDB’98), New York, USA, pp 594–605Google Scholar
  30. 30.
    Zaki M (2000) Generating non-redundant association rules. In: Proc 2000 ACM SIGKDD Int Conf Knowledge Discovery in Databases (KDD’00), Boston, USA, pp 34–43Google Scholar
  31. 31.
    Zaki MJ, Hsiao CJ (2002) CHARM: an efficient algorithm for closed itemset mining. In: Proc 2002 SIAM Int Conf Data Mining, Arlington, USA, pp 457–473Google Scholar

Copyright information

© Springer-Verlag 2004

Authors and Affiliations

  1. 1.Department of Computer Science and EngineeringState University of New York at BuffaloBuffaloUSA
  2. 2.Wright State UniversityUSA
  3. 3.University of Illinois at Urbana-ChampaignUrbana-ChampaignUSA

Personalised recommendations