Frequent pattern mining: current status and future directions

Abstract

Frequent pattern mining has been a focused theme in data mining research for over a decade. Abundant literature has been dedicated to this research and tremendous progress has been made, ranging from efficient and scalable algorithms for frequent itemset mining in transaction databases to numerous research frontiers, such as sequential pattern mining, structured pattern mining, correlation mining, associative classification, and frequent pattern-based clustering, as well as their broad applications. In this article, we provide a brief overview of the current status of frequent pattern mining and discuss a few promising research directions. We believe that frequent pattern mining research has substantially broadened the scope of data analysis and will have a deep impact on data mining methodologies and applications in the long run. However, some challenging research issues still need to be solved before frequent pattern mining can claim to be a cornerstone approach in data mining applications.

References

  • Afrati FN, Gionis A, Mannila H (2004) Approximating a collection of frequent sets. In: Proceedings of the 2004 ACM SIGKDD international conference knowledge discovery in databases (KDD’04), Seattle, WA, pp 12–19

  • Agarwal R, Aggarwal CC, Prasad VVV (2001) A tree projection algorithm for generation of frequent itemsets. J Parallel Distribut Comput 61:350–371

    Google Scholar 

  • Aggarwal CC, Yu PS (1998) A new framework for itemset generation. In: Proceedings of the 1998 ACM symposium on principles of database systems (PODS’98), Seattle, WA, pp 18–24

  • Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the 1998 ACM-SIGMOD international conference on management of data (SIGMOD’98), Seattle, WA, pp 94–105

  • Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM-SIGMOD international conference on management of data (SIGMOD’93), Washington, DC, pp 207–216

  • Agrawal R, Shafer JC (1996) Parallel mining of association rules: design, implementation, and experience. IEEE Trans Knowl Data Eng 8:962–969

    Article  Google Scholar 

  • Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 1994 international conference on very large data bases (VLDB’94), Santiago, Chile, pp 487–499

  • Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proceedings of the 1995 international conference on data engineering (ICDE’95), Taipei, Taiwan, pp 3–14

  • Ahmed KM, El-Makky NM, Taha Y (2000) A note on “beyond market basket: generalizing association rules to correlations”. SIGKDD Explorations 1:46–48

    Google Scholar 

  • Asai T, Abe K, Kawasoe S, Arimura H, Satamoto H, Arikawa S (2002) Efficient substructure discovery from large semi-structured data. In: Proceedings of the 2002 SIAM international conference on data mining (SDM’02), Arlington, VA, pp 158–174

  • Aumann Y, Lindell Y (1999) A statistical theory for quantitative association rules. In: Proceeding of the 1999 international conference on knowledge discovery and data mining (KDD’99), San Diego, CA, pp 261–270

  • Bayardo RJ (1998) Efficiently mining long patterns from databases. In: Proceeding of the 1998 ACM-SIGMOD international conference on management of data (SIGMOD’98), Seattle, WA, pp 85–93

  • Beil F, Ester M, Xu X (2002) Frequent term-based text clustering. In: Proceeding of the 2002 ACM SIGKDD international conference on knowledge discovery in databases (KDD’02), Edmonton, Canada, pp 436–442

  • Bettini C, Sean Wang X, Jajodia S (1998) Mining temporal relationships with multiple granularities in time sequences. Bull Tech Committee Data Eng 21:32–38

    Google Scholar 

  • Beyer K, Ramakrishnan R (1999) Bottom-up computation of sparse and iceberg cubes. In: Proceeding of the 1999 ACM-SIGMOD international conference on management of data (SIGMOD’99), Philadelphia, PA, pp 359–370

  • Blanchard J, Guillet F, Gras R, Briand H (2005) Using information-theoretic measures to assess association rule interestingness. In: Proceeding of the 2005 international conference on data mining (ICDM’05), Houston, TX, pp 66–73

  • Bonchi F, Giannotti F, Mazzanti A, Pedreschi D (2003) Exante: anticipated data reduction in constrained pattern mining. In: Proceeding of the 7th European conference on principles and pratice of knowledge discovery in databases (PKDD’03), pp 59–70

  • Bonchi F, Lucchese C (2004) On closed constrained frequent pattern mining. In: Proceeding of the 2004 international conference on data mining (ICDM’04), Brighton, UK, pp 35–42

  • Borgelt C, Berthold MR (2002) Mining molecular fragments: finding relevant substructures of molecules. In: Proceeding of the 2002 international conference on data mining (ICDM’02), Maebashi, Japan, pp 211–218

  • Brin S, Motwani R, Silverstein C (1997) Beyond market basket: generalizing association rules to correlations. In: Proceeding of the 1997 ACM-SIGMOD international conference on management of data (SIGMOD’97), Tucson, AZ, pp 265–276

  • Brin S, Motwani R, Ullman JD, Tsur S (1997) Dynamic itemset counting and implication rules for market basket analysis. In: Proceeding of the 1997 ACM-SIGMOD international conference on management of data (SIGMOD’97), Tucson, AZ, pp 255–264

  • Bucila C, Gehrke J, Kifer D, White W (2003) DualMiner: a dual-pruning algorithm for itemsets with constraints. Data Min knowl discov 7:241–272

    Google Scholar 

  • Burdick D, Calimlim M, Gehrke J (2001) MAFIA: a maximal frequent itemset algorithm for transactional databases. In: Proceeding of the 2001 international conference on data engineering (ICDE’01), Heidelberg, Germany, pp 443–452

  • Calders T, Goethals B (2002) Mining all non-derivable frequent itemsets. In: Proceeding of the 2002 European conference on principles and pratice of knowledge discovery in databases (PKDD’02), Helsinki, Finland, pp 74–85

  • Calders T, Goethals B (2005) Depth-first non-derivable itemset mining. In: Proceeding of the 2005 SIAM international conference on data mining (SDM’05), Newport Beach, CA, pp 250–261

  • Cao H, Mamoulis N, Cheung DW (2005) Mining frequent spatio-temporal sequential patterns. In: Proceeding of the 2005 international conference on data mining (ICDM’05), Houston, TX, pp 82–89

  • Chang J, Lee W (2003) Finding recent frequent itemsets adaptively over online data streams. In: Proceeding of the 2003 international conference on knowledge discovery and data mining (KDD’03), Washington, DC, pp 487–492

  • Chen MS, Park JS, Yu PS (1996) Data mining for path traversal patterns in a web environment. In: Proceeding of the 16th international conference on distributed computing systems, pp 385–392

  • Cheng CH, Fu AW, Zhang Y (1999) Entropy-based subspace clustering for mining numerical data. In: Proceeding of the 1999 international conference on knowledge discovery and data mining (KDD’99), San Diego, CA, pp 84–93

  • Cheng H, Yan X, Han J (2004) IncSpan: incremental mining of sequential patterns in large In: Proceeding of the 2004 ACM SIGKDD international conference on knowledge discovery in databases (KDD’04), Seattle, WA, pp 527–532

  • Cheng H, Yan X, Han J (2005) Seqindex: indexing sequences by sequential pattern analysis. In: Proceeding of the 2005 SIAM international conference on data mining (SDM’05), Newport Beach, CA, pp 601–605

  • Cheng H, Yan X, Han J, Hsu C (2007) Discriminative frequent pattern analysis for effective classification. In: Proceeding of the 2007 international conference on data engineering (ICDE’07), Istanbul, Turkey

  • Cheung DW, Han J, Ng V, Fu A, Fu Y (1996) A fast distributed algorithm for mining association rules. In: Proceeding of the 1996 international conference on parallel and distributed information systems, Miami Beach, FL, pp 31–44

  • Cheung DW, Han J, Ng V, Wong CY (1996) Maintenance of discovered association rules in large an incremental updating technique. In: Proceeding of the 1996 international conference on data engineering (ICDE’96), New Orleans, LA, pp 106–114

  • Chi Y, Wang H, Yu PS, Muntz R (2004) Moment: maintaining closed frequent itemsets over a stream sliding window. In: Proceeding of the 2004 international conference on data mining (ICDM’04), Brighton, UK, pp 59–66

  • Cong S, Han J, Padua D (2005) Parallel mining of closed sequential patterns. In: Proceeding of the 2005 ACM SIGKDD international conference on knowledge discovery in databases (KDD’05), Chicago, IL, pp 562–567

  • Cong G, Tan K-L, Tung AKH, Xu X (2005) Mining top-k covering rule groups for gene expression data. In: Proceeding of the 2005 ACM-SIGMOD international conference on management of data (SIGMOD’05), Baltimore, MD, pp 670–681

  • Deshpande M, Kuramochi M, Karypis G (2003) Frequent sub-structure-based approaches for classifying chemical compounds. In: Proceeding of the 2002 international conference on data mining (ICDM’03), Melbourne, FL, pp 35–42

  • Dong G, Han J, Lam J, Pei J, Wang K, Zou W (2004) Mining constrained gradients in multi-dimensional databases. IEEE Trans Knowl Data Eng 16:922–938

    Google Scholar 

  • Dehaspe L, Toivonen H, King R (1998) Finding frequent substructures in chemical compounds. In: Proceeding of the 1998 international conference on knowledge discovery and data mining (KDD’98), New York, NY, pp 30–36

  • Dong G, Li J (1999) Efficient mining of emerging patterns: discovering trends and differences. In: Proceeding of the 1999 international conference on knowledge discovery and data mining (KDD’99), San Diego, CA, pp 43–52

  • Eirinaki M, Vazirgiannis M (2003) Web mining for web personalization. ACM Trans Inter Tech 3:1–27

    Article  Google Scholar 

  • Fukuda T, Morimoto Y, Morishita S, Tokuyama T (1996) Data mining using two-dimensional optimized association rules: scheme, algorithms, and visualization. In: Proceeding of the 1996 ACM-SIGMOD international conference management of data (SIGMOD’96), Montreal, Canada, pp 13–23

  • Gade K, Wang J, Karypis G (2004) Efficient closed pattern mining in the presence of tough block constraints. In: Proceeding of the 2004 international conference on knowledge discovery and data mining (KDD’04), Seattle, WA, pp 138–147

  • Garofalakis M, Rastogi R, Shim K (1999) SPIRIT: Sequential pattern mining with regular expression constraints. In: Proceeding of the 1999 international conference on Very large data bases (VLDB’99), Edinburgh, UK, pp 223–234

  • Geerts F, Goethals B, Bussche J (2001) A tight upper bound on the number of candidate patterns. In: Proceeding of the 2001 international conference on data mining (ICDM’01), San Jose, CA, pp 155–162

  • Gionis A, Kujala T, Mannila H (2003) Fragments of order. In: Proceeding of the 2003 international conference on knowledge discovery and data mining (KDD’03), Washington, DC, pp 129–136

  • Gionis A, Mannila H, Mielikäinen T, Tsaparas P (2006) Assessing data mining results via swap randomization. In: Proceeding of the 2006 ACM SIGKDD international conference on knowledge discovery in databases (KDD’06), Philadelphia, PA, pp 167–176

  • Goethals B, Zaki M (2003) An introduction to workshop on frequent itemset mining implementations. In: Proceeding of the ICDM’03 international workshop on frequent itemset mining implementations (FIMI’03), Melbourne, FL, pp 1–13

  • Grahne G, Lakshmanan L, Wang X (2000) Efficient mining of constrained correlated sets. In: Proceeding of the 2000 international conference on data engineering (ICDE’00), San Diego, CA, pp 512–521

  • Grahne G, Zhu J (2003)Efficiently using prefix-trees in mining frequent itemsets. In: Proceeding of the ICDM’03 international workshop on frequent itemset mining implementations (FIMI’03), Melbourne, FL, pp 123–132

  • Han J, Dong G, Yin Y (1999) Efficient mining of partial periodic patterns in time series database. In: Proceeding of the 1999 international conference on data engineering (ICDE’99), Sydney, Australia, pp 106–115

  • Han J, Fu Y (1995) Discovery of multiple-level association rules from large databases. In: Proceeding of the 1995 international conference on very large data bases (VLDB’95), Zurich, Switzerland, pp 420–431

  • Han J, Kamber M (2006) Data mining: concepts and techniques, 2nd edn. Morgan Kaufmann

  • Han J, Pei J, Dong G, Wang K (2001) Efficient computation of iceberg cubes with complex measures. In: Proceeding of the 2001 ACM-SIGMOD international conference on management of data (SIGMOD’01), Santa Barbara, CA, pp 1–12

  • Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceeding of the 2000 ACM-SIGMOD international conference on management of data (SIGMOD’00), Dallas, TX, pp 1–12

  • Hilderman RJ, Hamilton HJ (2001) Knowledge discovery and measures of interest. Kluwer Academic

  • Holder LB, Cook DJ, Djoko S (1994) Substructure discovery in the subdue system. In: Proceeding of the AAAI’94 workshop knowledge discovery in databases (KDD’94), Seattle, WA, pp 169–180

  • Holsheimer M, Kersten M, Mannila H, Toivonen H (1995) A perspective on databases and data mining. In Proceeding of the 1995 international conference on knowledge discovery and data mining (KDD’95), Montreal, Canada, pp 150–155

  • Huan J, Wang W, Bandyopadhyay D, Snoeyink J, Prins J, Tropsha A (2004) Mining spatial motifs from protein structure graphs. In: Proceeding of the 8th international conference on research in computational molecular biology (RECOMB), San Diego, CA, pp 308–315

  • Huan J, Wang W, Prins J (2003) Efficient mining of frequent subgraph in the presence of isomorphism. In: Proceeding of the 2003 international conference on data mining (ICDM’03), Melbourne, FL, pp 549–552

  • Huan J, Wang W, Prins J, Yang J (2004) Spin: mining maximal frequent subgraphs from graph databases. In: Proceeding of the 2004 ACM SIGKDD international conference on knowledge discovery in databases (KDD’04), Seattle, WA, pp 581–586

  • Imielinski T, Khachiyan L, Abdulghani A (2002) Cubegrades: generalizing association rules. Data Min Knowl Discov 6:219–258

    Article  Google Scholar 

  • Inokuchi A, Washio T, Motoda H (2000) An apriori-based algorithm for mining frequent substructures from graph data. In: Proceeding of the 2000 European symposium on the principle of data mining and knowledge discovery (PKDD’00), Lyon, France, pp 13–23

  • Jaroszewicz S, Scheffer T (2005) Fast discovery of unexpected patterns in data relative to a bayesian network. In: Proceeding of the 2005 ACM SIGKDD international conference on knowledge discovery and data mining (KDD’05), Chicago, IL, pp 118–127

  • Jaroszewicz S, Simovici D (2004) interestingness of frequent itemsets using bayesian networks as background knowledge. In: Proceeding of the 2004 ACM SIGKDD international conference on knowledge discovery and data mining (KDD’04), Seattle, WA, pp 178–186

  • Ji X, Bailey J, Dong G (2005) Mining minimal distinguishing subsequence patterns with gap constraints. In: Proceeding of the 2005 international conference on data mining (ICDM’05), Houston, TX, pp 194–201

  • Jin R, Agrawal G (2005) An algorithm for in-core frequent itemset mining on streaming data. In Proceeding of the 2005 international conference on data mining (ICDM’05), Houston, TX, pp 210–217

  • Jin R, Wang C, Polshakov D, Parthasarathy S, Agrawal G (2005) Discovering frequent topological structures from graph datasets. In: Proceeding of the 2005 ACM SIGKDD international conference on knowledge discovery in databases (KDD’05), Chicago, IL, pp 606–611

  • Kamber M, Han J, Chiang JY (1997) Metarule-guided mining of multi-dimensional association rules using data cubes. In: Proceeding of the 1997 international conference on knowledge discovery and data mining (KDD’97), Newport Beach, CA, pp 207–210

  • Karp RM, Papadimitriou CH, Shenker S (2003) A simple algorithm for finding frequent elements in streams and bags. ACM Trans Database Syst, 28:51–55

    Article  Google Scholar 

  • Koperski K, Han J (1995) Discovery of spatial association rules in geographic information databases. In: Proceeding of the 1995 international symposium on large spatial databases (SSD’95), Portland, ME, pp 47–66

  • Kosala R, Blockeel H (2000) Web mining research: a survey. SIGKDD Explor 2

  • Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: Proceeding of the 2001 international conference on data mining (ICDM’01), San Jose, CA, pp 313–320

  • Kuramochi M, Karypis G (2004) GREW: a scalable frequent subgraph discovery algorithm. In Proceeding of the 2004 international conference on data mining (ICDM’04), Brighton, UK, pp 439–442

  • Lakshmanan LVS, Ng R, Han J, Pang A (1999) Optimization of constrained frequent set queries with 2-variable constraints. In: Proceeding of the 1999 ACM-SIGMOD international conference on management of data (SIGMOD’99), Philadelphia, PA, pp 157–168

  • Lakshmanan LVS, Pei J, Han J (2002) Quotient cube: how to summarize the semantics of a data cube. In: Proceeding of the 2002 international conference on very large data bases (VLDB’02), Hong Kong, China, pp 778–789

  • Lee Y-K, Kim W-Y, Cai YD, Han J (2003) CoMine: efficient mining of correlated patterns. In: Proceeding of the 2003 international conference on data mining (ICDM’03), Melbourne, FL, pp 581–584

  • Lent B, Swami A, Widom J (1997) Clustering association rules. In: Proceeding of the 1997 international conference on data engineering (ICDE’97), Birmingham, England, pp 220–231

  • Li Z, Chen Z, Srinivasan SM, Zhou Y (2004) C-Miner: mining block correlations in storage systems. In: Proceeding of the 2004 USENIX conference on file and storage technologies (FAST’04), San Francisco, CA, pp 173–186

  • Li J, Dong G, Ramamohanrarao K (2000) Making use of the most expressive jumping emerging patterns for classification. In: Proceeding of the 2000 Pacific-Asia conference on knowledge discovery and data mining (PAKDD’00), Kyoto, Japan, pp 220–232

  • Li X, Han J, Kim S (2006) Motion-alert: automatic anomaly detection in massive moving objects. In: IEEE international conference on intelligence and security informatics (ISI’06), San Diego, CA, pp 166–177

  • Li W, Han J, Pei J (2001) CMAR: accurate and efficient classification based on multiple class-association rules. In: Proceeding of the 2001 international conference on data mining (ICDM’01), San Jose, CA, pp 369–376

  • Li Z, Lu S, Myagmar S, Zhou Y (2004) CP-Miner: a tool for finding copy-paste and related bugs in operating system code. In: Proceeding of the 2004 symposium on operating systems design and implementation (OSDI’04), San Francisco, CA, pp 289–302

  • Li Z, Zhou Y (2005) PR-Miner: Automatically extracting implicit programming rules and detecting violations in large software code. In: Proceeding of the 2005 ACM SIGSOFT symposium on foundations software eng (FSE’05), Lisbon, Portugal, pp 306–315

  • Lin C, Chiu D, Wu Y, Chen A (2005) Mining frequent itemsets from data streams with a time-sensitive sliding window. In: Proceeding of the 2005 SIAM international conference on data mining (SDM’05), Newport Beach, pp 68–79

  • Liu H, Han J, Xin D, Shao Z (2006) Mining frequent patterns on very high dimensional data: a top-down row enumeration approach. In: Proceeding of the 2006 SIAM international conference on data mining (SDM’06), Bethesda, MD, pp 280–291

  • Liu B, Hsu W, Ma Y (1998) Integrating classification and association rule mining. In: Proceeding of the 1998 international conference on knowledge discovery and data mining (KDD’98), New York, NY, pp 80–86

  • Liu G, Li J, Wong L, Hsu W (2006) Positive borders or negative borders: how to make lossless generator based representations concise. In: Proceeding of the 2006 SIAM international conference on data mining (SDM’06), Bethesda, MD, pp 467–471

  • Liu G, Lu H, Lou W, Yu JX (2003) On computing, storing and querying frequent patterns. In: Proceeding of the 2003 ACM SIGKDD international conference on knowledge discovery and data mining (KDD’03), Washington, DC, pp 607–612

  • Liu J, Paulsen S, Sun X, Wang W, Nobel A, Prins J (2006) Mining approximate frequent itemsets in the presence of noise: algorithm and analysis. In: Proceeding of the 2006 SIAM international conference on data mining (SDM’06), Bethesda, MD, pp 405–416

  • Liu J, Pan Y, Wang K, Han J (2002) Mining frequent item sets by opportunistic projection. In: Proceeding of the 2002 ACM SIGKDD international conference on knowledge discovery in databases (KDD’02), Edmonton, Canada, pp 239–248

  • Liu C, Yan X, Yu H, Han J, Yu PS (2005) Mining behavior graphs for “backtrace” of noncrashing bugs. In: Proceeding of the 2005 SIAM international conference on data mining (SDM’05), Newport Beach, pp 286–297

  • Lu H, Han J, Feng L (1998) Stock movement and n-dimensional inter-transaction association rules. In: Proceeding of the 1998 SIGMOD workshop research issues on data mining and knowledge discovery (DMKD’98), Seattle, WA, pp 12:1–12:7

  • Luo C, Chung S (2005) Efficient mining of maximal sequential patterns using multiple samples. In: Proceeding of the 2005 SIAM international conference on data mining (SDM’05), Newport Beach, CA, pp 415–426

  • Ma S, Hellerstein JL (2001) Mining partially periodic event patterns with unknown periods. In: Proceeding of the 2001 international conference on data engineering (ICDE’01), Heidelberg, Germany, pp 205–214

  • Manku G, Motwani R (2002) Approximate frequency counts over data streams. In: Proceeding of the 2002 international conference on very large data bases (VLDB’02), Hong Kong, China, pp 346–357

  • Mannila H, Toivonen H, Verkamo AI (1994) Efficient algorithms for discovering association rules. In: Proceeding of the AAAI’94 workshop knowledge discovery in databases (KDD’94), Seattle, WA, pp 181–192

  • Mannila H, Toivonen H, Verkamo AI (1997) Discovery of frequent episodes in event sequences. Data Min Knowl Discov 1:259–289

    Article  Google Scholar 

  • Mei Q, Xin D, Cheng H, Han J, Zhai C (2006) Generating semantic annotations for frequent patterns with context analysis. In: Proceeding of the 2006 ACM SIGKDD international conference on knowledge discovery in databases (KDD’06), Philadelphia, PA, pp 337–346

  • Metwally A, Agrawal D, El Abbadi A (2005) Efficient computation of frequent and top-k elements in data streams. In: Proceeding of the 2005 international conference on database theory (ICDT’05), Edinburgh, UK, pp 398–412

  • Miller RJ, Yang Y (1997) Association rules over interval data. In: Proceeding of the 1997 ACM-SIGMOD international conference on management of data (SIGMOD’97), Tucson, AZ, pp 452–461

  • Nanopoulos A, Manolopoulos Y (2001) Mining patterns from graph traversals. Data Knowl Eng 37:243–266

    MATH  Article  Google Scholar 

  • Ng R, Lakshmanan LVS, Han J, Pang A (1998) Exploratory mining and pruning optimizations of constrained associations rules. In: Proceeding of the 1998 ACM-SIGMOD international conference on management of data (SIGMOD’98), Seattle, WA, pp 13–24

  • Nijssen S, Kok J (2004) A quickstart in frequent structure mining can make a difference. In: Proceeding of the 2004 ACM SIGKDD international conference on kowledge discovery in databases (KDD’04), Seattle, WA, pp 647–652

  • Omiecinski E (2003) Alternative interest measures for mining associations. IEEE Trans Knowl and data engineering 15:57–69

    Article  Google Scholar 

  • Özden B, Ramaswamy S, Silberschatz A (1998) Cyclic association rules. In: Proceeding of the 1998 international conference on data engineering (ICDE’98), Orlando, FL, pp 412–421

  • Pan F, Cong G, Tung AKH, Yang J, Zaki M (2003) CARPENTER: finding closed patterns in long biological datasets. In: Proceeding of the 2003 ACM SIGKDD international conference on knowledge discovery and data mining (KDD’03), Washington, DC, pp 637–642

  • Pan F, Tung AKH, Cong G, Xu X (2004) COBBLER: combining column, and row enumeration for closed pattern discovery. In: Proceeding of the 2004 international conference on scientific and statistical database management (SSDBM’04), Santorini Island, Greece, pp 21–30

  • Park JS, Chen MS, Yu PS (1995) An effective hash-based algorithm for mining association rules. In: Proceeding of the 1995 ACM-SIGMOD international conference on management of data (SIGMOD’95), San Jose, CA, pp 175–186

  • Park JS, Chen MS, Yu PS (1995) Efficient parallel mining for association rules. In: Proceeding of the 4th international conference on information and knowledge management, Baltimore, MD, pp 31–36

  • Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Proceeding of the 7th international conference on database theory (ICDT’99), Jerusalem, Israel, pp 398–416

  • Pei J, Dong G, Zou W, Han J (2002) On computing condensed frequent pattern bases. In: Proceeding of the 2002 international conference on data mining (ICDM’02), Maebashi, Japan, pp 378–385

  • Pei J, Han J, Lakshmanan LVS (2001) Mining frequent itemsets with convertible constraints. In: Proceeding of the 2001 international conference on data engineering (ICDE’01), Heidelberg, Germany, pp 433–332

  • Pei J, Han J, Mao R (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: Proceeding of the 2000 ACM-SIGMOD international workshop data mining and knowledge discovery (DMKD’00), Dallas, TX, pp 11–20

  • Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M-C (2001) PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: Proceeding of the 2001 international conference on data engineering (ICDE’01), Heidelberg, Germany, pp 215–224

  • Pei J, Han J, Mortazavi-Asl B, Wang J, Pinto H, Chen Q, Dayal U, Hsu M-C (2004) Mining sequential patterns by pattern-growth: the prefixspan approach. IEEE Trans Knowl Data Eng 16:1424–1440

    Google Scholar 

  • Pei J, Han J, Mortazavi-Asl B, Zhu H (2000) Mining access patterns efficiently from web logs. In: Proceeding of the 2000 Pacific-Asia conference on knowledge discovery and data mining (PAKDD’00), Kyoto, Japan, pp 396–407

  • Pei J, Han J, Wang W (2002) Constraint-based sequential pattern mining in large databases. In: Proceeding of the 2002 international conference on information and knowledge management (CIKM’02), McLean, VA, pp 18–25

  • Pei J, Liu J, Wang H, Wang K, Yu PS, Yang J (2005) Efficiently mining frequent closed partial orders. In: Proceeding of the 2005 international conference on data mining (ICDM’05), Houston, TX, pp 753–756

  • Piatetsky-Shapiro G (1991) Notes of AAAI’91 workshop knowledge discovery in databases (KDD’91). AAAI/MIT Press, Anaheim, CA

  • Pinto H, Han J, Pei J, Wang K, Chen Q, Dayal U (2001) Multi-dimensional sequential pattern mining. In: Proceeding of the 2001 international conference on information and knowledge management (CIKM’01), Atlanta, GA, pp 81–88

  • Punin J, Krishnamoorthy M, Zaki M (2001) Web usage mining: languages and algorithms. Springer-Verlag

  • Ramesh G, Maniatty WA, Zaki MJ (2003) Feasible itemset distributions in data mining: theory and application. In: Proceeding of the 2003 ACM symposium on principles of database systems (PODS’03), San Diego, CA, pp 284–295

  • Sarawagi S, Thomas S, Agrawal R (1998) Integrating association rule mining with relational database systems: alternatives and implications. In: Proceeding of the 1998 ACM-SIGMOD international conference on management of data (SIGMOD’98), Seattle, WA, pp 343–354

  • Savasere A, Omiecinski E, Navathe S (1995) An efficient algorithm for mining association rules in large databases. In: Proceeding of the 1995 international conference on very large data bases (VLDB’95), Zurich, Switzerland, pp 432–443

  • Seppänen J, Mannila H (2004) Dense itemsets. In: Proceeding of the 2004 international conference on knowledge discovery and data mining (KDD’04), Seattle, WA, pp 683–688

  • Shekar B, Natarajan R (2004) A transaction-based neighbourhood-driven approach to quantifying interestingness of assoication rules. In: Proceeding of the 2004 international conference on data mining (ICDM’04), Brighton, UK, pp 194–201

  • Siebes A, Vreeken J, Leeuwen M (2006) Item sets that compress. In: Proceeding of the 2006 SIAM international conference on data mining (SDM’06), Bethesda, MD, pp 393–404

  • Silverstein C, Brin S, Motwani R, Ullman JD (1998) Scalable techniques for mining causal structures. In: Proceeding of the 1998 international conference on very large data bases (VLDB’98), New York, NY, pp 594–605

  • Sismanis Y, Roussopoulos N, Deligianannakis A, Kotidis Y (2002) Dwarf: shrinking the petacube. In: Proceeding of the 2002 ACM-SIGMOD international conference on management of data (SIGMOD’02), Madison, WI, pp 464–475

  • Srikant R, Agrawal R (1995) Mining generalized association rules. In: Proceeding of the 1995 international conference on very large data bases (VLDB’95), Zurich, Switzerland, pp 407–419

  • Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Proceeding of the 5th international conference on extending database technology (EDBT’96), Avignon, France, pp 3–17

  • Srivastava J, Cooley R, Deshpande M, Tan PN (2000) Web usage mining: discovery and applications of usage patterns from web data. SIGKDD Explor 1:12–23

    Google Scholar 

  • Steinbach M, Tan P, Kumar V (2004) Support envelopes: A technique for exploring the structure of association patterns. In: Proceeding of the 2004 ACM SIGKDD international conference on knowledge discovery in databases (KDD’04), Seattle, WA, pp 296–305

  • Tan P-N, Kumar V, Srivastava J (2002) Selecting the right interestingness measure for association patterns. In: Proceeding of the 2002 ACM SIGKDD international conference on knowledge discovery in databases (KDD’02), Edmonton, Canada, pp 32–41

  • Ting R, Bailey J (2006) Mining minimal contrast subgraph patterns. In: Proceeding of the 2006 SIAM international conference on data mining (SDM’06), Bethesda, MD, pp 638–642

  • Toivonen H (1996) Sampling large databases for association rules. In: Proceeding of the 1996 international conference on very large data bases (VLDB’96), Bombay, India, pp 134–145

  • Ukkonen A, Fortelius M, Mannila H (2005) Finding partial orders from unordered 0-1 data. In: Proceeding of the 2005 international conference on knowledge discovery and data mining (KDD’05), Chicago, IL, pp 285–293

  • Vanetik N, Gudes E, Shimony SE (2002) Computing frequent graph patterns from semistructured data. In: Proceeding of the 2002 international conference on data mining (ICDM’02), Maebashi, Japan, pp 458–465

  • Wang J, Han J (2004) BIDE: Efficient mining of frequent closed sequences. In: Proceeding of the 2004 international conference on data engineering (ICDE’04), Boston, MA, pp 79–90

  • Wang J, Han J, Lu Y, Tzvetkov P (2005) TFP: An efficient algorithm for mining top-k frequent closed itemsets. IEEE Trans Knowl Data Eng 17:652–664

    Article  Google Scholar 

  • Wang J, Han J, Pei J (2003) CLOSET+: searching for the best strategies for mining frequent closed itemsets. In: Proceeding of the 2003 ACM SIGKDD international conference on knowledge discovery and data mining (KDD’03), Washington, DC, pp 236–245

  • Wang K, Jiang Y, Lakshmanan L (2003) Mining unexpected rules by pushing user dynamics. In: Proceeding of the 2003 ACM SIGKDD international conference on knowledge discovery in databases (KDD’03), Washington, DC, pp 246–255

  • Wang J, Karypis G (2005) HARMONY: efficiently mining the best rules for classification. In: Proceeding of the 2005 SIAM conference on data mining (SDM’05), Newport Beach, CA, pp 205–216

  • Wang W, Lu H, Feng J, Yu JX (2002) Condensed cube: an effective approach to reducing data cube size. In: Proceeding of the 2002 international conference on data engineering (ICDE’02), San Fransisco, CA, pp 155–165

  • Wang C, Wang W, Pei J, Zhu Y, Shi B (2004) Scalable mining of large disk-based graph databases. In: Proceeding of the 2004 ACM SIGKDD international conference on knowledge discovery in databases (KDD’04), Seattle, WA, pp 316–325

  • Wang H, Wang W, Yang J, Yu PS (2002) Clustering by pattern similarity in large data sets. In: Proceeding of the 2002 ACM-SIGMOD international conference on management of data (SIGMOD’02), Madison, WI, pp 418–427

  • Washio T, Motoda H (2003) State of the art of graph-based data mining. SIGKDD Explor 5:59–68

  • Xin D, Han J, Li X, Wah BW (2003) Star-cubing: computing iceberg cubes by top-down and bottom-up integration. In: Proceeding of the 2003 international conference on very large data bases (VLDB’03), Berlin, Germany, pp 476–487

  • Xin D, Han J, Shao Z, Liu H (2006) C-cubing: efficient computation of closed cubes by aggregation-based checking. In: Proceeding of the 2006 international conference on data engineering (ICDE’06), Atlanta, Georgia, p 4

  • Xin D, Han J, Yan X, Cheng H (2005) Mining compressed frequent-pattern sets. In: Proceeding of the 2005 international conference on very large data bases (VLDB’05), Trondheim, Norway, pp 709–720

  • Xin D, Shen X, Mei Q, Han J (2006) Discovering interesting patterns through user’s interactive feedback. In: Proceeding of the 2006 ACM SIGKDD international conference on knowledge discovery in databases (KDD’06), Philadelphia, PA, pp 773–778

  • Xiong H, Shekhar S, Huang Y, Kumar V, Ma X, Yoo JS (2004) A framework for discovering co-location patterns in data sets with extended spatial objects. In: Proceeding of the 2004 SIAM international conference on data mining (SDM’04), Lake Buena Vista, FL, pp 78–89

  • Yan X, Cheng H, Han J, Xin D (2005) Summarizing itemset patterns: a profile-based approach. In: Proceeding of the 2005 ACM SIGKDD international conference on knowledge discovery in databases (KDD’05), Chicago, IL, pp 314–323

  • Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: Proceeding of the 2002 international conference on data mining (ICDM’02), Maebashi, Japan, pp 721–724

  • Yan X, Han J (2003) CloseGraph: mining closed frequent graph patterns. In: Proceeding of the 2003 ACM SIGKDD international conference on knowledge discovery and data mining (KDD’03), Washington, DC, pp 286–295

  • Yan X, Han J, Afshar R (2003) CloSpan: mining closed sequential patterns in large datasets. In: Proceeding of the 2003 SIAM international conference on data mining (SDM’03), San Francisco, CA, pp 166–177

  • Yan X, Yu PS, Han J (2004) Graph indexing: a frequent structure-based approach. In: Proceeding of the 2004 ACM-SIGMOD international conference on management of data (SIGMOD’04), Paris, France, pp 335–346

  • Yan X, Yu PS, Han J (2005) Substructure similarity search in graph databases. In: Proceeding of the 2005 ACM-SIGMOD international conference on management of data (SIGMOD’05), Baltimore, MD, pp 766–777

  • Yan X, Zhou XJ, Han J (2005) Mining closed relational graphs with connectivity constraints. In: Proceeding of the 2005 ACM SIGKDD international conference on knowledge discovery in databases (KDD’05), Chicago, IL, pp 324–333

  • Yan X, Zhu F, Han J, Yu PS (2006) Searching substructures with superimposed distance. In: Proceeding of the 2006 international conference on data engineering (ICDE’06), Atlanta, Georgia, p 88

  • Yang G (2004) The complexity of mining maximal frequent itemsets and maximal frequent patterns. In: Proceeding of the 2004 ACM SIGKDD international conference on knowledge discovery in databases (KDD’04), Seattle, WA, pp 344–353

  • Yang C, Fayyad U, Bradley PS (2001) Efficient discovery of error-tolerant frequent itemsets in high dimensions. In: Proceeding of the 2001 ACM SIGKDD international conference on knowledge discovery in databases (KDD’01), San Francisco, CA, pp 194–203

  • Yang LH, Lee M-L, Hsu W (2003) Efficient mining of XML query patterns for caching. In: VLDB, pp 69–80

  • Yang J, Wang W (2003) CLUSEQ: efficient and effective sequence clustering. In: Proceeding of the 2003 international conference on data engineering (ICDE’03), Bangalore, India, pp 101–112

  • Yang J, Wang W, Yu PS (2003) Mining asynchronous periodic patterns in time series data. IEEE Trans Knowl Data Eng 15:613–628

  • Yin X, Han J (2003) CPAR: classification based on predictive association rules. In: Proceeding of the 2003 SIAM international conference on data mining (SDM’03), San Francisco, CA, pp 331–335

  • Yoda K, Fukuda T, Morimoto Y, Morishita S, Tokuyama T (1997) Computing optimized rectilinear regions for association rules. In: Proceeding of the 1997 international conference on knowledge discovery and data mining (KDD’97), Newport Beach, CA, pp 96–103

  • Yu JX, Chong Z, Lu H, Zhou A (2004) False positive or false negative: mining frequent itemsets from high speed transactional data streams. In: Proceeding of the 2004 international conference on very large data bases (VLDB’04), Toronto, Canada, pp 204–215

  • Yun U, Leggett J (2005) WFIM: weighted frequent itemset mining with a weight range and a minimum weight. In: Proceeding of the 2005 SIAM international conference on data mining (SDM’05), Newport Beach, CA, pp 636–640

  • Zaïane OR, Han J, Zhu H (2000) Mining recurrent items in multimedia with progressive resolution refinement. In: Proceeding of the 2000 international conference on data engineering (ICDE’00), San Diego, CA, pp 461–470

  • Zaki MJ (1998) Efficient enumeration of frequent sequences. In: Proceeding of the 7th international conference on information and knowledge management (CIKM’98), Washington, DC, pp 68–75

  • Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12:372–390

  • Zaki MJ (2001) SPADE: an efficient algorithm for mining frequent sequences. Mach Learn 40:31–60

  • Zaki MJ (2002) Efficiently mining frequent trees in a forest. In: Proceeding of the 2002 ACM SIGKDD international conference on knowledge discovery in databases (KDD’02), Edmonton, Canada, pp 71–80

  • Zaki MJ, Hsiao CJ (2002) CHARM: an efficient algorithm for closed itemset mining. In: Proceeding of the 2002 SIAM international conference on data mining (SDM’02), Arlington, VA, pp 457–473

  • Zaki MJ, Lesh N, Ogihara M (1998) PLANMINE: sequence mining for plan failures. In: Proceeding of the 1998 international conference on knowledge discovery and data mining (KDD’98), New York, NY, pp 369–373

  • Zaki MJ, Parthasarathy S, Ogihara M, Li W (1997) Parallel algorithm for discovery of association rules. Data Min Knowl Discov 1:343–374

  • Zhang X, Mamoulis N, Cheung DW, Shou Y (2004) Fast mining of spatial collocations. In: Proceeding of the 2004 ACM SIGKDD international conference on knowledge discovery in databases (KDD’04), Seattle, WA, pp 384–393

  • Zhang H, Padmanabhan B, Tuzhilin A (2004) On the discovery of significant statistical quantitative rules. In: Proceeding of the 2004 international conference on knowledge discovery and data mining (KDD’04), Seattle, WA, pp 374–383

  • Zhu F, Yan X, Han J, Yu PS, Cheng H (2007) Mining colossal frequent patterns by core pattern fusion. In: Proceeding of the 2007 international conference on data engineering (ICDE’07), Istanbul, Turkey

Author information

Corresponding author

Correspondence to Jiawei Han.

Additional information

Communicated by Geoff Webb.

The work was supported in part by the U.S. National Science Foundation NSF IIS-05-13678/06-42771 and NSF BDI-05-15813. Any opinions, findings, and conclusions or recommendations expressed here are those of the authors and do not necessarily reflect the views of the funding agencies.

About this article

Cite this article

Han, J., Cheng, H., Xin, D. et al. Frequent pattern mining: current status and future directions. Data Min Knowl Disc 15, 55–86 (2007). https://doi.org/10.1007/s10618-006-0059-1

  • DOI: https://doi.org/10.1007/s10618-006-0059-1

Keywords

  • Frequent pattern mining
  • Association rules
  • Data mining research
  • Applications