Frequent pattern mining: current status and future directions

Abstract

Frequent pattern mining has been a focal theme in data mining research for over a decade. An abundant literature has been dedicated to this research and tremendous progress has been made, ranging from efficient and scalable algorithms for frequent itemset mining in transaction databases to numerous research frontiers, such as sequential pattern mining, structured pattern mining, correlation mining, associative classification, and frequent pattern-based clustering, as well as their broad applications. In this article, we provide a brief overview of the current status of frequent pattern mining and discuss a few promising research directions. We believe that frequent pattern mining research has substantially broadened the scope of data analysis and will have a deep impact on data mining methodologies and applications in the long run. However, some challenging research issues still need to be solved before frequent pattern mining can claim to be a cornerstone approach in data mining applications.


  155. Yan X, Cheng H, Han J, Xin D (2005) Summarizing itemset patterns: a profile-based approach. In: Proceeding of the 2005 ACM SIGKDD international conference on knowledge discovery in databases (KDD’05), Chicago, IL, pp 314–323

  156. Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: Proceeding of the 2002 international conference on data mining (ICDM’02), Maebashi, Japan, pp 721–724

  157. Yan X, Han J (2003) CloseGraph: mining closed frequent graph patterns. In: Proceeding of the 2003 ACM SIGKDD international conference on knowledge discovery and data mining (KDD’03), Washington, DC, pp 286–295

  158. Yan X, Han J, Afshar R (2003) CloSpan: mining closed sequential patterns in large datasets. In: Proceeding of the 2003 SIAM international conference on data mining (SDM’03), San Fransisco, CA, pp 166–177

  159. Yan X, Yu PS, Han J (2004) Graph indexing: a frequent structure-based approach. In: Proceeding of the 2004 ACM-SIGMOD international conference on management of data (SIGMOD’04), Paris, France, pp 335–346

  160. Yan X, Yu PS, Han J (2005) Substructure similarity search in graph databases. In: Proceeding of the 2005 ACM-SIGMOD international conference on management of data (SIGMOD’05), Baltimore, MD, pp 766–777

  161. Yan X, Zhou XJ, Han J (2005) Mining closed relational graphs with connectivity constraints. In: Proceeding of the 2005 ACM SIGKDD international conference on knowledge discovery in databases (KDD’05), Chicago, IL, pp 324–333

  162. Yan X, Zhu F, Han J, Yu PS (2006) Searching substructures with superimposed distance. In: Proceeding of the 2006 international conference on data engineering (ICDE’06), Atlanta, Georgia, p 88

  163. Yang G (2004) The complexity of mining maximal frequent itemsets and maximal frequent patterns. In: Proceeding of the 2004 ACM SIGKDD international conference on kowledge discovery in databases (KDD’04), Seattle, WA, pp 344–353

  164. Yang C, Fayyad U, Bradley PS (2001) Efficient discovery of error-tolerant frequent itemsets in high dimensions. In: Proceeding of the 2001 ACM SIGKDD international conference on knowledge discovery in databases (KDD’01), San Fransisco, CA, pp 194–203

  165. Yang LH, Lee M-L, Hsu W (2003) Efficient mining of xml query patterns for caching. In: VLDB, pp 69–80

  166. Yang J, Wang W (2003) CLUSEQ: efficient and effective sequence clustering. In: Proceeding of the 2003 international conference on data engineering (ICDE’03), Bangalore, India, pp 101–112

  167. Yang J, Wang W, Yu PS (2003) Mining asynchronous periodic patterns in time series data. IEEE Trans Knowl Data Eng 15:613–628

    Article  Google Scholar 

  168. Yin X, Han J (2003) CPAR: classification based on predictive association rules. In: Proceeding of the 2003 SIAM international conference on data mining (SDM’03), San Fransisco, CA, pp 331–335

  169. Yoda K, Fukuda T, Morimoto Y, Morishita S, Tokuyama T (1997) Computing optimized rectilinear regions for association rules. In: Proceeding of the 1997 international conference on knowledge discovery and data mining (KDD’97), Newport Beach, CA, pp 96–103

  170. Yu JX, Chong Z, Lu H, Zhou A (2004) False positive or false negative: mining frequent itemsets from high speed transactional data streams. In: Proceeding of the 2004 international conference on very large data bases (VLDB’04), Toronto, Canada, pp 204–215

  171. Yun U, Leggett J (2005) Wfim: weighted frequent itemset mining with a weight range and a minimum weight. In: Proceeding of the 2005 SIAM international conference on data mining (SDM’05), Newport Beach, CA, pp 636–640

  172. Zaïane OR, Han J, Zhu H (2000) Mining recurrent items in multimedia with progressive resolution refinement. In: Proceeding of the 2000 international conference on data engineering (ICDE’00), San Diego, CA, pp 461–470

  173. Zaki MJ (1998) Efficient enumeration of frequent sequences. In: Proceeding of the 7th international conference on information and knowledge management (CIKM’98), Washington, DC, pp 68–75

  174. Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12:372–390

    Article  Google Scholar 

  175. Zaki M (2001) SPADE: an efficient algorithm for mining frequent sequences. Mach Learn 40:31–60

    Article  Google Scholar 

  176. Zaki MJ (2002) Efficiently mining frequent trees in a forest. In: Proceeding of the 2002 ACM SIGKDD international conference on knowledge discovery in databases (KDD’02), Edmonton, Canada, pp 71–80

  177. Zaki MJ, Hsiao CJ (2002) CHARM: an efficient algorithm for closed itemset mining. In: Proceeding of the 2002 SIAM international conference on data mining (SDM’02), Arlington, VA, pp 457–473

  178. Zaki MJ, Lesh N, Ogihara M (1998) PLANMINE: sequence mining for plan failures. In: Proceeding of the 1998 international conference on knowledge discovery and data mining (KDD’98), New York, NY, pp 369–373

  179. Zaki MJ, Parthasarathy S, Ogihara M, Li W (1997) Parallel algorithm for discovery of association rules. data mining knowl discov, 1:343–374

    Article  Google Scholar 

  180. Zhang X, Mamoulis N, Cheung DW, Shou Y (2004) Fast mining of spatial collocations. In: Proceeding of the 2004 ACM SIGKDD international conference on knowledge discovery in databases (KDD’04), Seattle, WA, pp 384–393

  181. Zhang H, Padmanabhan B, Tuzhilin A (2004) On the discovery of significant statistical quantitative rules. In: Proceeding of the 2004 international conference on knowledge discovery and data mining (KDD’04), Seattle, WA, pp 374–383

  182. Zhu F, Yan X, Han J, Yu PS, Cheng H (2007) Mining colossal frequent patterns by core pattern fusion. In: Proceeding of the 2007 international conference on data engineering (ICDE’07), Istanbul, Turkey

Download references

Author information

Correspondence to Jiawei Han.

Additional information

The work was supported in part by the U.S. National Science Foundation NSF IIS-05-13678/06-42771 and NSF BDI-05-15813. Any opinions, findings, and conclusions or recommendations expressed here are those of the authors and do not necessarily reflect the views of the funding agencies.

Communicated by Geoff Webb.

Rights and permissions

This article is published under an open access license.

About this article

Cite this article

Han, J., Cheng, H., Xin, D. et al. Frequent pattern mining: current status and future directions. Data Min Knowl Disc 15, 55–86 (2007). https://doi.org/10.1007/s10618-006-0059-1

Keywords

  • Frequent pattern mining
  • Association rules
  • Data mining research
  • Applications