Towards Generic Pattern Mining

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 3403)

Abstract

Frequent Pattern Mining (FPM) is a very powerful paradigm for mining informative and useful patterns in massive, complex datasets. In this paper we propose the Data Mining Template Library (DMTL), a collection of generic containers and algorithms for FPM, as well as persistency and database management classes. DMTL provides a systematic solution to a whole class of common FPM tasks such as itemset, sequence, tree and graph mining. DMTL is extensible, scalable, and high-performance, giving rapid response on massive datasets. Our experiments show that DMTL is competitive with special-purpose algorithms designed for a particular pattern type, especially as database sizes increase.
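To make the simplest FPM task named above concrete, here is a minimal sketch of frequent itemset mining using a level-wise (Apriori-style) search. It is an illustration only, not DMTL's API or the algorithm the paper implements: the function name `frequent_itemsets`, the `min_support` parameter, and the toy transactions are all assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset that occurs in at
    least `min_support` transactions (hypothetical helper, not DMTL)."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of `itemset`.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support}

    result = {}
    k = 1
    while level:
        for itemset in level:
            result[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-subsets are
        # frequent and the candidate itself meets the support threshold.
        level = {
            c for c in candidates
            if all(frozenset(sub) in result for sub in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return result


if __name__ == "__main__":
    # Toy database of five transactions (made up for illustration).
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, count in sorted(frequent_itemsets(db, 3).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

Sequence, tree and graph mining follow the same candidate-generation and support-counting pattern over richer pattern types, which is the kind of commonality a generic library can factor out.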

Keywords

  • Association Rule
  • Frequent Pattern
  • Mining Algorithm
  • Pattern Mining
  • Formal Concept Analysis

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

This work was supported by NSF Grant EIA-0103708 under the KD-D program, NSF CAREER Award IIS-0092978, and DOE Early Career PI Award DE-FG02-02ER25538.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zaki, M.J. et al. (2005). Towards Generic Pattern Mining. In: Ganter, B., Godin, R. (eds) Formal Concept Analysis. ICFCA 2005. Lecture Notes in Computer Science, vol 3403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-32262-7_1

  • DOI: https://doi.org/10.1007/978-3-540-32262-7_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24525-4

  • Online ISBN: 978-3-540-32262-7

  • eBook Packages: Computer Science, Computer Science (R0)
