
Towards Generic Pattern Mining

  • Mohammed J. Zaki
  • Nagender Parimi
  • Nilanjana De
  • Feng Gao
  • Benjarath Phoophakdee
  • Joe Urban
  • Vineet Chaoji
  • Mohammad Al Hasan
  • Saeed Salem
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3403)

Abstract

Frequent Pattern Mining (FPM) is a powerful paradigm for mining informative and useful patterns in massive, complex datasets. In this paper we propose the Data Mining Template Library (DMTL), a collection of generic containers and algorithms for FPM, together with persistency and database management classes. DMTL provides a systematic solution to a whole class of common FPM tasks, such as itemset, sequence, tree, and graph mining. DMTL is extensible and scalable, and it is designed for high performance and rapid response on massive datasets. Our experiments show that DMTL is competitive with special-purpose algorithms designed for a particular pattern type, especially as database sizes increase.
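The central idea of the abstract, mining algorithms written once as generic (templated) code in the spirit of the STL and reused across data types, can be illustrated with a small C++ sketch. The names `mine_frequent_itemsets`, `Itemset`, and `TidList` are hypothetical and are not DMTL's API. The algorithm is an Eclat-style depth-first search over vertical tid-lists (Zaki's earlier approach to itemset mining), chosen here for brevity; it is not presented as the method of this paper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical names for illustration only; this is not the DMTL API.
template <typename Item>
using Itemset = std::vector<Item>;         // items in ascending order
using TidList = std::vector<std::size_t>;  // ascending transaction ids

namespace detail {

// Depth-first Eclat-style search: every member of `klass` shares `prefix`,
// and each tid-list records the transactions containing prefix + item.
template <typename Item>
void eclat_extend(const std::vector<std::pair<Item, TidList>>& klass,
                  const Itemset<Item>& prefix, std::size_t min_support,
                  std::map<Itemset<Item>, std::size_t>& out) {
  for (std::size_t i = 0; i < klass.size(); ++i) {
    Itemset<Item> pattern = prefix;
    pattern.push_back(klass[i].first);
    out[pattern] = klass[i].second.size();  // support = tid-list length

    // Join with later siblings: the support of the combined itemset is the
    // size of the intersection of the two tid-lists.
    std::vector<std::pair<Item, TidList>> next;
    for (std::size_t j = i + 1; j < klass.size(); ++j) {
      TidList common;
      std::set_intersection(klass[i].second.begin(), klass[i].second.end(),
                            klass[j].second.begin(), klass[j].second.end(),
                            std::back_inserter(common));
      if (common.size() >= min_support) {
        next.emplace_back(klass[j].first, std::move(common));
      }
    }
    if (!next.empty()) eclat_extend(next, pattern, min_support, out);
  }
}

}  // namespace detail

// Returns every itemset contained in at least `min_support` transactions,
// mapped to its support. Works for any Item type that provides operator<.
template <typename Item>
std::map<Itemset<Item>, std::size_t> mine_frequent_itemsets(
    const std::vector<std::set<Item>>& db, std::size_t min_support) {
  assert(min_support > 0);

  // Convert the horizontal database into one vertical tid-list per item.
  std::map<Item, TidList> vertical;
  for (std::size_t tid = 0; tid < db.size(); ++tid) {
    for (const Item& item : db[tid]) vertical[item].push_back(tid);
  }

  // Frequent single items form the root equivalence class.
  std::vector<std::pair<Item, TidList>> roots;
  for (const auto& [item, tids] : vertical) {
    if (tids.size() >= min_support) roots.emplace_back(item, tids);
  }

  std::map<Itemset<Item>, std::size_t> result;
  detail::eclat_extend(roots, Itemset<Item>{}, min_support, result);
  return result;
}
```

For example, with the four transactions {a, b, c}, {a, b}, {a, c}, {b} and a minimum support of 2, the sketch reports a (3), b (3), c (2), ab (2), and ac (2). The same template works unchanged for integer or string item identifiers. Extending it to sequences, trees, or graphs would require pattern-specific candidate extension and containment checks, which is the kind of variation a generic library for FPM must abstract over.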




Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

All authors: Computer Science Department, Rensselaer Polytechnic Institute, Troy
