Abstract
Frequent Pattern Mining (FPM) is a powerful paradigm for mining informative and useful patterns in massive, complex datasets. In this paper we propose the Data Mining Template Library (DMTL), a collection of generic containers and algorithms for FPM, together with persistence and database management classes. DMTL provides a systematic solution to a whole class of common FPM tasks such as itemset, sequence, tree and graph mining. It is extensible and scalable, and it is designed for high performance so that it responds rapidly on massive datasets. Our experiments show that DMTL is competitive with special-purpose algorithms designed for a particular pattern type, especially as database sizes increase.
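To make the idea of generic containers and algorithms concrete, the following is a minimal C++ sketch of one support-counting routine that serves two pattern types. It is an illustration only and does not reproduce DMTL's actual interface: the names `ItemsetContains`, `SubsequenceContains`, `support` and `is_frequent` are hypothetical, and the database is assumed to be a plain in-memory vector of transactions rather than a persistent store.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical containment policy for itemsets: every item of the pattern
// must occur somewhere in the transaction (order is irrelevant).
struct ItemsetContains {
    template <typename T>
    bool operator()(const std::vector<T>& pattern,
                    const std::vector<T>& txn) const {
        std::set<T> items(txn.begin(), txn.end());
        return std::all_of(pattern.begin(), pattern.end(),
                           [&](const T& x) { return items.count(x) > 0; });
    }
};

// Hypothetical containment policy for sequences: the pattern must be a
// (not necessarily contiguous) subsequence of the transaction.
struct SubsequenceContains {
    template <typename T>
    bool operator()(const std::vector<T>& pattern,
                    const std::vector<T>& txn) const {
        std::size_t matched = 0;
        for (const T& x : txn) {
            if (matched < pattern.size() && x == pattern[matched]) ++matched;
        }
        return matched == pattern.size();
    }
};

// Generic support counter: the same code works for any pattern type for
// which a containment policy is supplied.
template <typename T, typename Contains>
std::size_t support(const std::vector<T>& pattern,
                    const std::vector<std::vector<T>>& db,
                    Contains contains) {
    return static_cast<std::size_t>(std::count_if(
        db.begin(), db.end(),
        [&](const std::vector<T>& txn) { return contains(pattern, txn); }));
}

// A pattern is frequent when its support reaches the minimum support.
template <typename T, typename Contains>
bool is_frequent(const std::vector<T>& pattern,
                 const std::vector<std::vector<T>>& db,
                 std::size_t minsup, Contains contains) {
    assert(minsup > 0);  // a threshold of zero would accept every pattern
    return support(pattern, db, contains) >= minsup;
}
```

Passing `ItemsetContains{}` or `SubsequenceContains{}` as the last argument switches the pattern semantics without touching the counting code, which is the kind of separation between pattern type and algorithm that the abstract describes.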
Keywords
- Association Rule
- Frequent Pattern
- Mining Algorithm
- Pattern Mining
- Formal Concept Analysis
This work was supported by NSF Grant EIA-0103708 under the KD-D program, NSF CAREER Award IIS-0092978, and DOE Early Career PI Award DE-FG02-02ER25538.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zaki, M.J. et al. (2005). Towards Generic Pattern Mining. In: Ganter, B., Godin, R. (eds) Formal Concept Analysis. ICFCA 2005. Lecture Notes in Computer Science, vol 3403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-32262-7_1
DOI: https://doi.org/10.1007/978-3-540-32262-7_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24525-4
Online ISBN: 978-3-540-32262-7
eBook Packages: Computer Science
