Abstract
Several new miners for frequent subgraphs have been published recently. Whereas the new approaches are presented in detail, their quantitative evaluations are often of limited value: only the performance on a small set of graph databases is discussed, and the new algorithm is often compared against only a single competitor, using that competitor's executable. It remains unclear how the algorithms perform on larger or different graph databases, and which of their distinctive features is best suited to which kind of database. We have re-implemented the subgraph miners MoFa, gSpan, FFSM, and Gaston within a common code base, with the same level of programming expertise and optimization effort. This paper presents the results of a comparative benchmark that ran these algorithms on a comprehensive set of graph databases.
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wörlein, M., Meinl, T., Fischer, I., Philippsen, M. (2005). A Quantitative Comparison of the Subgraph Miners MoFa, gSpan, FFSM, and Gaston. In: Jorge, A.M., Torgo, L., Brazdil, P., Camacho, R., Gama, J. (eds) Knowledge Discovery in Databases: PKDD 2005. Lecture Notes in Computer Science, vol 3721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564126_39
DOI: https://doi.org/10.1007/11564126_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29244-9
Online ISBN: 978-3-540-31665-7
eBook Packages: Computer Science; Computer Science (R0)