Canonical Forms for Frequent Graph Mining

Borgelt, Christian

doi:10.1007/978-3-540-70981-7_38

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization

Abstract

A core problem of approaches to frequent graph mining, which are based on growing subgraphs into a set of graphs, is how to avoid redundant search. A powerful technique for this is a canonical description of a graph, which uniquely identifies it, and a corresponding test. I introduce a family of canonical forms based on systematic ways to construct spanning trees. I show that the canonical form used in gSpan (Yan and Han, 2002) is a member of this family, and that MoSS/MoFa (Borgelt and Berthold, 2002; Borgelt et al., 2005) is implicitly based on a different member, which I make explicit and exploit in the same way.
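
To illustrate what a canonical description buys, the following is a minimal brute-force sketch, not the canonical form of gSpan or MoSS/MoFa. It assumes a small, connected, undirected graph with labeled vertices and edges. It numbers the vertices in every order a depth-first spanning tree construction can produce, writes each numbering as a code word, and takes the lexicographically smallest one. All function names (`dfs_orders`, `code_word`, `canonical_code`) and the example graphs are hypothetical.

```python
def dfs_orders(adj, start):
    """Yield every order in which a depth-first search from `start`
    can visit the vertices (each order fixes one spanning tree)."""
    def extend(order, stack):
        if len(order) == len(adj):
            yield tuple(order)
            return
        # Backtrack until the top of the stack has an unvisited neighbour.
        while stack and all(n in order for n in adj[stack[-1]]):
            stack = stack[:-1]
        if not stack:
            return  # graph is not connected: no complete order exists
        for n in adj[stack[-1]]:
            if n not in order:
                yield from extend(order + [n], stack + [n])
    yield from extend([start], [start])


def code_word(order, vlabels, adj):
    """Describe the whole graph relative to one vertex numbering."""
    index = {v: i for i, v in enumerate(order)}
    labels = tuple(vlabels[v] for v in order)
    edges = sorted(
        (index[u], index[v], lab)
        for u in adj
        for v, lab in adj[u].items()
        if index[u] < index[v]  # list each undirected edge once
    )
    return labels, tuple(edges)


def canonical_code(vlabels, edges):
    """Smallest code word over all depth-first vertex numberings.
    Assumes a non-empty, connected graph (otherwise min() raises)."""
    adj = {v: {} for v in vlabels}
    for u, v, lab in edges:
        adj[u][v] = lab
        adj[v][u] = lab
    return min(
        code_word(order, vlabels, adj)
        for start in adj
        for order in dfs_orders(adj, start)
    )


# Two descriptions of the same chain C-C-O with different vertex names,
# and a different chain C-O-C (all example data is made up).
chain_a = ({1: "C", 2: "C", 3: "O"}, [(1, 2, "-"), (2, 3, "-")])
chain_b = ({"a": "O", "b": "C", "c": "C"}, [("a", "b", "-"), ("b", "c", "-")])
other = ({1: "C", 2: "O", 3: "C"}, [(1, 2, "-"), (2, 3, "-")])

same = canonical_code(*chain_a) == canonical_code(*chain_b)
different = canonical_code(*chain_a) != canonical_code(*other)
```

Because each code word lists every vertex label and every edge, equal canonical codes imply isomorphic graphs, and a search can discard any subgraph whose code word is not the minimal one. The enumeration is exponential in the worst case, so this sketch is only suitable for very small graphs.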

References

BORGELT, C. and BERTHOLD, M.R. (2002): Mining Molecular Fragments: Finding Relevant Substructures of Molecules. Proc. 2nd IEEE Int. Conf. on Data Mining. IEEE Press, Piscataway, 51–58.
BORGELT, C., MEINL, T. and BERTHOLD, M.R. (2004): Advanced Pruning Strategies to Speed Up Mining Closed Molecular Fragments. Proc. IEEE Conf. on Systems, Man and Cybernetics, CD-ROM. IEEE Press, Piscataway.
BORGELT, C., MEINL, T. and BERTHOLD, M.R. (2005): MoSS: A Program for Molecular Substructure Mining. Proc. Open Source Data Mining Workshop. ACM Press, New York, 6–15.
COOK, D.J. and HOLDER, L.B. (2000): Graph-based Data Mining. IEEE Trans. on Intelligent Systems 15,2, 32–41.
FINN, P.W., MUGGLETON, S., PAGE, D. and SRINIVASAN, A. (1998): Pharmacophore Discovery Using the Inductive Logic Programming System PROGOL. Machine Learning 30,2–3, 241–270.
GOETHALS, B. and ZAKI, M. (2003/2004): Proc. 1st and 2nd IEEE ICDM Workshop on Frequent Itemset Mining Implementations. CEUR Workshop Proceedings 90 and 126. Sun SITE Central Europe and RWTH Aachen http://www.ceur-ws.org/Vol-90/, http://www.ceur-ws.org/Vol-126/.
HUAN, J., WANG, W. and PRINS, J. (2003): Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism. Proc. 3rd IEEE Int. Conf. on Data Mining. IEEE Press, Piscataway, 549–552.
INDEX CHEMICUS — Subset from 1993. Institute of Scientific Information, Inc. (ISI). Thomson Scientific, Philadelphia.
KRAMER, S., DE RAEDT, L. and HELMA, C. (2001): Molecular Feature Mining in HIV Data. Proc. 7th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining. ACM Press, New York, 136–143.
KURAMOCHI, M. and KARYPIS, G. (2001): Frequent Subgraph Discovery. Proc. 1st IEEE Int. Conf. on Data Mining. IEEE Press, Piscataway, 313–320.
NIJSSEN, S. and KOK, J.N. (2004): A Quickstart in Frequent Structure Mining can Make a Difference. Proc. 10th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining. ACM Press, New York, 647–652.
WASHIO, T. and MOTODA, H. (2003): State of the Art of Graph-based Data Mining. SIGKDD Explorations Newsletter 5,1, 59–68.
YAN, X. and HAN, J. (2002): gSpan: Graph-based Substructure Pattern Mining. Proc. 2nd IEEE Int. Conf. on Data Mining. IEEE Press, Piscataway, 721–724.
YAN, X. and HAN, J. (2003): CloseGraph: Mining Closed Frequent Graph Patterns. Proc. 9th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining. ACM Press, New York, 286–295.

Author information

Authors and Affiliations

European Center for Soft Computing, c/ Gonzalo Gutiérrez Quirós s/n, 33600, Mieres, Spain
Christian Borgelt

Editor information

Editors and Affiliations

Department of Business Administration and Economics, Bielefeld University, Universitätsstr. 25, 33501, Bielefeld, Germany
Reinhold Decker
Department of Economics, Freie Universität Berlin, Garystraße 21, 14195, Berlin, Germany
Hans-J. Lenz

About this paper

Cite this paper

Borgelt, C. (2007). Canonical Forms for Frequent Graph Mining. In: Decker, R., Lenz, H.J. (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70981-7_38

DOI: https://doi.org/10.1007/978-3-540-70981-7_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70980-0
Online ISBN: 978-3-540-70981-7
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)