Canonical Forms for Frequent Graph Mining
A core problem of frequent graph mining approaches that grow subgraphs into a set of graphs is how to avoid redundant search. A powerful technique for this is a canonical description of a graph, which uniquely identifies it, together with a corresponding test for whether a given description is canonical. I introduce a family of canonical forms based on systematic ways to construct spanning trees. I show that the canonical form used in gSpan (Yan and Han, 2002) is a member of this family, and that MoSS/MoFa (Borgelt and Berthold, 2002; Borgelt et al., 2005) is implicitly based on a different member, which I make explicit and exploit in the same way.
Keywords: Span Tree, Destination Node, Canonical Form, Search Tree, Code Word
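To make the notion of a canonical description and its test concrete, the following is a minimal brute-force sketch, not the gSpan or MoSS/MoFa canonical form itself. It assumes a connected, undirected graph without self-loops, numbers nodes in the order in which they are reached while the graph is grown edge by edge, and takes the lexicographically smallest resulting code word as canonical. The names `code_words`, `canonical_form`, and `is_canonical` and the tuple layout of a code word are illustrative assumptions; the enumeration is exponential and is only meant for very small graphs.

```python
from typing import Dict, Iterator, List, Tuple

Edge = Tuple[int, int, str]  # (node_a, node_b, edge_label), undirected


def code_words(node_labels: Dict[int, str], edges: List[Edge]) -> Iterator[tuple]:
    """Yield every code word obtained by growing the graph edge by edge.

    Assumption: the graph is connected and has no self-loops.
    A code word is (root_label,) followed by one tuple per edge:
    (smaller_index, larger_index, label_of_smaller, edge_label, label_of_larger).
    """

    def grow(index: Dict[int, int], used: frozenset, word: list) -> Iterator[tuple]:
        if len(used) == len(edges):
            yield tuple(word)
            return
        for k, (a, b, lab) in enumerate(edges):
            # Only edges touching an already numbered node may be added.
            if k in used or (a not in index and b not in index):
                continue
            new_index = dict(index)
            for n in (a, b):
                if n not in new_index:
                    new_index[n] = len(new_index)  # next free node index
            u, v = (a, b) if new_index[a] <= new_index[b] else (b, a)
            step = (new_index[u], new_index[v], node_labels[u], lab, node_labels[v])
            yield from grow(new_index, used | {k}, word + [step])

    for root in node_labels:
        yield from grow({root: 0}, frozenset(), [(node_labels[root],)])


def canonical_form(node_labels: Dict[int, str], edges: List[Edge]) -> tuple:
    """Return the lexicographically smallest code word of the graph."""
    return min(code_words(node_labels, edges))


def is_canonical(word: tuple, node_labels: Dict[int, str], edges: List[Edge]) -> bool:
    """Canonical form test: is `word` the smallest code word of the graph?"""
    return word == canonical_form(node_labels, edges)


if __name__ == "__main__":
    # The same C-O=N chain described with two different node numberings.
    g1 = ({1: "C", 2: "O", 3: "N"}, [(1, 2, "-"), (2, 3, "=")])
    g2 = ({10: "N", 20: "C", 30: "O"}, [(30, 10, "="), (20, 30, "-")])
    print(canonical_form(*g1) == canonical_form(*g2))
```

In a mining algorithm, a search tree node whose code word fails such a test describes a subgraph that is also reached through its canonical code word elsewhere in the search tree, so it can be pruned. The canonical forms discussed in the paper restrict the admissible growth orders (for example to particular spanning tree constructions), which is what makes the test practical.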
- BORGELT, C. and BERTHOLD, M.R. (2002): Mining Molecular Fragments: Finding Relevant Substructures of Molecules. Proc. 2nd IEEE Int. Conf. on Data Mining. IEEE Press, Piscataway, 51–58.
- BORGELT, C., MEINL, T. and BERTHOLD, M.R. (2004): Advanced Pruning Strategies to Speed Up Mining Closed Molecular Fragments. Proc. IEEE Conf. on Systems, Man and Cybernetics, CD-ROM. IEEE Press, Piscataway.
- BORGELT, C., MEINL, T. and BERTHOLD, M.R. (2005): MoSS: A Program for Molecular Substructure Mining. Proc. Open Source Data Mining Workshop. ACM Press, New York, 6–15.
- GOETHALS, B. and ZAKI, M. (2003/2004): Proc. 1st and 2nd IEEE ICDM Workshop on Frequent Itemset Mining Implementations. CEUR Workshop Proceedings 90 and 126. Sun SITE Central Europe and RWTH Aachen. http://www.ceur-ws.org/Vol-90/, http://www.ceur-ws.org/Vol-126/
- INDEX CHEMICUS: Subset from 1993. Institute of Scientific Information, Inc. (ISI). Thomson Scientific, Philadelphia.
- KRAMER, S., DE RAEDT, L. and HELMA, C. (2001): Molecular Feature Mining in HIV Data. Proc. 7th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining. ACM Press, New York, 136–143.
- KURAMOCHI, M. and KARYPIS, G. (2001): Frequent Subgraph Discovery. Proc. 1st IEEE Int. Conf. on Data Mining. IEEE Press, Piscataway, 313–320.
- NIJSSEN, S. and KOK, J.N. (2004): A Quickstart in Frequent Structure Mining can Make a Difference. Proc. 10th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining. ACM Press, New York, 647–652.
- YAN, X. and HAN, J. (2002): gSpan: Graph-based Substructure Pattern Mining. Proc. 2nd IEEE Int. Conf. on Data Mining. IEEE Press, Piscataway, 721–724.
- YAN, X. and HAN, J. (2003): CloseGraph: Mining Closed Frequent Graph Patterns. Proc. 9th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining. ACM Press, New York, 286–295.