Graph Mining: Repository vs. Canonical Form

  • Christian Borgelt
  • Mathias Fiedler
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

In frequent subgraph mining one tries to find all subgraphs that occur with a user-specified minimum frequency in a given graph database. The basic approach is to grow subgraphs, adding an edge and maybe a node in each step, to count the number of database graphs containing them, and to eliminate infrequent subgraphs. The predominant method to avoid redundant search (the same subgraph can be grown in several ways) is to define a canonical form that uniquely identifies a graph up to automorphisms. The obvious alternative, a repository of processed subgraphs, has so far received fairly little attention. However, if the repository is laid out as a hash table with a carefully designed hash function, this approach is competitive with canonical form pruning. In experiments we conducted, the repository-based approach could sometimes outperform canonical form pruning by 15%.
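
The repository idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `invariant_key`, `isomorphic` and `Repository` are hypothetical, the hash key (sorted node-label/degree pairs plus sorted edge-label triples) is only one simple choice of isomorphism-invariant key and not the hash function studied in the paper, and the brute-force isomorphism test is only viable for very small subgraphs. A graph is assumed to be a pair of a node-label tuple and a list of `(u, v, edge_label)` edges.

```python
from itertools import permutations

def invariant_key(labels, edges):
    """Hash key that is identical for isomorphic graphs (assumed design)."""
    degree = [0] * len(labels)
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1
    nodes = tuple(sorted(zip(labels, degree)))
    links = tuple(sorted(
        (min(labels[u], labels[v]), max(labels[u], labels[v]), t)
        for u, v, t in edges))
    return (nodes, links)

def isomorphic(a, b):
    """Brute-force labeled isomorphism test; only for tiny graphs."""
    (la, ea), (lb, eb) = a, b
    if len(la) != len(lb) or len(ea) != len(eb):
        return False
    target = {(min(u, v), max(u, v), t) for u, v, t in eb}
    for perm in permutations(range(len(la))):
        # perm[i] is the node of b that node i of a is mapped to
        if any(la[i] != lb[perm[i]] for i in range(len(la))):
            continue
        mapped = {(min(perm[u], perm[v]), max(perm[u], perm[v]), t)
                  for u, v, t in ea}
        if mapped == target:
            return True
    return False

class Repository:
    """Hash table of already processed subgraphs."""
    def __init__(self):
        self.buckets = {}

    def add_if_new(self, graph):
        """Return True and store the graph unless an isomorphic one is known."""
        bucket = self.buckets.setdefault(invariant_key(*graph), [])
        if any(isomorphic(graph, seen) for seen in bucket):
            return False  # already processed: prune this search branch
        bucket.append(graph)
        return True

# The chain C-O-C reached by two different growth orders, then C-C-O.
repo = Repository()
first = repo.add_if_new((("C", "O", "C"), [(0, 1, "-"), (1, 2, "-")]))
again = repo.add_if_new((("O", "C", "C"), [(0, 1, "-"), (0, 2, "-")]))
other = repo.add_if_new((("C", "C", "O"), [(0, 1, "-"), (1, 2, "-")]))
```

In this sketch the second call is rejected because the same fragment was already stored under a different node numbering, which is the redundancy the repository is meant to catch. The full isomorphism test runs only against graphs in the same bucket, so the quality of the key determines how often it is needed.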

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Christian Borgelt (1)
  • Mathias Fiedler (1)
  1. European Center for Soft Computing, Mieres, Spain
