Graph Clustering Based on Structural Similarity of Fragments

  • Tetsuya Yoshida
  • Ryosuke Shoda
  • Hiroshi Motoda
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3847)


Resources available over the Web are often used in combination to meet a specific need of a user. Since resource combinations can be represented as graphs in terms of the relations among the resources, locating desirable resource combinations can be formulated as locating the corresponding graph. This paper describes a graph clustering method based on the structural similarity of fragments (currently, connected subgraphs) in graph-structured data. A fragment is characterized by the connectivity (degree) of its nodes. The fragment spectrum of a graph is created from the frequency distribution of its fragments. Thus, the representation of a graph is transformed into a fragment spectrum in terms of the properties of the fragments in the graph. Graphs are then clustered with respect to the transformed spectra by applying a standard clustering method. We also devise a criterion for determining the number of clusters by defining a pseudo-entropy for clusters. Preliminary experiments were conducted on synthesized data, and the results are reported.
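The pipeline in the abstract (enumerate fragments, describe each by node degrees, build a frequency spectrum per graph, cluster the spectra) can be illustrated with a minimal sketch. The abstract does not give the exact fragment descriptor, the enumeration limits, or which standard clustering method is used, so this sketch assumes: fragments are connected induced subgraphs with at most `max_size` nodes; a fragment's descriptor is its sorted degree sequence inside the fragment; and plain k-means clusters the spectra. All function names (`connected_node_sets`, `fragment_descriptor`, `fragment_spectrum`, `to_vectors`, `kmeans`) and the toy graphs are hypothetical. The pseudo-entropy criterion for choosing the number of clusters is not reproduced, because its definition is not stated in the abstract.

```python
from __future__ import annotations

import math
from collections import Counter


def connected_node_sets(graph: dict, max_size: int) -> set:
    """Enumerate every connected node set with 1..max_size nodes.

    Any connected set of s + 1 nodes can be built from a connected set of
    s nodes by adding one neighbour, so growing level by level is complete.
    """
    level = {frozenset([v]) for v in graph}
    found = set(level)
    for _ in range(max_size - 1):
        level = {nodes | {w}
                 for nodes in level
                 for u in nodes
                 for w in graph[u]
                 if w not in nodes}
        found |= level
    return found


def fragment_descriptor(graph: dict, nodes: frozenset) -> tuple:
    """Hypothetical descriptor: sorted degrees inside the induced fragment."""
    return tuple(sorted(len(graph[v] & nodes) for v in nodes))


def fragment_spectrum(graph: dict, max_size: int = 3) -> dict:
    """Relative frequency of each fragment descriptor in the graph."""
    counts = Counter(fragment_descriptor(graph, s)
                     for s in connected_node_sets(graph, max_size))
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()}


def to_vectors(spectra: list) -> list:
    """Align spectra on a shared, sorted descriptor axis (missing -> 0)."""
    keys = sorted(set().union(*spectra))
    return [[s.get(k, 0.0) for k in keys] for s in spectra]


def kmeans(vectors: list, k: int, iters: int = 20) -> list:
    """Plain k-means; the first k vectors serve as initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each spectrum to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


def cycle(n: int) -> dict:
    """Cycle graph on nodes 0..n-1, as an adjacency dict of sets."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def star(n: int) -> dict:
    """Star graph: hub 0 joined to leaves 1..n-1."""
    g = {0: set(range(1, n))}
    g.update({i: {0} for i in range(1, n)})
    return g


graphs = [cycle(5), star(5), cycle(6), star(6)]
spectra = [fragment_spectrum(g, max_size=3) for g in graphs]
labels = kmeans(to_vectors(spectra), k=2)
print(labels)  # [0, 1, 0, 1]: cycles in one cluster, stars in the other
```

In this toy run, a cycle has equal shares of single nodes, edges and 3-node paths, whereas a star has relatively more 3-node paths (every pair of leaves forms one through the hub), so the two kinds of graph fall into separate clusters. The number of connected node sets grows quickly with `max_size`, which is why the sketch keeps fragments small.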







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Tetsuya Yoshida (1)
  • Ryosuke Shoda (2)
  • Hiroshi Motoda (2)
  1. Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan
  2. Institute of Scientific and Industrial Research, Osaka University, Ibaraki, Osaka, Japan
