Graph Clustering Based on Structural Similarity of Fragments
Resources available over the Web are often used in combination to meet a specific need of a user. Since resource combinations can be represented as graphs in terms of the relations among the resources, locating desirable resource combinations can be formulated as locating the corresponding graphs. This paper describes a graph clustering method based on the structural similarity of fragments (currently, connected subgraphs) in graph-structured data. Each fragment is characterized by the connectivity (degree) of the nodes in the fragment, and the fragment spectrum of a graph is created from the frequency distribution of its fragments. Thus, a graph is transformed into a fragment spectrum that reflects the properties of the fragments it contains. Graphs are then clustered with respect to their spectra by applying a standard clustering method. We also devise a criterion for determining the number of clusters by defining a pseudo-entropy for clusters. The results of preliminary experiments with synthesized data are reported.
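The pipeline in the abstract (fragments, then a fragment spectrum, then standard clustering) can be illustrated with a minimal sketch. It rests on assumptions the abstract does not fix: fragments are taken to be node-induced connected subgraphs of at most three nodes, each fragment is labeled by its sorted within-fragment degree sequence, spectra are normalized to relative frequencies, and plain k-means stands in for "a standard clustering method". All function names are hypothetical, and the paper's pseudo-entropy criterion for choosing the number of clusters is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_connected(nodes, adj):
    """Return True if the subgraph induced by `nodes` is connected (DFS)."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def fragment_label(nodes, adj):
    """Assumed labeling: sorted degrees of the fragment's nodes, counted inside the fragment."""
    nodes = set(nodes)
    return tuple(sorted(len(adj[u] & nodes) for u in nodes))


def fragment_spectrum(adj, max_size=3):
    """Count connected induced fragments with 1..max_size nodes, keyed by label."""
    spectrum = Counter()
    for k in range(1, max_size + 1):
        for nodes in combinations(sorted(adj), k):
            if is_connected(nodes, adj):
                spectrum[fragment_label(nodes, adj)] += 1
    return spectrum


def to_vectors(spectra):
    """Align spectra on a shared label axis and normalize each to relative frequencies."""
    labels = sorted(set().union(*spectra))
    vectors = []
    for s in spectra:
        total = sum(s.values())
        vectors.append([s[lab] / total for lab in labels])
    return labels, vectors


def kmeans(vectors, k, iters=20):
    """Plain k-means; centroids start at the first k vectors so the demo is deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


# Toy data: two triangle-rich graphs (K3, K4) and two paths (P4, P5).
graphs = [
    graph([(0, 1), (1, 2), (0, 2)]),                                   # K3
    graph([(0, 1), (1, 2), (2, 3)]),                                   # P4
    graph([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]),           # K4
    graph([(0, 1), (1, 2), (2, 3), (3, 4)]),                           # P5
]
spectra = [fragment_spectrum(g) for g in graphs]
labels, vectors = to_vectors(spectra)
clusters = kmeans(vectors, k=2)
print(clusters)  # [0, 1, 0, 1]
```

In this toy run the graphs containing triangles share the (2, 2, 2) fragment and fall into one cluster, while the paths, whose spectra contain (1, 1, 2) instead, fall into the other. Enumerating all connected subsets is exponential in fragment size, which is why the sketch caps fragments at three nodes.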
Keywords: Connected Subgraph; Inductive Logic Programming; Graph Cluster; Base Graph; Fragment Spectrum