Kernel k-Means Clustering Applied to Vector Space Embeddings of Graphs

  • Kaspar Riesen
  • Horst Bunke
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5064)

Abstract

This paper introduces a novel approach to clustering objects that are represented as graphs. The proposed method is based on an embedding procedure that maps graphs to an n-dimensional real vector space. The basic idea is to view the edit distances of an input graph g to a number of prototype graphs as a vectorial description of g. Kernel k-means clustering is then applied to the embedded graphs. In several experiments conducted on different graph data sets, we demonstrate the robustness and flexibility of the approach and compare it with a standard clustering procedure applied directly in the domain of graphs.
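
The two steps named in the abstract, embedding each graph by its distances to prototype graphs and then running kernel k-means on the resulting vectors, can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: graphs are modelled as sets of edges, toy_edit_distance (the size of the symmetric difference of two edge sets) is a crude stand-in for a real graph edit distance, the prototypes are picked by hand, the kernel is an RBF kernel with an arbitrary gamma, and the initialisation is a deterministic farthest-first rule.

```python
import numpy as np

def toy_edit_distance(a, b):
    """Hypothetical stand-in for graph edit distance: edges to add or delete."""
    return len(a ^ b)

def embed(graphs, prototypes, dist):
    """Map each graph g to the vector (dist(g, p_1), ..., dist(g, p_n))."""
    return np.array([[dist(g, p) for p in prototypes] for g in graphs], dtype=float)

def rbf_kernel(X, gamma):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X * X, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * d2)

def init_labels(K, k):
    """Farthest-first choice of k seed points in feature space (an assumption)."""
    diag = np.diag(K)
    centres = [0]
    d = diag + diag[0] - 2.0 * K[:, 0]  # squared distance to nearest seed
    for _ in range(1, k):
        nxt = int(np.argmax(d))
        centres.append(nxt)
        d = np.minimum(d, diag + diag[nxt] - 2.0 * K[:, nxt])
    d_all = diag[:, None] + diag[centres][None, :] - 2.0 * K[:, centres]
    return d_all.argmin(axis=1)

def kernel_kmeans(K, k, n_iter=100):
    """Cluster using only the Gram matrix K; returns one integer label per row."""
    n = K.shape[0]
    diag = np.diag(K)
    labels = init_labels(K, k)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                continue  # an empty cluster stays unreachable in this sketch
            # ||phi(x_i) - mu_c||^2 = K_ii - (2/m) sum_j K_ij + (1/m^2) sum_jl K_jl
            dist[:, c] = (diag
                          - 2.0 * K[:, mask].sum(axis=1) / m
                          + K[np.ix_(mask, mask)].sum() / m ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Toy data: three variants of a triangle and three variants of a 5-cycle.
tri = frozenset({(0, 1), (1, 2), (0, 2)})
cyc = frozenset({(4, 5), (5, 6), (6, 7), (7, 8), (4, 8)})
graphs = [tri, tri - {(0, 2)}, tri | {(2, 3)},
          cyc, cyc - {(4, 8)}, cyc | {(4, 6)}]
prototypes = [tri, cyc]  # hand-picked here; not the paper's selection strategy

X = embed(graphs, prototypes, toy_edit_distance)  # shape (6, 2)
K = rbf_kernel(X, gamma=0.05)                     # gamma chosen arbitrarily
labels = kernel_kmeans(K, k=2)
print(X)
print(labels)
```

Because the clustering step touches the data only through the Gram matrix K, the same routine would work with any other positive semi-definite kernel on the embedded vectors; only rbf_kernel would need to be swapped.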

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Kaspar Riesen (1)
  • Horst Bunke (1)
  1. Institute of Computer Science and Applied Mathematics, University of Bern, Bern, Switzerland
