A Significance-Based Graph Model for Clustering Web Documents

  • Argyris Kalogeratos
  • Aristidis Likas
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3955)


Traditional document clustering techniques rely on single-term analysis, such as the widely used Vector Space Model. However, recent approaches have emerged that are based on Graph Models and provide a more detailed description of document properties. In this work we present a novel Significance-based Graph Model for Web documents that introduces a sophisticated graph weighting method, based on significance evaluation of graph elements. We also define an associated similarity measure based on the maximum common subgraph between the graphs of the corresponding web documents. Experimental results on artificial and real document collections using well-known clustering algorithms indicate the effectiveness of the proposed approach.
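As a rough illustration of the maximum-common-subgraph idea mentioned above, the following minimal sketch compares two small document graphs. It assumes that every node carries a unique term label, in which case the maximum common subgraph reduces to the intersection of the node and edge sets. The function names `build_graph` and `mcs_similarity`, the adjacency-based graph construction, and the choice of graph size as nodes plus edges are assumptions made for this example. The sketch is unweighted and does not reproduce the significance-based weighting proposed in the paper, whose details are not given here.

```python
from typing import List, Set, Tuple

# A graph is a pair (nodes, directed edges); nodes are unique term labels.
Graph = Tuple[Set[str], Set[Tuple[str, str]]]


def build_graph(terms: List[str]) -> Graph:
    """Hypothetical construction: one node per distinct term,
    one directed edge for each pair of adjacent, distinct terms."""
    nodes = set(terms)
    edges = {(a, b) for a, b in zip(terms, terms[1:]) if a != b}
    return nodes, edges


def mcs_similarity(g1: Graph, g2: Graph) -> float:
    """Unweighted similarity |mcs(g1, g2)| / max(|g1|, |g2|).

    With unique node labels the maximum common subgraph is simply the
    shared nodes plus the shared edges. Graph size is taken here as
    nodes + edges, which is an assumption of this sketch.
    """
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    mcs_size = len(nodes1 & nodes2) + len(edges1 & edges2)
    max_size = max(len(nodes1) + len(edges1), len(nodes2) + len(edges2))
    # Two empty graphs are treated as identical.
    return mcs_size / max_size if max_size else 1.0


doc_a = build_graph(["web", "document", "clustering"])
doc_b = build_graph(["web", "document", "graph"])
similarity = mcs_similarity(doc_a, doc_b)
```

Here the two graphs share two nodes and one edge out of five elements each. A weighted variant would replace the element counts with sums of per-node and per-edge weights.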


Vector Space Model; Maximum Common Subgraph; Document Part; Unique Node Label; Simple Weighting Scheme





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Argyris Kalogeratos (1)
  • Aristidis Likas (1)
  1. Department of Computer Science, University of Ioannina, Ioannina, Greece
