This paper addresses the problem of semantically clustering the growing number of schemaless XML documents. In our approach, each document in a collection is first represented by a macro-path sequence. Second, a similarity matrix for the collection is constructed by computing pairwise similarity values between these macro-path sequences. Finally, the desired clusters are obtained by applying hierarchical clustering to that matrix. Experimental results are also reported.
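The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not define the macro-path representation or the similarity measure, so the sketch substitutes root-to-leaf tag paths and Jaccard similarity, and uses a simple average-linkage agglomerative procedure. All function names, the sample documents, and the 0.5 threshold are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def tag_paths(xml_text):
    """Return the set of root-to-leaf tag paths of a document.

    Assumption: this is a stand-in for the paper's macro-path sequence,
    whose exact definition is not given in the abstract.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        children = list(node)
        if not children:
            paths.add(path)
        for child in children:
            walk(child, path)

    walk(root, "")
    return paths


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed measure, not the paper's)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_matrix(docs):
    """Build the symmetric pairwise similarity matrix for a collection."""
    paths = [tag_paths(d) for d in docs]
    n = len(paths)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = jaccard(paths[i], paths[j])
    return sim


def cluster(sim, threshold):
    """Average-linkage agglomerative clustering over a similarity matrix.

    Merging stops once the best pair of clusters falls below `threshold`.
    """
    clusters = [[i] for i in range(len(sim))]

    def avg(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        x, y = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg(clusters[p[0]], clusters[p[1]]),
        )
        if avg(clusters[x], clusters[y]) < threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical sample collection: two book-like documents and one CD record.
docs = [
    "<book><title>A</title><author>B</author></book>",
    "<book><title>C</title><author>D</author><year>2003</year></book>",
    "<cd><artist>E</artist><track>F</track></cd>",
]
sim = similarity_matrix(docs)
clusters = cluster(sim, threshold=0.5)
print(clusters)  # [[0, 1], [2]]
```

In this toy collection the two book documents share two of three distinct paths and are merged, while the CD document shares none and stays in its own cluster. A set-based measure ignores path order, so a faithful implementation of sequence similarity would need the definitions from the paper itself.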


Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Yun Shen (1)
  • Bing Wang (1)
  1. Department of Computer Science, University of Hull, Hull, UK
