Clustering Schemaless XML Documents
This paper addresses the problem of semantically clustering the growing number of schemaless XML documents. In our approach, each document in a collection is first represented as a macro-path sequence. Next, a similarity matrix for the collection is constructed by computing the similarity between each pair of macro-path sequences. Finally, the desired clusters are obtained by applying a hierarchical clustering technique. Experimental results are also presented.
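The abstract does not define a macro-path sequence or the similarity measure, so the Python sketch below only illustrates the shape of the three-step pipeline under stated assumptions. It is not the paper's method. As a hypothetical stand-in for a macro-path sequence, it uses the root-to-element tag paths of a document in pre-order. For similarity, it uses a normalized longest-common-subsequence score. For clustering, it uses plain average-linkage agglomerative clustering down to a chosen number of clusters `k`. The names `path_sequence`, `similarity`, `similarity_matrix`, and `average_linkage` are illustrative only.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def path_sequence(xml_text):
    """Hypothetical stand-in for a macro-path sequence: the root-to-node
    tag path of every element, listed in document (pre-order) order."""
    root = ET.fromstring(xml_text)
    seq = []

    def walk(elem, prefix):
        path = prefix + "/" + elem.tag
        seq.append(path)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return seq


def lcs_length(a, b):
    # Classic dynamic-programming longest common subsequence, O(len(a)*len(b)).
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(a, b):
    # Assumed measure: 2 * LCS / (|a| + |b|), which lies in [0, 1].
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def similarity_matrix(seqs):
    # Symmetric matrix with 1.0 on the diagonal.
    n = len(seqs)
    m = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = similarity(seqs[i], seqs[j])
    return m


def average_linkage(matrix, k):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    highest average pairwise similarity until k clusters remain."""
    clusters = [[i] for i in range(len(matrix))]
    while len(clusters) > k:
        best, pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            s = sum(matrix[i][j] for i in clusters[a] for j in clusters[b])
            s /= len(clusters[a]) * len(clusters[b])
            if s > best:
                best, pair = s, (a, b)
        a, b = pair  # a < b, so deleting index b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Usage on four tiny, made-up schemaless documents.
docs = [
    "<book><title/><author/><year/></book>",
    "<book><title/><author/><author/></book>",
    "<order><item><sku/></item><total/></order>",
    "<order><item><sku/><qty/></item></order>",
]
seqs = [path_sequence(d) for d in docs]
print(average_linkage(similarity_matrix(seqs), 2))  # [[0, 1], [2, 3]]
```

A real implementation would more likely stop merging at a similarity threshold or cut a full dendrogram rather than fix `k` in advance. The quadratic search over cluster pairs is kept here only for readability.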