
On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE

Volume 2888 of the Lecture Notes in Computer Science series, pp. 767-784

Clustering Schemaless XML Documents

  • Yun Shen, Department of Computer Science, University of Hull
  • Bing Wang, Department of Computer Science, University of Hull


Abstract

This paper addresses the problem of semantically clustering the growing number of schemaless XML documents. In our approach, each document in a collection is first represented as a macro-path sequence. Next, a similarity matrix for the collection is built by computing pairwise similarity values between these macro-path sequences. Finally, the desired clusters are produced by applying hierarchical clustering to this matrix. Experimental results are also presented.
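The three-step pipeline described in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' method. The abstract does not define a macro-path, so the hypothetical `path_sequence` uses plain root-to-leaf tag paths in document order as a stand-in. The standard-library `difflib.SequenceMatcher.ratio()` is an assumed substitute for the paper's own sequence similarity measure. Average-linkage agglomerative clustering with a hypothetical `threshold` cut-off stands in for whichever hierarchical variant the paper actually uses.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher
from itertools import combinations


def path_sequence(xml_text):
    """Hypothetical stand-in for a macro-path sequence: the document-order
    list of root-to-leaf tag paths, e.g. 'book/author'."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        children = list(node)
        if not children:  # leaf element: record its full path
            paths.append(path)
        for child in children:
            walk(child, path)

    walk(root, "")
    return paths


def similarity_matrix(sequences):
    """Symmetric pairwise similarity matrix with values in [0, 1].
    SequenceMatcher.ratio() is an assumed substitute measure."""
    n = len(sequences)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        s = SequenceMatcher(None, sequences[i], sequences[j]).ratio()
        sim[i][j] = sim[j][i] = s
    return sim


def agglomerative(sim, threshold):
    """Average-linkage agglomerative clustering: repeatedly merge the two
    most similar clusters until no pair reaches `threshold` (hypothetical)."""
    clusters = [[i] for i in range(len(sim))]

    def link(a, b):
        # Mean similarity over all cross-cluster document pairs.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        (a, b), best = max(
            (((x, y), link(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        # combinations() yields a < b, so deleting b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


if __name__ == "__main__":
    docs = [
        "<book><title>A</title><author>X</author></book>",
        "<book><title>B</title><author>Y</author><year>2003</year></book>",
        "<movie><name>C</name><director>Z</director></movie>",
    ]
    seqs = [path_sequence(d) for d in docs]
    print(agglomerative(similarity_matrix(seqs), threshold=0.5))
```

With these three toy documents, the two `book` documents share two of their paths (ratio 0.8) and merge into one cluster. The `movie` document shares no paths with either of them and stays separate, so the script prints `[[0, 1], [2]]`. Average linkage was chosen here only because it is a common default. Single or complete linkage would fit the same loop by changing `link`.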