Clustering Schemaless XML Documents

  • Yun Shen
  • Bing Wang
Conference paper

DOI: 10.1007/978-3-540-39964-3_49

Part of the Lecture Notes in Computer Science book series (LNCS, volume 2888)
Cite this paper as:
Shen Y., Wang B. (2003) Clustering Schemaless XML Documents. In: Meersman R., Tari Z., Schmidt D.C. (eds) On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE. OTM 2003. Lecture Notes in Computer Science, vol 2888. Springer, Berlin, Heidelberg

Abstract

This paper addresses the problem of semantically clustering the growing number of schemaless XML documents. In our approach, each document in a collection is first represented by a macro-path sequence. Next, a similarity matrix for the collection is constructed by computing pairwise similarity values between these macro-path sequences. Finally, the desired clusters are produced by applying hierarchical clustering. Experimental results are also presented.
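The three-step pipeline in the abstract can be illustrated with a minimal sketch. The abstract does not define a macro-path or the similarity measure, so this example makes some assumptions. It uses plain root-to-leaf element paths as a stand-in for macro-path sequences, Jaccard similarity over path sets as a hypothetical similarity function, and single-linkage agglomerative clustering with a cutoff threshold. The names `root_to_leaf_paths`, `jaccard`, `similarity_matrix`, and `single_linkage` are illustrative only and are not the authors' method.

```python
import xml.etree.ElementTree as ET


def root_to_leaf_paths(xml_text):
    """Return the set of root-to-leaf tag paths, e.g. 'book/author/name'.

    Assumption: a simplified stand-in for the paper's macro-path sequence.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(elem, prefix):
        path = prefix + [elem.tag]
        children = list(elem)
        if not children:  # leaf element: record the full path
            paths.add("/".join(path))
        for child in children:
            walk(child, path)

    walk(root, [])
    return paths


def jaccard(a, b):
    """Hypothetical similarity |A & B| / |A | B| between two path sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_matrix(docs):
    """Step 2: pairwise similarity between every pair of documents."""
    path_sets = [root_to_leaf_paths(d) for d in docs]
    n = len(path_sets)
    return [[jaccard(path_sets[i], path_sets[j]) for j in range(n)]
            for i in range(n)]


def single_linkage(sim, threshold):
    """Step 3: repeatedly merge the two most similar clusters (single
    linkage) until the best inter-cluster similarity falls below threshold."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = max(sim[i][j] for i in clusters[x] for j in clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair  # y > x, so deleting y leaves index x valid
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return clusters


docs = [
    "<book><title/><author><name/></author></book>",
    "<book><title/><author><name/></author><year/></book>",
    "<movie><title/><director/></movie>",
]
sim = similarity_matrix(docs)
print(single_linkage(sim, threshold=0.5))  # prints [[0, 1], [2]]
```

The two book documents share two of their three distinct paths (similarity 2/3) and are merged. The movie document shares no path with either, so it stays in its own cluster. A production version would more likely use a library routine such as SciPy's hierarchical clustering rather than this quadratic loop.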


Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Yun Shen (1)
  • Bing Wang (1)
  1. Department of Computer Science, University of Hull, Hull, UK
