Clustering XML Documents Based on Structural Similarity

  • Guangming Xing
  • Zhonghang Xia
  • Jinhua Guo
Conference paper

DOI: 10.1007/978-3-540-71703-4_77

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4443)
Cite this paper as:
Xing G., Xia Z., Guo J. (2007) Clustering XML Documents Based on Structural Similarity. In: Kotagiri R., Krishna P.R., Mohania M., Nantajeewarawat E. (eds) Advances in Databases: Concepts, Systems and Applications. DASFAA 2007. Lecture Notes in Computer Science, vol 4443. Springer, Berlin, Heidelberg

Abstract

In this paper, we present a framework for clustering XML documents based on their structural similarity. First, we argue for the validity of using the edit distance between XML documents and schemata as the measure of structural similarity. Second, we give a novel solution for schema extraction. The solution is based on the minimum description length (MDL) principle and allows a tradeoff between schema simplicity and precision according to the user’s specification. Third, we discuss clustering XML documents based on the edit distance. The efficacy and efficiency of our methodology have been tested using both real and synthesized data.
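
The idea of clustering by structural edit distance can be illustrated with a minimal sketch. It is a deliberate simplification and not the authors' method: the paper measures the edit distance between a document and an extracted schema, whereas this sketch assumes a plain Levenshtein distance between the pre-order tag sequences of two documents and groups them with a simple single-link threshold. The function names, the threshold value, and the sample documents are all hypothetical.

```python
import xml.etree.ElementTree as ET


def tag_sequence(xml_text):
    """Return the element tags of a document in pre-order (document order)."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]


def edit_distance(a, b):
    """Levenshtein distance between two sequences with unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute or match
        prev = curr
    return prev[-1]


def cluster(docs, threshold):
    """Single-link grouping: documents within `threshold` share a cluster."""
    seqs = [tag_sequence(d) for d in docs]
    parent = list(range(len(docs)))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if edit_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical sample documents: two "book" variants and one "order".
docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><author/></book>",
    "<order><item/><item/><total/></order>",
]
print(cluster(docs, threshold=1))
```

With these sample documents, the two book variants differ by a single inserted tag and fall into one cluster, while the order document stays on its own. A tag-sequence distance ignores nesting depth, so it is only a rough proxy for the tree-to-schema distance described in the abstract.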


Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Guangming Xing (1)
  • Zhonghang Xia (1)
  • Jinhua Guo (2)
  1. Department of Computer Science, Western Kentucky University, Bowling Green, KY 42104
  2. Computer and Information Science Department, University of Michigan - Dearborn, Dearborn, MI 48128
