Clustering XML Documents Based on Structural Similarity
- Cite this paper as:
- Xing G., Xia Z., Guo J. (2007) Clustering XML Documents Based on Structural Similarity. In: Kotagiri R., Krishna P.R., Mohania M., Nantajeewarawat E. (eds) Advances in Databases: Concepts, Systems and Applications. DASFAA 2007. Lecture Notes in Computer Science, vol 4443. Springer, Berlin, Heidelberg
In this paper, we present a framework for clustering XML documents based on the structural similarity between them. First, we justify using the edit distance between XML documents and schemata as the measure of structural similarity. Second, we give a novel solution for schema extraction. The solution is based on the minimum description length (MDL) principle and allows a tradeoff between schema simplicity and precision according to the user's specification. Third, we discuss clustering XML documents based on this edit distance. The efficacy and efficiency of our methodology have been tested using both real and synthetic data.
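The abstract measures structural similarity as an edit distance between a document and an extracted schema. The sketch below does not reproduce that method or the MDL-based schema extraction. As a simpler stand-in, it computes Selkow's top-down tree edit distance between the element trees of two documents and then groups documents with single-linkage clustering under a distance threshold. The names `selkow_distance`, `subtree_size`, `cluster`, and the `threshold` parameter are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def subtree_size(node):
    """Number of elements in the subtree rooted at node (text is ignored)."""
    return 1 + sum(subtree_size(child) for child in node)


def selkow_distance(a, b):
    """Selkow-style top-down tree edit distance between two elements.

    Assumption: unit cost to relabel a root; inserting or deleting a child
    costs the size of its whole subtree; matched children recurse.
    This compares document to document, not document to schema.
    """
    relabel = 0 if a.tag == b.tag else 1
    ca, cb = list(a), list(b)
    m, n = len(ca), len(cb)
    # dp[i][j]: cost of editing the first i children of a into the first j of b
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + subtree_size(ca[i - 1])
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + subtree_size(cb[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + subtree_size(ca[i - 1]),   # delete child subtree
                dp[i][j - 1] + subtree_size(cb[j - 1]),   # insert child subtree
                dp[i - 1][j - 1] + selkow_distance(ca[i - 1], cb[j - 1]),
            )
    return relabel + dp[m][n]


def cluster(docs, threshold):
    """Single-linkage clustering: documents whose distance is <= threshold
    end up in the same group (via union-find). Returns lists of indices."""
    trees = [ET.fromstring(d) for d in docs]
    parent = list(range(len(trees)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(trees)):
        for j in range(i + 1, len(trees)):
            if selkow_distance(trees[i], trees[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(trees)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    docs = [
        "<book><title/><author/></book>",
        "<book><title/><author/><author/></book>",
        "<order><item/><price/></order>",
        "<order><item/><item/><price/></order>",
    ]
    print(cluster(docs, threshold=1))  # [[0, 1], [2, 3]]
```

With a threshold of 1, the two `book` documents (distance 1) and the two `order` documents (distance 1) form separate clusters, while any `book`/`order` pair is at distance 3 or more. A schema-based distance, as in the paper, would instead compare each document against a summary of many documents, which avoids the quadratic number of pairwise comparisons made here.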