Abstract
Clustering of XML documents is an important data mining method whose aim is to group similar XML documents. This paper considers the problem of clustering XML documents by structure and proposes two different, independent methods for it. The first method represents a set of XML documents as a set of labels. The second method introduces a new representation of a set of XML documents, called the SuperTree. The paper suggests that the proposed methods may improve the accuracy of XML clustering by structure. This claim rests on tests assessing the advantages of the proposals, conducted on heterogeneous and homogeneous data sets, respectively.
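To make the idea of a label-based structural representation concrete, the following is a minimal sketch under stated assumptions. It is not the author's method: the abstract does not specify how labels are extracted or compared. The sketch assumes that each document's labels are its element tag names, that two documents are compared with Jaccard similarity, and that documents are grouped by single-link union-find above a threshold. The function names `label_set`, `jaccard` and `cluster` and the default threshold of 0.5 are hypothetical.

```python
import xml.etree.ElementTree as ET


def label_set(xml_text: str) -> frozenset[str]:
    """Hypothetical structural representation: the set of element tag labels."""
    root = ET.fromstring(xml_text)
    # root.iter() visits the root and all of its descendants in document order.
    return frozenset(elem.tag for elem in root.iter())


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two label sets (defined as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(docs: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Group document indices whose label sets are transitively at least
    `threshold` similar (assumed single-link grouping via union-find)."""
    labels = [label_set(d) for d in docs]
    parent = list(range(len(docs)))

    def find(i: int) -> int:
        # Follow parent links to the set representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if jaccard(labels[i], labels[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


if __name__ == "__main__":
    docs = [
        "<article><title/><body><sec/></body></article>",
        "<article><title/><body><sec/><sec/></body></article>",
        "<movie><name/><year/></movie>",
    ]
    print(cluster(docs))  # [[0, 1], [2]]
```

Because a set ignores repetition and order, the first two documents receive identical label sets even though one has two `sec` elements. This illustrates why a flat label set is cheap to compute but coarse, and why a richer tree-shaped representation may distinguish structures more finely.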
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lesniewska, A. (2010). Clustering XML Documents by Structure. In: Grundspenkis, J., Kirikova, M., Manolopoulos, Y., Novickis, L. (eds) Advances in Databases and Information Systems. ADBIS 2009. Lecture Notes in Computer Science, vol 5968. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12082-4_30
DOI: https://doi.org/10.1007/978-3-642-12082-4_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12081-7
Online ISBN: 978-3-642-12082-4
eBook Packages: Computer Science, Computer Science (R0)