Clustering XML Documents by Structure

Lesniewska, Anna

doi:10.1007/978-3-642-12082-4_30

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5968)

Included in the following conference series: East European Conference on Advances in Databases and Information Systems

Abstract

Clustering of XML documents is an important data mining task whose aim is to group similar XML documents. This paper considers the problem of clustering XML documents by structure and proposes two different, independent methods. The first method represents a set of XML documents as a set of labels. The second method introduces a new representation of a set of XML documents, called the SuperTree. The paper suggests that the proposed methods may improve the accuracy of XML clustering by structure. This claim is based on tests designed to assess the advantages of the proposals, conducted on heterogeneous and homogeneous data sets.
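
The abstract does not spell out how the label-based representation is built or compared, so the following is only an illustrative sketch of the general idea, not the method of the paper. It assumes that a "label" is an element tag name and that similarity between two documents is measured with the Jaccard coefficient of their label sets; the helper names `label_set` and `jaccard` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def label_set(xml_text: str) -> frozenset:
    """Return the set of element tag names (labels) occurring in a document.

    Assumption: only tag names are treated as labels; attributes, text
    content, and nesting order are ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    return frozenset(elem.tag for elem in root.iter())


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two label sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical toy collection: two structurally similar documents and one
# that shares only a single label with them.
docs = {
    "book1": "<book><title/><author/><year/></book>",
    "book2": "<book><title/><author/></book>",
    "cd1": "<cd><artist/><title/></cd>",
}
labels = {name: label_set(text) for name, text in docs.items()}

sim_books = jaccard(labels["book1"], labels["book2"])  # 3 shared / 4 total
sim_mixed = jaccard(labels["book1"], labels["cd1"])    # 1 shared / 6 total
print(sim_books, sim_mixed)
```

A pairwise similarity of this kind could then be passed to any standard clustering algorithm. The SuperTree representation is not sketched here because the abstract gives no detail about its construction.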

Author information

Authors and Affiliations

Institute of Computing Science, Poznan University of Technology, Piotrowo 2, 60-965, Poznan, Poland
Anna Lesniewska

Editor information

Editors and Affiliations

Dept. of Systems Theory and Design, Riga Technical University, Meza str 1/4, LV 1048, Riga, Latvia
Janis Grundspenkis
Institute of Applied Computer Systems, Riga Technical University, 1 Kalku, LV-1658, Riga, Latvia
Marite Kirikova
Dept. of Informatics, Aristotle University, 54124, Thessaloniki, Greece
Yannis Manolopoulos
Division of Applied Computer Systems Software, Riga Technical University, Meza str 1/4, LV 1048, Riga, Latvia
Leonids Novickis

About this paper

Cite this paper

Lesniewska, A. (2010). Clustering XML Documents by Structure. In: Grundspenkis, J., Kirikova, M., Manolopoulos, Y., Novickis, L. (eds) Advances in Databases and Information Systems. ADBIS 2009. Lecture Notes in Computer Science, vol 5968. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12082-4_30

DOI: https://doi.org/10.1007/978-3-642-12082-4_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12081-7
Online ISBN: 978-3-642-12082-4
eBook Packages: Computer Science, Computer Science (R0)
