Finding Maximal Similar Paths Between XML Documents Using Sequential Patterns

Lee, Jung-Won; Park, Seung-Soo

doi:10.1007/978-3-540-30198-1_11

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3261)

Abstract

Techniques for storing XML documents, optimizing queries, and indexing XML have been active subjects of research. Most of these techniques focus on XML documents that share the same structure (i.e., the same DTD or XML Schema). However, when XML documents from the Web or from an EDMS (Electronic Document Management System) must be merged or classified, it is important to find the common structure among multiple documents before processing them. In this paper, we propose a new methodology for extracting common structures from XML documents and finding maximal similar paths between those structures using sequential pattern mining algorithms. Correctly determining the common structures between XML documents provides an important basis for a variety of XML document mining and processing applications. Experiments with XML documents show that our adapted sequential pattern mining algorithms find the common structures and the maximal similar paths between them exactly.
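
The abstract does not spell out the algorithm, so the following is only a rough illustration of the underlying idea, not the authors' method. It assumes that each document is reduced to its root-to-leaf paths of element names, that a path is "similar" when it occurs as an ordered subsequence (gaps allowed) in both documents, and that a similar path is "maximal" when no other similar path contains it. The function names (root_to_leaf_paths, maximal_similar_paths) and the sample documents are hypothetical, and the search is brute force rather than a real sequential pattern mining algorithm.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def root_to_leaf_paths(xml_text):
    """Return every root-to-leaf path as a tuple of element tag names."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        current = prefix + (node.tag,)
        children = list(node)
        if not children:          # leaf element: record the full path
            paths.add(current)
        for child in children:
            walk(child, current)

    walk(root, ())
    return paths


def is_subsequence(short, long):
    """True if `short` occurs in `long` in order, with gaps allowed."""
    remaining = iter(long)
    return all(tag in remaining for tag in short)


def all_subsequences(paths):
    """Every non-empty ordered subsequence of every path (brute force)."""
    found = set()
    for path in paths:
        for size in range(1, len(path) + 1):
            for idx in combinations(range(len(path)), size):
                found.add(tuple(path[i] for i in idx))
    return found


def maximal_similar_paths(xml_a, xml_b):
    """Tag sequences supported by both documents and contained in no other."""
    common = (all_subsequences(root_to_leaf_paths(xml_a))
              & all_subsequences(root_to_leaf_paths(xml_b)))
    maximal = [p for p in common
               if not any(p != q and is_subsequence(p, q) for q in common)]
    return sorted(maximal)


# Hypothetical sample documents with different but overlapping structure.
doc_a = ("<book><title/><author><name/><email/></author>"
         "<chapter><section><para/></section></chapter></book>")
doc_b = ("<book><title/><author><name/></author>"
         "<chapter><para/></chapter></book>")

for path in maximal_similar_paths(doc_a, doc_b):
    print("/".join(path))
# prints:
# book/author/name
# book/chapter/para
# book/title
```

Here book/chapter/para is reported even though the first document nests para under section, because gaps are allowed in the match. Enumerating all subsequences is exponential in path depth; a level-wise candidate generation scheme in the style of sequential pattern mining would be the natural replacement for larger inputs.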

Author information

Authors and Affiliations

Dept. of Computer Science and Engineering, Ewha Womans University, 11-1 Daehyun-dong, Sudaemun-ku, Seoul, Korea
Jung-Won Lee & Seung-Soo Park

Editor information

Editors and Affiliations

Dokuz Eylül University, Izmir, Turkey
Tatyana Yakhno

About this paper

Cite this paper

Lee, JW., Park, SS. (2004). Finding Maximal Similar Paths Between XML Documents Using Sequential Patterns. In: Yakhno, T. (eds) Advances in Information Systems. ADVIS 2004. Lecture Notes in Computer Science, vol 3261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30198-1_11

DOI: https://doi.org/10.1007/978-3-540-30198-1_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23478-4
Online ISBN: 978-3-540-30198-1
eBook Packages: Computer Science, Computer Science (R0)
