Abstract
XML is widely applied to describe semi-structured data commonly generated and used by modern information systems. XML database management systems (XDBMSs) are thus essential platforms in this context. Most XDBMS architectures proposed so far aim at reproducing functionalities found in relational systems. As such, these architectures inherit the same deficiency of traditional systems in dealing with less-structured data. What is badly needed is efficient support of common database operations under the similarity matching paradigm. In this paper, we present an engineering approach to incorporating similarity joins into XDBMSs, which exploits XDBMS components—the storage layer in particular—to design efficient algorithms. We experimentally confirm the accuracy, performance, and scalability of our approach.
Keywords
- Anchor Node
- Inverse Document Frequency
- Path Query
- Inverted List
- Path Class
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Andrade Ribeiro, L., Härder, T. (2012). Leveraging the Storage Layer to Support XML Similarity Joins in XDBMSs. In: Morzy, T., Härder, T., Wrembel, R. (eds) Advances in Databases and Information Systems. ADBIS 2012. Lecture Notes in Computer Science, vol 7503. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33074-2_1
DOI: https://doi.org/10.1007/978-3-642-33074-2_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33073-5
Online ISBN: 978-3-642-33074-2
eBook Packages: Computer Science (R0)
