An Efficient Path Index for Querying Semi-structured Data

Barg, Michael; Wong, Raymond K.; Lam, Franky

doi:10.1007/3-540-36901-5_9

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2642)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

The richness of semi-structured data allows data of varied and inconsistent structures to be stored in a single database. Such data can be represented as a graph, and queries can be constructed using path expressions, which describe traversals through the graph.

Instead of providing optimal performance for a limited range of path expressions, we propose a mechanism which is shown to have consistent and high performance for path expressions of any complexity, including those with descendant operators (path wildcards). We further detail mechanisms which employ our index to perform more complex processing, such as evaluating path expressions containing links and evaluating entire (sub)queries containing path-based predicates. Performance is shown to be independent of the number of terms in the path expression(s), even where these expressions contain wildcards. Experiments show that our index is faster than conventional methods by up to two orders of magnitude for certain query types, is compact, and scales well.
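To make the notion of a path expression concrete, the following is a minimal sketch of evaluating one by direct graph traversal. It is not the index proposed in this paper, which the abstract does not describe in detail; it only illustrates the kind of query being indexed, and roughly the naive navigation such an index aims to avoid. The `Graph` class, the `evaluate` function, the use of `"//"` as the descendant operator, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


class Graph:
    """Edge-labelled directed graph (hypothetical helper, not from the paper)."""

    def __init__(self):
        # node -> list of (label, child) pairs
        self.edges = defaultdict(list)

    def add_edge(self, src, label, dst):
        self.edges[src].append((label, dst))


def descendants_or_self(graph, nodes):
    """Return every node reachable from `nodes` via zero or more edges.

    The `seen` set keeps the traversal finite when links create cycles.
    """
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        node = stack.pop()
        for _, child in graph.edges.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def evaluate(graph, root, steps):
    """Evaluate a path expression given as a list of steps.

    Each step is either an edge label or "//" (assumed here to denote
    the descendant operator, i.e. a path wildcard).
    """
    current = {root}
    for step in steps:
        if step == "//":
            current = descendants_or_self(graph, current)
        else:
            current = {
                child
                for node in current
                for label, child in graph.edges.get(node, ())
                if label == step
            }
    return current


# Illustrative data: a library with one book at the top level and one on a shelf.
g = Graph()
g.add_edge("root", "library", "n1")
g.add_edge("n1", "book", "n2")
g.add_edge("n2", "title", "n3")
g.add_edge("n1", "shelf", "n4")
g.add_edge("n4", "book", "n5")
g.add_edge("n5", "title", "n6")
g.add_edge("n6", "ref", "n1")  # a link that introduces a cycle

fixed_path = evaluate(g, "root", ["library", "book", "title"])
wildcard_path = evaluate(g, "root", ["library", "//", "title"])
```

Here `fixed_path` contains only the title of the top-level book, while `wildcard_path` also reaches the title nested under the shelf. In this naive approach the cost grows with the number of steps and with the size of the subgraph explored by each `"//"`, which is the dependence the abstract reports the proposed index as avoiding.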

Author information

Authors and Affiliations

School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, 2052, Australia
Michael Barg, Raymond K. Wong & Franky Lam

Editor information

Editors and Affiliations

School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, 4072, Australia
Xiaofang Zhou & Maria E. Orlowska
Department of Mathematics and Computing, University of Southern Queensland, Toowoomba, QLD, 4350, Australia
Yanchun Zhang

About this paper

Cite this paper

Barg, M., Wong, R.K., Lam, F. (2003). An Efficient Path Index for Querying Semi-structured Data. In: Zhou, X., Orlowska, M.E., Zhang, Y. (eds) Web Technologies and Applications. APWeb 2003. Lecture Notes in Computer Science, vol 2642. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36901-5_9

DOI: https://doi.org/10.1007/3-540-36901-5_9
Published: 15 April 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-02354-8
Online ISBN: 978-3-540-36901-1
eBook Packages: Springer Book Archive
