Abstract
Finding all the occurrences of a twig pattern in an XML document is a core operation for efficient evaluation of XML queries. A number of algorithms have been proposed to process twig queries based on region encoding. While each element in source document is given two or more numbers in region-encoding-form index, the size of index grows linearly to the source document. The algorithms based on region encoding perform worse when the source document grows large. In this paper, we address the problem by putting forward a novel index structure, called Clustered Absolute Path Index (CAPI for brief). This index can extremely reduce the size of index and grows slowly as the source document grows large. Based on CAPI, we design novel join algorithms, called Path-Match to process queries without branches, Branch-Filter and RelatedPath-Join to process queries with branches. Experimental results show that the proposed algorithms based on CAPI outperform twig join significantly and have good scalability.
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Clark, J., De Rose, S. (eds.): XML Path Language (XPath) Version 2.0 – W3C Working Draft (2003)
Boag, S., Chamberlin, D., Fernandez, M.F., Florescu, D., Robie, J., Simeon, J.: XQuery 1.0: An XML query language. Technical report, W3C (2002)
Bruno, N., Srivastava, D., Koudas, N.: Holistic twig joins: optimal XML pattern matching. In: SIGMOD Conference, pp. 310–321 (2002)
Jiang, H., et al.: Holistic twig joins on indexed XML documents. In: Proc. of VLDB, pp. 273–284 (2003)
Jiang, H., Lu, H., Wang, W.: Efficient processing of XML twig queries with OR-predicates. In: Proc. of SIGMOD Conference, pp. 274–285 (2004)
Li, Q., Moon, B.: Indexing and querying XML data for regular path expressions. In: Proc. of VLDB, pp. 361–370 (2001)
Milo, T., Dan Suciu, D.: Index structures for path expressions. In: Beeri, C., Bruneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 277–295. Springer, Heidelberg (1999)
Miklau, G., Suciu, D.: Containment and equivalence for an XPath fragment. In: PODS, pp. 65–76 (2002)
O’Neil, P., et al.: ORDPATHs: Insert-friendly XML node labels. In: SIGMOD, pp. 903–908 (2004)
Chen, Y., Davidson, S.B., Zheng, Y.: BLAS: An efficient XPath processing system. In: Proc. of SIGMOD, pp. 47–58 (2004)
Extensible Markup Language (XML) 1.0, http://www.w3.org/TR/2004/REC-xml-20040204/
Kaushik, R., Shenoy, P., Bohannon, P., Gudes, E.: Exploiting local similarity for efficient indexing of paths in graph structured data. In: ICDE 2002 (2002)
Qun, C., Lim, A., Ong, K.W.: D(k)-index:An adaptive structural summary for graph-structureddata. In: ACM SIGMOD, San Diego, California, USA, pp. 134–144 (2003)
He, H., Yang, J.: Multiresolution indexing of XML for frequent queries. In: ICDE 2004 (2004)
Kaushik, R., Bohannon, P., Naughton, J.F., Korth, H.F.: Covering indexes for branching path queries. In: SIGMOD 2002 (2002)
XMark: The XML-benchmark project (2002), http://monetdb.cwi.nl/xml
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, H., Li, J., Wang, H. (2006). Clustered Absolute Path Index for XML Document: On Efficient Processing of Twig Queries. In: Shen, H.T., Li, J., Li, M., Ni, J., Wang, W. (eds) Advanced Web and Network Technologies, and Applications. APWeb 2006. Lecture Notes in Computer Science, vol 3842. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610496_1
Download citation
DOI: https://doi.org/10.1007/11610496_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31158-4
Online ISBN: 978-3-540-32435-5
eBook Packages: Computer ScienceComputer Science (R0)