Clustered Absolute Path Index for XML Document: On Efficient Processing of Twig Queries

  • Hongqiang Wang
  • Jianzhong Li
  • Hongzhi Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3842)


Finding all occurrences of a twig pattern in an XML document is a core operation for the efficient evaluation of XML queries. A number of algorithms have been proposed to process twig queries based on region encoding. Because a region-encoding index assigns two or more numbers to every element of the source document, the size of the index grows linearly with the size of the document, and algorithms based on region encoding perform worse as the source document grows large. In this paper, we address the problem by putting forward a novel index structure called the Clustered Absolute Path Index (CAPI for short). This index greatly reduces the index size and grows slowly as the source document grows large. Based on CAPI, we design novel join algorithms: Path-Match, which processes queries without branches, and Branch-Filter and RelatedPath-Join, which process queries with branches. Experimental results show that the proposed algorithms based on CAPI significantly outperform twig join and scale well.
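The abstract does not spell out how CAPI is built, so the following is only a minimal sketch of the general contrast it draws, not the authors' structure. It assumes a toy document and two hypothetical helpers, `region_encode` and `group_by_absolute_path`: the first produces one (start, end, level) label per element, as a region-encoding index would, and the second keeps one entry per distinct root-to-element label path.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Toy document, assumed for illustration only.
DOC = ("<lib><book><title/><author/></book>"
       "<book><title/><author/><author/></book></lib>")


def region_encode(root):
    """Label every element with (tag, start, end, level) in one depth-first pass."""
    labels = []
    counter = 0

    def visit(node, level):
        nonlocal counter
        counter += 1
        start = counter
        for child in node:
            visit(child, level + 1)
        counter += 1
        # Appended in post-order, so the root is labelled last.
        labels.append((node.tag, start, counter, level))

    visit(root, 0)
    return labels


def group_by_absolute_path(root):
    """Count elements per distinct absolute path (one index entry per path)."""
    groups = defaultdict(int)

    def visit(node, prefix):
        path = prefix + "/" + node.tag
        groups[path] += 1
        for child in node:
            visit(child, path)

    visit(root, "")
    return dict(groups)


root = ET.fromstring(DOC)
labels = region_encode(root)
paths = group_by_absolute_path(root)

print(len(labels), "region labels")   # one per element
print(len(paths), "distinct absolute paths")
```

In this toy case the per-element labelling grows with every added `book`, while the set of distinct absolute paths stays the same as long as new elements reuse existing paths. That is one plausible reading of why a path-based index can grow more slowly than a region-encoding index.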


Keywords: Query Processing, CAPI Index, Source Document, Disk Access, Path Query
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Hongqiang Wang (1)
  • Jianzhong Li (1)
  • Hongzhi Wang (1)
  1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin
