Clustered Chain Path Index for XML Document: Efficiently Processing Branch Queries

World Wide Web

Abstract

Branch query processing is a core operation of XML query processing. In recent years, a number of stack-based twig join algorithms have been proposed to process twig queries using the tag stream index. However, the tag stream index labels each element separately, without considering the similarity among elements. In addition, algorithms based on the tag stream index perform inefficiently on large documents. This paper proposes a novel index, named Clustered Chain Path Index (CCPI), based on a novel labeling scheme. The index provides efficient support for processing branch queries. For tree-structured XML documents, it has the same cardinality as the 1-index. Two algorithms built on CCPI, KMP-Match-Path and Related-Path-Segment-Join, are proposed to process queries efficiently. Analysis and experimental results show that the proposed CCPI-based query processing algorithms outperform other algorithms and scale well.
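For readers unfamiliar with the term, a branch (twig) query is a path query in which some node carries more than one structural constraint. The sketch below only illustrates what such a query asks for, using Python's standard `xml.etree.ElementTree` module; it is not the CCPI index or either of the paper's algorithms. The sample document and the function name `branch_query_titles` are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration only.
DOC = """
<library>
  <book><title>A</title><author>X</author></book>
  <book><title>B</title></book>
  <book><title>C</title><author>Y</author><year>2006</year></book>
</library>
"""


def branch_query_titles(xml_text):
    """Return the titles of books that also have an author child."""
    root = ET.fromstring(xml_text)
    # //book[author]/title: 'book' is the branching node. It must have an
    # 'author' child (the predicate) and a 'title' child (the output node).
    return [t.text for t in root.findall(".//book[author]/title")]


titles = branch_query_titles(DOC)
print(titles)  # ['A', 'C']
```

Here the second book is excluded because it fails the `author` branch of the query, even though it matches the `book/title` path. Evaluating both branches together is what distinguishes branch queries from simple path queries.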

Author information

Corresponding author

Correspondence to Hongqiang Wang.

Additional information

This paper is partially supported by the Natural Science Foundation of Heilongjiang Province (Grant No. zjg03-05), the National Natural Science Foundation of China (Grant No. 60473075), and the Key Program of the National Natural Science Foundation of China (Grant No. 60533110).

About this article

Cite this article

Wang, H., Li, J. & Wang, H. Clustered Chain Path Index for XML Document: Efficiently Processing Branch Queries. World Wide Web 11, 153–168 (2008). https://doi.org/10.1007/s11280-007-0029-6

  • DOI: https://doi.org/10.1007/s11280-007-0029-6
