
Path-partitioned encoding supports wildcard-awareness twig queries

Journal of Shanghai University (English Edition)

Abstract

Finding all occurrences of a twig query in an XML database is a core operation for efficient evaluation of XML queries, and twig queries with wildcards must be handled effectively. In this paper, a novel path-partitioned encoding scheme is proposed for XML documents to capture the paths of all elements, and a twig query is modeled as an XPattern, an extension of the tree pattern. After the definition, simplification, normalization, verification, and initialization of the XPattern, both the work sets and a join plan are generated. Based on these measures, an effective algorithm for answering a twig query, called DMTwig, is designed that avoids unnecessary elements and invalid structural joins. The algorithm adaptively handles twig queries that combine branches ([]), child edges (/), descendant edges (//), and wildcards (*). We show that the path-partitioned encoding scheme and the XPattern guarantee I/O and CPU optimality for twig queries. Experiments on representative data sets indicate that the proposed solution performs well.
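To make the query constructs named in the abstract concrete, the following is a minimal sketch of grouping elements by their root-to-element label path and matching a linear path query with child edges (/), descendant edges (//), and wildcards (*) against those paths. It is not the paper's path-partitioned encoding, its XPattern, or the DMTwig algorithm, whose details are not given here; the function names (`partition_by_path`, `parse_steps`, `path_matches`, `evaluate`) and the sample document are assumptions made for illustration, and branch predicates ([]) are left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document, used only for illustration.
DOC = ("<lib><book><title>A</title><author><name>X</name></author></book>"
       "<paper><title>B</title></paper></lib>")


def partition_by_path(root):
    """Group elements by their root-to-element label path (a tuple of tags)."""
    groups = defaultdict(list)

    def walk(node, prefix):
        path = prefix + (node.tag,)
        groups[path].append(node)
        for child in node:
            walk(child, path)

    walk(root, ())
    return groups


def parse_steps(query):
    """Split '/a//b/*' into [('/', 'a'), ('//', 'b'), ('/', '*')]."""
    steps, i = [], 0
    while i < len(query):
        if query.startswith("//", i):
            axis, i = "//", i + 2
        elif query.startswith("/", i):
            axis, i = "/", i + 1
        else:
            raise ValueError("each step must start with '/' or '//'")
        j = i
        while j < len(query) and query[j] != "/":
            j += 1
        if j == i:
            raise ValueError("missing name test after axis")
        steps.append((axis, query[i:j]))
        i = j
    return steps


def path_matches(path, steps):
    """True if the steps consume the whole label path, ending on its last label."""
    def go(pi, si):
        if si == len(steps):
            return pi == len(path)
        axis, name = steps[si]
        # '/' must match the label at pi; '//' may first skip any number of labels.
        candidates = [pi] if axis == "/" else range(pi, len(path))
        for p in candidates:
            if p < len(path) and name in ("*", path[p]) and go(p + 1, si + 1):
                return True
        return False

    return go(0, 0)


def evaluate(root, query):
    """Return all elements whose label path matches the linear path query."""
    steps = parse_steps(query)
    return [node
            for path, nodes in partition_by_path(root).items()
            if path_matches(path, steps)
            for node in nodes]


root = ET.fromstring(DOC)
print([n.text for n in evaluate(root, "/lib/*/title")])  # ['A', 'B']
```

Because every element in a group shares the same label path, the query is tested once per distinct path and not once per element. A full twig query with branches would additionally need to join the matches of its separate root-to-leaf paths, which this sketch does not attempt.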




Correspondence to Xiao-shuang Xu (徐小双).

Additional information

Project supported by the National High-Tech Research and Development Plan of China (Grant No. 2005AA4Z3030).

About this article

Cite this article

Xu, Xs., Feng, Yc. & Wang, F. Path-partitioned encoding supports wildcard-awareness twig queries. J. Shanghai Univ.(Engl. Ed.) 13, 363–374 (2009). https://doi.org/10.1007/s11741-009-0506-1


  • DOI: https://doi.org/10.1007/s11741-009-0506-1
