The VLDB Journal, Volume 22, Issue 3, pp 369–393

Optimal and efficient generalized twig pattern processing: a combination of preorder and postorder filterings

  • Radim Bača
  • Michal Krátký
  • Tok Wang Ling
  • Jiaheng Lu
Regular Paper

Abstract

Searching for occurrences of a twig pattern query (TPQ) in an XML document is a core task of all XML database query languages. The generalized twig pattern (GTP) extends the TPQ model with semantics for output nodes, optional nodes, and boolean expressions, all of which are part of the XQuery language. Preorder filtering holistic algorithms such as TwigStack form a significant class of TPQ processing approaches; for some query classes, their worst-case I/O complexity is linear in the sum of the input and output sizes. Another important class consists of postorder filtering holistic algorithms such as Twig\(^2\)Stack, which introduced an output enumeration time that is linear in the result size. In this article, we introduce a holistic algorithm called GTPStack, the first approach capable of processing a GTP with a worst-case I/O complexity that is linear in the GTP result size. This is achieved by combining preorder and postorder filtering before nodes are stored in intermediate storage. A further contribution of this article is a new perspective on the optimality of holistic algorithms. We show that optimality depends not only on the query class but also on the characteristics of the XML document. This view extends the general knowledge about the types of queries for which holistic algorithms are optimal. Moreover, it allows us to determine that GTPStack is optimal for any GTP when a specific XML document is considered. We present a comprehensive experimental study of state-of-the-art holistic algorithms, showing the conditions under which GTPStack outperforms the other holistic approaches.
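To make the terms in the abstract concrete, the following is a minimal sketch of what matching a twig pattern means. It is a naive matcher, not GTPStack, TwigStack, or any algorithm from the article, and it makes no claim about I/O complexity. It assumes a twig of the form `//top[.//b1][.//b2]...` with descendant edges only, and it decides the ancestor relationship from preorder and postorder ranks. The function names `label`, `is_ancestor`, and `match_twig` and the sample document are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import product


def label(root):
    """Assign a (preorder, postorder) rank pair to every element."""
    labels = {}
    counters = {"pre": 0, "post": 0}

    def visit(node):
        pre = counters["pre"]
        counters["pre"] += 1
        for child in node:
            visit(child)
        # The postorder rank is assigned after all children are visited.
        labels[node] = (pre, counters["post"])
        counters["post"] += 1

    visit(root)
    return labels


def is_ancestor(labels, a, d):
    """a is an ancestor of d iff a starts before d and finishes after d."""
    pre_a, post_a = labels[a]
    pre_d, post_d = labels[d]
    return pre_a < pre_d and post_a > post_d


def match_twig(root, top, branches):
    """Naively enumerate matches of //top[.//b1][.//b2]... (assumed twig shape).

    Returns tuples (top_node, b1_node, b2_node, ...).
    """
    labels = label(root)
    by_tag = {}
    for node in root.iter():
        by_tag.setdefault(node.tag, []).append(node)

    results = []
    for t in by_tag.get(top, []):
        # For each branch, collect the descendants of t carrying that tag.
        candidates = [
            [n for n in by_tag.get(b, []) if is_ancestor(labels, t, n)]
            for b in branches
        ]
        # A match needs one node per branch; an empty branch yields no match.
        results.extend((t, *combo) for combo in product(*candidates))
    return results


# Hypothetical sample document: only the first <a> has both a <b> and a <c>.
doc = ET.fromstring("<r><a><b/><c/><c/></a><a><c/></a></r>")
matches = match_twig(doc, "a", ["b", "c"])
```

This sketch compares every candidate pair and materializes every combination, which is exactly the kind of work that holistic algorithms aim to avoid by filtering nodes before they reach intermediate storage.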

Keywords

XML Query processing; Generalized twig pattern; Holistic algorithms

Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  • Radim Bača (1)
  • Michal Krátký (1)
  • Tok Wang Ling (2)
  • Jiaheng Lu (3)

  1. Department of Computer Science, VŠB – Technical University of Ostrava, Ostrava, Czech Republic
  2. Department of Computer Science, National University of Singapore, Singapore, Singapore
  3. DEKE, MOE and School of Information, Renmin University of China, Beijing, China
