Conjunctive query containment over trees using schema information

Abstract

We study the containment, satisfiability, and validity problems for conjunctive queries over trees with respect to a schema. We show that conjunctive query containment and validity are 2EXPTIME-complete with respect to a schema, both when the schema is given as a DTD and when it is given as a tree automaton. Furthermore, we show that satisfiability for conjunctive queries with respect to a schema can be decided in NP. The problem is NP-hard already for queries using only one kind of axis. Finally, we consider conjunctive queries that can test for equalities and inequalities of data values. Here, satisfiability and validity are decidable, but containment is undecidable, even without schema information. On the other hand, containment with respect to a schema becomes decidable again if the “larger” query is not allowed to use both equalities and inequalities.
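To make the underlying semantics concrete, the following Python sketch decides whether one explicitly given tree satisfies a Boolean conjunctive query built from Child, Child+ (descendant) and label atoms. It is an illustration under assumptions of our own: the tuple encoding of atoms, the parent-array encoding of trees, and all function names are hypothetical, and this is not one of the decision procedures studied in the paper. It simply tries every assignment of query variables to nodes, so it is exponential in the number of variables. Satisfiability, validity, and containment with respect to a schema quantify over all trees admitted by the schema, which this sketch does not attempt.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical encoding: ("Child", x, y), ("Child+", x, y), ("Label", x, "a")
Atom = Tuple[str, str, str]


def is_proper_ancestor(parent: Sequence[Optional[int]], u: int, v: int) -> bool:
    """Return True iff u is a proper ancestor of v, i.e. Child+(u, v) holds."""
    w = parent[v]
    while w is not None:
        if w == u:
            return True
        w = parent[w]
    return False


def holds(atom: Atom, nu: Dict[str, int],
          parent: Sequence[Optional[int]], labels: Sequence[str]) -> bool:
    """Check a single atom under the variable assignment nu."""
    kind = atom[0]
    if kind == "Label":
        return labels[nu[atom[1]]] == atom[2]
    u, v = nu[atom[1]], nu[atom[2]]
    if kind == "Child":
        return parent[v] == u
    if kind == "Child+":
        return is_proper_ancestor(parent, u, v)
    raise ValueError(f"unsupported atom kind: {kind}")


def satisfies(parent: Sequence[Optional[int]], labels: Sequence[str],
              atoms: List[Atom]) -> bool:
    """Brute force: does some assignment of variables to nodes make all atoms true?"""
    variables = sorted({a[1] for a in atoms}
                       | {a[2] for a in atoms if a[0] != "Label"})
    for choice in product(range(len(parent)), repeat=len(variables)):
        nu = dict(zip(variables, choice))
        if all(holds(a, nu, parent, labels) for a in atoms):
            return True
    return False


# Tree r(a(b), c): node i carries labels[i] and has parent parent[i].
parent = [None, 0, 1, 0]
labels = ["r", "a", "b", "c"]
print(satisfies(parent, labels, [("Label", "x", "a"), ("Child+", "x", "y"), ("Label", "y", "b")]))  # True
print(satisfies(parent, labels, [("Label", "x", "c"), ("Child+", "x", "y")]))  # False
```

Representing the tree by a parent array keeps both axis checks short; a real implementation would precompute the descendant relation instead of walking up the tree for every Child+ test.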

Figs. 1–13 (images not reproduced here)

Notes

  1. We do not require CQs to be in prenex normal form. However, all formulas that we construct in the paper can be put in prenex normal form simply by renaming bound variables apart and moving all quantifiers to the front.

  2. Notice that, as stated in the introduction, we assume that trees only take labels from a finite alphabet \(\varSigma \). Hence, for a conjunctive query Q, the set L(Q) also consists of trees over the alphabet \(\varSigma \). In the rare cases where we consider trees without schema information, we state this explicitly.

  3. Transforming an NTA A into a reduced NTA can be done in polynomial time by first performing an emptiness test for every state of A, followed by a reachability test. Section 4.2 of [35] describes an algorithm for reducing a DTD; the algorithm for NTAs is analogous.

  4. To the best of our knowledge, the full proof is unpublished. For the convenience of our readers, we include Wood’s proof, which he kindly shared with us in a personal communication.

  5. We assume \(\varDelta \) to contain all the data values we use in our proofs and examples.

  6. This definition is tailored to the proof of Theorem 23 and is therefore more complicated than the reader might expect. In that proof, the reader should think of \(x'\) and \(y'\) as being mapped to the same node.

  7. Of course, the resulting equality atoms can be removed by suitable variable renaming.

  8. For the purpose of the reduction, a node v of the query is a leaf node if and only if the query does not have any atom of the form \(\textit{Child}\,(v,w)\) or \(\textit{Child}\,^+(v,w)\).

  9. We can assume w.l.o.g. that the free variables are the same in P and Q.

  10. Here, structural constraints include node identities, and VBCs allow data values to be compared with constants.
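Note 1 relies on the fact that conjunctive queries use only conjunction and existential quantification: once bound variables are renamed apart, every quantifier can be pulled to the front, since ∃x φ ∧ ψ is equivalent to ∃x (φ ∧ ψ) whenever x does not occur in ψ. The Python sketch below illustrates this transformation. The formula classes, the fresh-name scheme (v0, v1, and so on, skipping names already in use), and the function names are hypothetical choices made for illustration, not notation from the paper.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Iterator, List, Set, Tuple, Union


@dataclass(frozen=True)
class Atom:
    rel: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Exists:
    var: str
    body: "Formula"


@dataclass(frozen=True)
class And:
    parts: Tuple["Formula", ...]


Formula = Union[Atom, Exists, And]


def variable_names(f: Formula) -> Set[str]:
    """All variable names occurring in f, free or bound."""
    if isinstance(f, Atom):
        return set(f.args)
    if isinstance(f, Exists):
        return {f.var} | variable_names(f.body)
    return set().union(*(variable_names(p) for p in f.parts))


def to_prenex(f: Formula) -> Tuple[List[str], List[Atom]]:
    """Return (bound, atoms) such that EXISTS bound . AND(atoms) is equivalent to f."""
    used = variable_names(f)
    fresh: Iterator[str] = (n for n in (f"v{i}" for i in count()) if n not in used)
    bound: List[str] = []
    atoms: List[Atom] = []

    def walk(g: Formula, env: Dict[str, str]) -> None:
        if isinstance(g, Atom):
            # Apply the current renaming; free variables are left unchanged.
            atoms.append(Atom(g.rel, tuple(env.get(a, a) for a in g.args)))
        elif isinstance(g, Exists):
            new = next(fresh)  # rename the bound variable apart
            bound.append(new)
            walk(g.body, {**env, g.var: new})
        else:
            for p in g.parts:
                walk(p, env)

    walk(f, {})
    return bound, atoms


# (EXISTS x Child(y, x)) AND (EXISTS x Label_a(x)), with y free
phi = And((Exists("x", Atom("Child", ("y", "x"))),
           Exists("x", Atom("Label_a", ("x",)))))
print(to_prenex(phi))
```

Here the two occurrences of x become v0 and v1, so the resulting prenex formula is ∃v0 ∃v1 (Child(y, v0) ∧ Label_a(v1)).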
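The two-phase reduction described in note 3 can be sketched in Python for a ranked, bottom-up tree automaton whose transitions are written as triples (symbol, child states, target state). This ranked setting is a simplification chosen for illustration: the NTAs in the paper run over unranked trees, where the same idea applies but both tests must also take the horizontal languages into account. Phase one is a fixpoint computation of the states q for which some tree can be assigned q at its root (the states with non-empty language). Phase two keeps only those states that can be reached top-down from a final state through transitions that use such states exclusively. Both phases run in polynomial time. The encoding and function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical encoding: (symbol, child states, target state); leaf rules have no children.
Transition = Tuple[str, Tuple[str, ...], str]


def productive_states(transitions: List[Transition]) -> Set[str]:
    """Phase 1 (emptiness test): states q such that some tree can be assigned q at its root."""
    prod: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for _sym, children, target in transitions:
            if target not in prod and all(c in prod for c in children):
                prod.add(target)
                changed = True
    return prod


def reduce_nta(transitions: Iterable[Transition],
               final: Set[str]) -> Tuple[Set[str], List[Transition], Set[str]]:
    """Return (states, transitions, final states) of the reduced automaton."""
    transitions = list(transitions)
    prod = productive_states(transitions)
    # Drop every transition that mentions a state with empty language.
    useful = [t for t in transitions
              if t[2] in prod and all(c in prod for c in t[1])]
    # Phase 2 (reachability test): walk top-down from productive final states.
    reach = {q for q in final if q in prod}
    stack = list(reach)
    while stack:
        q = stack.pop()
        for _sym, children, target in useful:
            if target == q:
                for c in children:
                    if c not in reach:
                        reach.add(c)
                        stack.append(c)
    kept = [t for t in useful if t[2] in reach]
    return reach, kept, {q for q in final if q in reach}


rules = [("a", (), "qa"), ("b", (), "qb"), ("r", ("qa", "qb"), "qf"),
         ("r", ("qdead",), "qf"), ("c", (), "qunused"), ("d", ("qdead",), "qdead")]
states, kept, fin = reduce_nta(rules, {"qf"})
print(sorted(states))  # ['qa', 'qb', 'qf']
```

In the example, qdead has an empty language and is removed in phase one, while qunused accepts the tree c but is not reachable from the final state qf and is removed in phase two.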
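The renaming mentioned in note 7 can be illustrated by a union-find pass over node-equality atoms x = y: every variable is replaced by a representative of its equivalence class, and the equality atoms are dropped. In this Python sketch, the tuple encoding (with "=" as the relation name of equality atoms and labels as unary relations such as Label_a), the function name, and the policy of keeping a free variable as the representative of its class are all assumptions made for illustration. The sketch raises an error when two distinct free variables are equated, since renaming alone cannot merge two answer variables.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical encoding: (relation, arg1, arg2, ...); node equalities use relation "=".
Atom = Tuple[str, ...]


def remove_equalities(atoms: List[Atom], free: Iterable[str] = ()) -> List[Atom]:
    """Merge variables related by '=' atoms and drop those atoms."""
    free = set(free)
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for atom in atoms:
        if atom[0] == "=":
            rx, ry = find(atom[1]), find(atom[2])
            if rx == ry:
                continue
            if rx in free and ry in free:
                raise ValueError(f"free variables {rx} and {ry} cannot be merged by renaming")
            if ry in free:  # keep a free variable as the class representative
                rx, ry = ry, rx
            parent[ry] = rx
    return [(atom[0],) + tuple(find(v) for v in atom[1:])
            for atom in atoms if atom[0] != "="]


query = [("Child", "x", "y"), ("=", "y", "z"), ("Label_a", "z"),
         ("=", "z", "w"), ("Child+", "w", "u")]
print(remove_equalities(query, free={"z"}))
# [('Child', 'x', 'z'), ('Label_a', 'z'), ('Child+', 'z', 'u')]
```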
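The leaf condition in note 8 is purely syntactic and can be checked directly on the atoms of a query. The Python sketch below assumes a hypothetical tuple encoding (relation, argument, ...), in which Child and Child+ are relation names and labels are unary relations such as Label_a. Atoms of other axes, such as the sibling atom named NextSibling in the example (a name chosen here for illustration), do not prevent a variable from being a leaf.

```python
from typing import Iterable, Set, Tuple


def leaf_variables(atoms: Iterable[Tuple[str, ...]]) -> Set[str]:
    """Variables v with no atom Child(v, w) or Child+(v, w) in the query."""
    variables: Set[str] = set()
    has_child_edge: Set[str] = set()
    for rel, *args in atoms:
        variables.update(args)
        if rel in ("Child", "Child+"):
            has_child_edge.add(args[0])
    return variables - has_child_edge


query = [("Child", "x", "y"), ("Child+", "x", "z"),
         ("NextSibling", "y", "z"), ("Label_a", "z")]
print(sorted(leaf_variables(query)))  # ['y', 'z']
```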

References

  1. Abiteboul, S., Bourhis, P., Muscholl, A., Wu, Z.: Recursive queries on trees and data trees. In: International Conference on Database Theory (ICDT), pp. 93–104 (2013)

  2. Arenas, M., Barceló, P., Libkin, L., Murlak, F.: Foundations of Data Exchange. Cambridge University Press, Cambridge (2014)

  3. Barceló, P., Libkin, L., Poggi, A., Sirangelo, C.: XML with incomplete information. J. ACM 58(1), 4 (2010)

  4. Benedikt, M., Bourhis, P., Senellart, P.: Monadic datalog containment. In: International Colloquium on Automata, Languages, and Programming (ICALP), pp. 79–91 (2012)

  5. Benedikt, M., Fan, W., Geerts, F.: XPath satisfiability in the presence of DTDs. J. ACM 55(2), Art. no. 8 (2008). doi:10.1145/1346330.1346333

  6. Berglund, A., Boag, S., Chamberlin, D., Fernández, M.F., Kay, M., Robie, J., Siméon, J.: XML path language (XPath) 2.0. Technical report, World Wide Web Consortium (2007). http://www.w3.org/TR/xpath20/

  7. Björklund, H., Martens, W., Schwentick, T.: Conjunctive query containment over trees. J. Comput. Syst. Sci. 77(3), 450–472 (2011)

  8. Björklund, H., Martens, W., Schwentick, T.: Validity of tree pattern queries with respect to schema information. In: Mathematical Foundations of Computer Science (MFCS), pp. 171–182 (2013)

  9. Bojanczyk, M., Kolodziejczyk, L.A., Murlak, F.: Solutions in XML data exchange. J. Comput. Syst. Sci. 79(6), 785–815 (2013)

  10. Bojanczyk, M., Murlak, F., Witkowski, A.: Containment of monadic datalog programs via bounded clique-width. In: International Colloquium on Automata, Languages, and Programming (ICALP), pp. 427–439 (2015)

  11. Bojanczyk, M., Muscholl, A., Schwentick, T., Segoufin, L.: Two-variable logic on data trees and XML reasoning. J. ACM 56(3), Art. no. 13 (2009). doi:10.1145/1516512.1516515

  12. Brüggemann-Klein, A., Wood, D.: One-unambiguous regular languages. Inf. Comput. 142(2), 182–206 (1998)

  13. Chandra, A.K., Kozen, D.C., Stockmeyer, L.J.: Alternation. J. ACM 28(1), 114–133 (1981)

  14. Chandra, A.K., Merlin, P.M.: Optimal implementation of conjunctive queries in relational data bases. In: Symposium on Theory of Computing (STOC), pp. 77–90 (1977)

  15. Chlebus, B.S.: Domino-tiling games. J. Comput. Syst. Sci. 32(3), 374–392 (1986)

  16. Clark, J., Murata, M.: Relax NG specification (2001). http://www.relaxng.org/spec-20011203.html

  17. Czerwinski, W., David, C., Losemann, K., Martens, W.: Deciding definability by deterministic regular expressions. In: International Conference on Foundations of Software Science and Computation Structures (FOSSACS), pp. 289–304. Springer, Berlin (2013)

  18. Czerwinski, W., Martens, W., Niewerth, M., Parys, P.: Minimization of tree pattern queries. In: Symposium on Principles of Database Systems (PODS), pp. 43–54 (2016)

  19. Czerwinski, W., Martens, W., Parys, P., Przybylko, M.: The (almost) complete guide to tree pattern containment. In: Symposium on Principles of Database Systems (PODS), pp. 117–130 (2015)

  20. David, C.: Complexity of data tree patterns over XML documents. In: Mathematical Foundations of Computer Science (MFCS), pp. 278–289 (2008)

  21. David, C., Gheerbrant, A., Libkin, L., Martens, W.: Containment of pattern-based queries over data trees. In: International Conference on Database Theory (ICDT), pp. 201–212 (2013)

  22. David, C., Hofman, P., Murlak, F., Pilipczuk, M.: Synthesizing transformations from XML schema mappings. In: International Conference on Database Theory (ICDT), pp. 61–71 (2014)

  23. David, C., Libkin, L., Murlak, F.: Certain answers for XML queries. In: Symposium on Principles of Database Systems (PODS), pp. 191–202 (2010)

  24. Flum, J., Frick, M., Grohe, M.: Query evaluation via tree-decompositions. J. ACM 49(6), 716–752 (2002)

  25. Gallant, J., Maier, D., Storer, J.A.: On finding minimal length superstrings. J. Comput. Syst. Sci. 20(1), 50–58 (1980)

  26. Geerts, F., Fan, W.: Satisfiability of XPath queries with sibling axes. In: Database Programming Languages (DBPL), pp. 122–137 (2005)

  27. Gheerbrant, A., Libkin, L., Tan, T.: On the complexity of query answering over incomplete XML documents. In: International Conference on Database Theory (ICDT), pp. 169–181 (2012)

  28. Gottlob, G., Koch, C., Schulz, K.U.: Conjunctive queries over trees. J. ACM 53(2), 238–272 (2006)

  29. Hidders, J.: Satisfiability of XPath expressions. In: Database Programming Languages (DBPL), pp. 21–36 (2003)

  30. Kimelfeld, B., Sagiv, Y.: Revisiting redundancy and minimization in an XPath fragment. In: Extending Database Technology (EDBT), pp. 61–72 (2008)

  31. Kolaitis, P.G., Vardi, M.Y.: Conjunctive-query containment and constraint satisfaction. J. Comput. Syst. Sci. 61(2), 302–332 (2000)

  32. Lakshmanan, L.V.S., Ramesh, G., Wang, H., Zhao, Z.: On testing satisfiability of tree pattern queries. In: International Conference on Very Large Data Bases (VLDB), pp. 120–131 (2004)

  33. Lu, P., Bremer, J., Chen, H.: Deciding determinism of regular languages. Theory Comput. Syst. 57(1), 97–139 (2015). doi:10.1007/s00224-014-9576-2

  34. Martens, W., Neven, F.: On the complexity of typechecking top-down XML transformations. Theor. Comput. Sci. 336(1), 153–180 (2005)

  35. Martens, W., Neven, F., Schwentick, T.: Complexity of decision problems for XML schemas and chain regular expressions. SIAM J. Comput. 39(4), 1486–1530 (2009)

  36. Martens, W., Neven, F., Schwentick, T., Bex, G.J.: Expressiveness and complexity of XML schema. ACM Trans. Database Syst. 31(3), 770–813 (2006)

  37. Marx, M.: Conditional XPath. ACM Trans. Database Syst. 30(4), 929–959 (2005)

  38. Miklau, G., Suciu, D.: Containment and equivalence for a fragment of XPath. J. ACM 51(1), 2–45 (2004)

  39. Murlak, F., Oginski, M., Przybylko, M.: Between tree patterns and conjunctive queries: Is there tractability beyond acyclicity? In: Mathematical Foundations of Computer Science (MFCS), pp. 705–717 (2012)

  40. Neven, F., Schwentick, T.: On the complexity of XPath containment in the presence of disjunction, DTDs, and variables. Log. Methods Comput. Sci. 2(3), Art. no. 1 (2006). doi:10.2168/LMCS-2(3:1)2006

  41. Post, E.L.: A variant of a recursively unsolvable problem. Bull. AMS 52(4), 264–268 (1946)

  42. Räihä, K.J., Ukkonen, E.: The shortest common supersequence problem over binary alphabet is NP-complete. Theor. Comput. Sci. 16(2), 187–198 (1981)

  43. Takahashi, M.: Generalizations of regular sets and their application to a study of context-free languages. Inf. Control 27(1), 1–36 (1975)

  44. ten Cate, B., Lutz, C.: The complexity of query containment in expressive fragments of XPath 2.0. J. ACM 56(6), Art. no. 31 (2009). doi:10.1145/1568318.1568321

  45. Thatcher, J.W., Wright, J.B.: Generalized finite automata theory with an application to a decision problem of second-order logic. Math. Syst. Theory 2(1), 57–81 (1968)

  46. Vardi, M.Y.: Reasoning about the past with two-way automata. In: International Colloquium on Automata, Languages, and Programming (ICALP), pp. 628–641 (1998)

  47. Wood, P.T.: Containment for XPath fragments under DTD constraints. In: International Conference on Database Theory (ICDT) (2003). Full version obtained through personal communication

Acknowledgments

This work was supported by grant number MA 4938/2-1 from the Deutsche Forschungsgemeinschaft (Emmy Noether Nachwuchsgruppe) and the Swedish Research Council Grant 621-2011-6080.

Author information

Corresponding author

Correspondence to Wim Martens.

Additional information

A preliminary version of this work was presented at the 33rd International Symposium on Mathematical Foundations of Computer Science.

About this article

Cite this article

Björklund, H., Martens, W. & Schwentick, T. Conjunctive query containment over trees using schema information. Acta Informatica 55, 17–56 (2018). https://doi.org/10.1007/s00236-016-0282-1

  • DOI: https://doi.org/10.1007/s00236-016-0282-1