Skip to main content

Schemas for Unordered XML on a DIME

Abstract

We investigate schema languages for unordered XML having no relative order among siblings. First, we propose unordered regular expressions (UREs), essentially regular expressions with unordered concatenation instead of standard concatenation, that define languages of unordered words to model the allowed content of a node (i.e., collections of the labels of children). However, unrestricted UREs are computationally too expensive as we show the intractability of two fundamental decision problems for UREs: membership of an unordered word to the language of a URE and containment of two UREs. Consequently, we propose a practical and tractable restriction of UREs, disjunctive interval multiplicity expressions (DIMEs). Next, we employ DIMEs to define languages of unordered trees and propose two schema languages: disjunctive interval multiplicity schema (DIMS), and its restriction, disjunction-free interval multiplicity schema (IMS). We study the complexity of the following static analysis problems: schema satisfiability, membership of a tree to the language of a schema, schema containment, as well as twig query satisfiability, implication, and containment in the presence of schema. Finally, we study the expressive power of the proposed schema languages and compare them with yardstick languages of unordered trees (FO, MSO, and Presburger constraints) and DTDs under commutative closure. Our results show that the proposed schema languages are capable of expressing many practical languages of unordered trees and enjoy desirable computational properties.

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Notes

  1. 1.

    http://dblp.uni-trier.de/xml/dblp.dtd

References

  1. 1.

    Abiteboul, S., Bourhis, P., Vianu, V.: Highly expressive query languages for unordered data trees. In: ICDT, pp 46–60 (2012)

  2. 2.

    Albert, J., Giammarresi, D., Wood, D.: Normal form algorithms for extended context-free grammars. Theor. Comput. Sci. 267(1-2), 35–47 (2001)

    MathSciNet  Article  MATH  Google Scholar 

  3. 3.

    Amer-Yahia, S., Cho, S., Lakshmanan, L.V.S., Srivastava, D.: Tree pattern query minimization. VLDB J. 11(4), 315–331 (2002)

    Article  MATH  Google Scholar 

  4. 4.

    Beeri, C., Milo, T.: Schemas for integration and translation of structured and semi-structured data. In: ICDT, pp 296–313 (1999)

  5. 5.

    Benedikt, M., Fan, W., Geerts, F.: XPath satisfiability in the presence of DTDs. J. ACM 55(2) (2008)

  6. 6.

    Berglund, M., Björklund, H., Högberg, J.: Recognizing shuffled languages. In: LATA, pp 142–154 (2011)

  7. 7.

    Bex, G.J., Neven, F., Schwentick, T., Vansummeren, S.: Inference of concise regular expressions and DTDs. ACM Trans. Database Syst 35(2) (2010)

  8. 8.

    Bex, G.J., Neven, F., Van den Bussche, J.: DTDs versus XML Schema A practical study. In: WebDB, pp 79–84 (2004)

  9. 9.

    Björklund, H., Martens, W., Schwentick, T.: Validity of tree pattern queries with respect to schema information MFCS, pp 171–182 (2013)

  10. 10.

    Boneva, I., Ciucanu, R., Staworko, S.: Simple schemas for unordered XML. In: WebDB (2013)

  11. 11.

    Boneva, I., Gayo, J.E.L., Hym, S., Prud’hommeau, E.G., Solbrig, H.R., Staworko, S.: Validating RDF with shape expressions. arXiv:CoRRabs/1404.1270 CoRR (2014)

  12. 12.

    Boneva, I., Talbot, J.: Automata and logics for unranked and unordered trees. In: RTA, pp 500–515 (2005)

  13. 13.

    Boneva, I., Talbot, J., Tison, S.: Expressiveness of a spatial logic for trees. In: LICS, pp 280–289 (2005)

  14. 14.

    Brüggemann-Klein, A., Wood, D.: One-unambiguous regular languages. Inf. Comput. 142(2), 182–206 (1998)

    Article  MATH  Google Scholar 

  15. 15.

    Cardelli, L., Ghelli, G.: TQL: a query language for semistructured data based on the ambient logic. Math. Struct. Comput. Sci. 14(3), 285–327 (2004)

    MathSciNet  Article  MATH  Google Scholar 

  16. 16.

    Ciucanu, R., Staworko, S.: Learning schemas for unordered XML. In: DBPL (2013)

  17. 17.

    Colazzo, D., Ghelli, G., Pardini, L., Sartiani, C.: Almost-linear inclusion for XML regular expression types. ACM Trans. Database Syst. 38(3), 15 (2013)

    MathSciNet  Article  Google Scholar 

  18. 18.

    Colazzo, D., Ghelli, G., Sartiani, C.: Efficient inclusion for a class of XML types with interleaving and counting. Inf. Syst. 34(7), 643–656 (2009)

    Article  MATH  Google Scholar 

  19. 19.

    Czerwinski, W., David, C., Losemann, K., Martens, W.: Deciding definability by deterministic regular expressions. In: FoSSaCS, pp 289–304 (2013)

  20. 20.

    Dal-Zilio, S., Lugiez, D.: XML schema, tree logic and sheaves automata RTA, pp 246–263 (2003)

  21. 21.

    Gelade, W., Martens, W., F. Neven.: Optimizing schema languages for XML, Numerical constraints and interleaving. SIAM J. Comput. 38(5), 2021–2043 (2009)

    Article  MATH  Google Scholar 

  22. 22.

    Ghelli, G., Colazzo, D., Sartiani, C.: Linear time membership in a class of regular expressions with interleaving and counting. In: CIKM, pp 389–398 (2008)

  23. 23.

    Grijzenhout, S., Marx, M.: The quality of the XML web. J. Web Sem. 19, 59–68 (2013)

    Article  Google Scholar 

  24. 24.

    Hashimoto, K., Kusunoki, Y., Ishihara, Y., Fujiwara, T.: Validity of positive XPath queries with wildcard in the presence of DTDs. In: DBPL (2011)

  25. 25.

    Hovland, D.: The membership problem for regular expressions with unordered concatenation and numerical constraints. In: LATA, pp 313–324 (2012)

  26. 26.

    Kopczynski, E., To, A.: Parikh images of grammars Complexity and applications. In: LICS, pp 80–89 (2010)

  27. 27.

    Martens, W., Neven, F.: On the complexity of typechecking top-down XML transformations. Theor. Comput. Sci. 336(1), 153–180 (2005)

    MathSciNet  Article  MATH  Google Scholar 

  28. 28.

    Martens, W., Neven, F., Gyssens, M.: Typechecking top-down XML transformations Fixed input or output schemas. Inf. Comput. 206(7), 806–827 (2008)

    MathSciNet  Article  MATH  Google Scholar 

  29. 29.

    Martens, W., Neven, F., Schwentick, T.: Complexity of decision problems for simple regular expressions. In: MFCS, pp 889–900 (2004)

  30. 30.

    Martens, W., Neven, F., Schwentick, T.: Complexity of decision problems for XML schemas and chain regular expressions. SIAM J. Comput. 39(4), 1486–1530 (2009)

    MathSciNet  Article  MATH  Google Scholar 

  31. 31.

    Mayer, A.J., Stockmeyer, L.J.: Word problems-this time with interleaving. Inf. Comput. 115(2), 293–311 (1994)

    MathSciNet  Article  Google Scholar 

  32. 32.

    Miklau, G., Suciu, D.: Containment and equivalence for a fragment of XPath. J. ACM 51(1), 2–45 (2004)

    MathSciNet  Article  Google Scholar 

  33. 33.

    Montazerian, M., Wood, P. T., Mousavi, S. R.: XPath query satisfiability is in PTIME for real-world DTDs XSym, pp 17–30 (2007)

  34. 34.

    Neven, F., Schwentick, T.: XML schemas without order (1999)

  35. 35.

    Neven, F., Schwentick, T.: On the complexity of XPath containment in the presence of disjunction, DTDs, and variables. Logical Methods in Computer Science 2 (3) (2006)

  36. 36.

    Oppen, D.C.: A \(2^{2^{2^{p_{n}}}}\) upper bound on the complexity of Presburger arithmetic. J. Comput. Syst. Sci. 16(3), 323–332 (1978)

    MathSciNet  Article  MATH  Google Scholar 

  37. 37.

    Papakonstantinou, Y., Vianu, V.: DTD inference for views of XML data. In: PODS, pp 35–46 (2000)

  38. 38.

    Schaefer, T.J.: The complexity of satisfiability problems. In: STOC, pp 216–226 (1978)

  39. 39.

    Schmidt, A., Waas, F., Kersten, M., Carey, M., Manolescu, I., XMark, R. Busse.: A benchmark for XML data management VLDB, pp 974–985 (2002)

  40. 40.

    Schwentick, T.: Trees, automata and XML PODS, p 222 (2004)

  41. 41.

    Segoufin, L., Sirangelo, C.: Constant-memory validation of streaming XML documents against DTDs. In: ICDT, pp 299–313 (2007)

  42. 42.

    Segoufin, L., Vianu, V.: Validating streaming XML documents. In: PODS, pp 53–64 (2002)

  43. 43.

    Seidl, H., Schwentick, T., Muscholl, A.: Numerical document queries. In: PODS, pp 155–166 (2003)

  44. 44.

    Seidl, H., Schwentick, T., Muscholl, A.: Counting in trees. In: Logic and Automata, pp 575–612 (2008)

  45. 45.

    Staworko, S., Wieczorek, P.: Learning twig and path queries. In: ICDT, pp 140–154 (2012)

  46. 46.

    Stockmeyer, L.J., Meyer, A.R.: Word problems requiring exponential time Preliminary report STOC, pp 1–9 (1973)

  47. 47.

    W3C: XML Path language (XPath) 1.0 (1999)

Download references

Author information

Affiliations

Authors

Corresponding author

Correspondence to Radu Ciucanu.

Additional information

A preliminary version of this article has appeared in the Proceedings of the 16th International Workshop on the Web and Databases (WebDB), 2013 [10].

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Boneva, I., Ciucanu, R. & Staworko, S. Schemas for Unordered XML on a DIME. Theory Comput Syst 57, 337–376 (2015). https://doi.org/10.1007/s00224-014-9593-1

Download citation

Keywords

  • Schemas for XML
  • Unordered XML
  • Regular expressions
  • Twig queries
  • Semi-structured data.