Theory of Computing Systems, Volume 57, Issue 2, pp 337–376

Schemas for Unordered XML on a DIME



We investigate schema languages for unordered XML, in which siblings have no relative order. First, we propose unordered regular expressions (UREs), essentially regular expressions with unordered concatenation instead of standard concatenation. UREs define languages of unordered words, which model the allowed content of a node (i.e., collections of the labels of its children). However, unrestricted UREs are computationally too expensive: we show the intractability of two fundamental decision problems for UREs, namely membership of an unordered word in the language of a URE and containment of two UREs. Consequently, we propose a practical and tractable restriction of UREs, disjunctive interval multiplicity expressions (DIMEs). Next, we employ DIMEs to define languages of unordered trees and propose two schema languages: disjunctive interval multiplicity schema (DIMS) and its restriction, disjunction-free interval multiplicity schema (IMS). We study the complexity of the following static analysis problems: schema satisfiability, membership of a tree in the language of a schema, schema containment, as well as twig query satisfiability, implication, and containment in the presence of a schema. Finally, we study the expressive power of the proposed schema languages and compare them with yardstick languages of unordered trees (FO, MSO, and Presburger constraints) and DTDs under commutative closure. Our results show that the proposed schema languages are capable of expressing many practical languages of unordered trees and enjoy desirable computational properties.
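To make the notion of an unordered word concrete, the following minimal sketch treats the content of a node as a multiset of child labels and checks it against per-label multiplicity intervals. It is an illustration only, in the spirit of a disjunction-free expression: the function name `matches_intervals`, the interval map, and the example labels are assumptions made here, not the formal definition of DIMEs or IMS.

```python
from collections import Counter
import math


def matches_intervals(child_labels, intervals):
    """Check an unordered word against per-label multiplicity intervals.

    child_labels: iterable of labels (the children of one node, in any order).
    intervals: hypothetical map label -> (min_count, max_count); use math.inf
               for an unbounded maximum.
    """
    # An unordered word is a multiset: only the number of occurrences matters.
    counts = Counter(child_labels)

    # A label that the expression does not mention is not allowed at all.
    if any(label not in intervals for label in counts):
        return False

    # Every mentioned label must occur a number of times within its interval.
    # Counter returns 0 for labels that are absent from the word.
    return all(lo <= counts[label] <= hi for label, (lo, hi) in intervals.items())


# Hypothetical content model: exactly one title, at least one author,
# and an optional year.
book = {"title": (1, 1), "author": (1, math.inf), "year": (0, 1)}

print(matches_intervals(["author", "title", "author"], book))
print(matches_intervals(["title", "author", "author"], book))
print(matches_intervals(["title"], book))
```

The first two calls pass the same multiset in different orders and so give the same answer, which is the point of working with unordered words. The check needs one pass over the labels and one over the intervals; the disjunction allowed in DIMEs is not modelled here.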


Keywords: Schemas for XML; Unordered XML; Regular expressions; Twig queries; Semi-structured data



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. University of Lille, INRIA, Villeneuve d’Ascq, France
