Optimizing Schema Languages for XML: Numerical Constraints and Interleaving

  • Wouter Gelade
  • Wim Martens
  • Frank Neven
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4353)


The presence of a schema offers many advantages in processing, translating, querying, and storing XML data. Basic decision problems like equivalence, inclusion, and non-emptiness of intersection of schemas form the building blocks for schema optimization and integration, as well as for algorithms that statically analyze transformations. It is therefore paramount to establish the exact complexity of these problems. Most common schema languages for XML can be adequately modeled by some kind of grammar with regular expressions at the right-hand sides. In this paper, we observe that apart from the usual regular operators of union, concatenation, and Kleene star, schema languages also allow numerical occurrence constraints and interleaving operators. Although these operators keep the expressiveness within the regular languages, their presence or absence has a significant impact on the complexity of the basic decision problems. We present a complete overview of the complexity of the basic decision problems for DTDs, XSDs, and Relax NG with regular expressions incorporating numerical occurrence constraints and interleaving. We also discuss chain regular expressions and the complexity of the schema simplification problem incorporating the new operators.
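The point that numerical occurrence constraints and interleaving stay within the regular languages, yet still matter for complexity, can be illustrated with a small sketch. It uses Python's `re` syntax as a stand-in for schema content models, and the helpers `expand_counter` and `expand_interleave` are hypothetical names introduced only for this illustration. Each helper rewrites one operator into plain concatenation, union, and optionality. The rewriting is always possible, which is why the languages remain regular. However, a counter bound written with k digits expands to an expression whose length is exponential in k, and an interleaving of n distinct symbols expands to n! alternatives. The interleaving helper assumes each symbol occurs exactly once. This only gives an intuition for the size blow-up; it is not the reductions or algorithms used in the paper.

```python
import itertools
import re


def expand_counter(sym: str, lo: int, hi: int) -> str:
    """Rewrite the counter sym{lo,hi} without numerical occurrence constraints.

    sym{lo,hi} matches lo mandatory copies of sym followed by up to (hi - lo)
    optional copies, so the counter-free form has length about 2 * hi.
    Assumes sym is a single character (illustration only).
    """
    assert 0 <= lo <= hi
    return sym * lo + (sym + "?") * (hi - lo)


def expand_interleave(symbols: list[str]) -> str:
    """Rewrite the interleaving s1 & s2 & ... & sn as a union of all n! orderings.

    Assumes distinct single-character symbols, each occurring exactly once.
    """
    orders = ("".join(p) for p in itertools.permutations(symbols))
    return "(?:" + "|".join(orders) + ")"


if __name__ == "__main__":
    # Counters: the compact form has 9 characters, while the expansion grows
    # linearly in the bound 1000, i.e. exponentially in its number of digits.
    print(len("a{0,1000}"), len(expand_counter("a", 0, 1000)))

    # The compact and expanded forms accept exactly the same words over {a}.
    compact, expanded = "a{2,5}", expand_counter("a", 2, 5)
    for n in range(8):
        w = "a" * n
        assert bool(re.fullmatch(compact, w)) == bool(re.fullmatch(expanded, w))

    # A word belongs to a & b & c iff it is a permutation of the symbols.
    syms = ["a", "b", "c"]
    pattern = expand_interleave(syms)
    for n in range(5):
        for w in map("".join, itertools.product("abc", repeat=n)):
            assert bool(re.fullmatch(pattern, w)) == (sorted(w) == syms)

    # Interleaving: 4 symbols give 4! = 24 alternatives, 6 give 720.
    for k in (4, 6):
        alts = expand_interleave(list("abcdef"[:k])).count("|") + 1
        print(k, alts)
```

Run as a script, this prints `9 2000`, then `4 24` and `6 720`. The expansions are deliberately naive. Their purpose is to show that each operator can be eliminated, at the cost of a much larger plain regular expression.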

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Wouter Gelade, Wim Martens, Frank Neven: School for Information Technology, Hasselt University and Transnational University of Limburg