Optimizing Schema Languages for XML: Numerical Constraints and Interleaving
Abstract
The presence of a schema offers many advantages in processing, translating, querying, and storing XML data. Basic decision problems such as equivalence, inclusion, and non-emptiness of intersection of schemas are the building blocks for schema optimization and integration, and for algorithms for static analysis of transformations. It is therefore paramount to establish the exact complexity of these problems. Most common schema languages for XML can be adequately modeled by some kind of grammar with regular expressions on the right-hand sides. In this paper, we observe that, apart from the usual regular operators of union, concatenation, and Kleene star, schema languages also allow numerical occurrence constraints and interleaving operators. Although the expressiveness of these operators remains within the regular languages, their presence or absence has a significant impact on the complexity of the basic decision problems. We present a complete overview of the complexity of the basic decision problems for DTDs, XSDs, and Relax NG with regular expressions incorporating numerical occurrence constraints and interleaving. We also discuss chain regular expressions and the complexity of the schema simplification problem incorporating the new operators.
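As an illustration of why these two operators stay within the regular languages yet can still be costly, the following minimal sketch rewrites each of them using only the standard operators. It is not taken from the paper: the function names `unfold_counter` and `unfold_interleave` are hypothetical, the operands are assumed to be single, distinct symbols, and Python's `re` syntax stands in for the schema languages' own notation.

```python
import re
from itertools import permutations


def unfold_counter(sym: str, lo: int, hi: int) -> str:
    """Rewrite sym{lo,hi} using only concatenation and '?'.

    Assumes sym is a single symbol and 0 <= lo <= hi.
    """
    required = sym * lo              # lo mandatory copies
    optional = ""
    for _ in range(hi - lo):         # hi - lo nested optional copies
        optional = f"({sym}{optional})?"
    return required + optional


def unfold_interleave(symbols: list[str]) -> str:
    """Rewrite s1 & s2 & ... & sn as the union of all orderings.

    Assumes the symbols are distinct single characters.
    """
    return "|".join("".join(p) for p in permutations(symbols))


if __name__ == "__main__":
    counter = unfold_counter("a", 2, 4)
    print(counter, bool(re.fullmatch(counter, "aaa")))

    # The unfolded counter grows with the numeric value of the bound,
    # while the notation a{2,hi} grows only with the number of digits.
    for hi in (10, 100, 1000):
        print(hi, len(unfold_counter("a", 2, hi)))

    # The unfolded interleaving of n distinct symbols has n! alternatives.
    for n in (2, 3, 4, 5):
        print(n, len(unfold_interleave(list("abcde"[:n])).split("|")))
```

Both rewritings produce ordinary regular expressions, so expressiveness does not increase; the cost is size. This naive unfolding is only one possible translation, chosen here for clarity, and says nothing about the exact complexity bounds established in the paper.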