Foundations of XML Based on Logic and Automata: A Snapshot

  • Thomas Schwentick
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7153)


XML query and schema languages have some obvious connections to Formal Language Theory. For example, Document Type Definitions (DTDs) can be viewed as tree grammars and use regular expressions, XML Schemas resemble tree automata. Likewise, there are immediate links to Logic, e.g., through the classical characterization of regular tree languages by monadic second-order logic.

It is therefore not surprising that concepts from Logic and Formal Language Theory played an important role in the development of the theoretical foundations of XML query and schema languages. For example, they helped to analyze the expressiveness of languages, to understand the restrictions posed by the W3C standards, and to develop algorithms for various purposes, including efficient evaluation and static analysis.

However, methods from Logic and Formal Languages have not merely been applied to XML theory, the fertilization took place both ways. XML theory posed a lot of new challenges for Logic and Formal Language Theory and triggered various new research lines, e.g., the study of deterministic regular expressions and the development of automata for trees with data values.

The aim of the talk at FoIKS 2012 is to present some of the fundamental connections between XML query and schema languages and Logic and Formal Language Theory, to report on recent developments in the area, and to highlight some current directions of research. This accompanying paper is a kind of annotated bibliography for that talk.


Regular Expression Regular Language XPath Query Formal Language Theory Very Large Data Base 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Aho, A.V., Ullman, J.D.: Translations on a context-free grammar. Information and Control 19(5), 439–475 (1971)MathSciNetCrossRefzbMATHGoogle Scholar
  2. 2.
    Balmin, A., Papakonstantinou, Y., Vianu, V.: Incremental validation of xml documents. ACM Trans. Database Syst. 29(4), 710–751 (2004)CrossRefGoogle Scholar
  3. 3.
    Barbosa, D., Mendelzon, A.O., Libkin, L., Mignet, L., Arenas, M.: Efficient incremental validation of XML documents. In: ICDE, pp. 671–682. IEEE Computer Society (2004)Google Scholar
  4. 4.
    Benedikt, M., Fan, W., Geerts, F.: XPath satisfiability in the presence of DTDs. J. ACM 55(2) (2008)Google Scholar
  5. 5.
    Benedikt, M., Koch, C.: Interpreting Tree-to-Tree Queries. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006, Part II. LNCS, vol. 4052, pp. 552–564. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  6. 6.
    Benedikt, M., Koch, C.: From XQuery to relational logics. ACM Trans. Database Syst. 34(4) (2009)Google Scholar
  7. 7.
    Berstel, J., Boasson, L.: Formal properties of XML grammars and languages. Acta Inf. 38(9), 649–671 (2002)MathSciNetCrossRefzbMATHGoogle Scholar
  8. 8.
    Bex, G.J., Maneth, S., Neven, F.: A Formal Model for an Expressive Fragment of XSLT. In: Palamidessi, C., Moniz Pereira, L., Lloyd, J.W., Dahl, V., Furbach, U., Kerber, M., Lau, K.-K., Sagiv, Y., Stuckey, P.J. (eds.) CL 2000. LNCS (LNAI), vol. 1861, pp. 1137–1151. Springer, Heidelberg (2000)CrossRefGoogle Scholar
  9. 9.
    Bex, G.J., Gelade, W., Martens, W., Neven, F.: Simplifying xml schema: effortless handling of nondeterministic regular expressions. In: Çetintemel, U., Zdonik, S.B., Kossmann, D., Tatbul, N. (eds.) SIGMOD Conference, pp. 731–744. ACM (2009)Google Scholar
  10. 10.
    Bex, G.J., Gelade, W., Neven, F., Vansummeren, S.: Learning deterministic regular expressions for the inference of schemas from XML data. TWEB 4(4) (2010)Google Scholar
  11. 11.
    Bex, G.J., Neven, F., Schwentick, T., Vansummeren, S.: Inference of concise regular expressions and DTDs. ACM Trans. Database Syst. 35(2) (2010)Google Scholar
  12. 12.
    Bex, G.J., Neven, F., Vansummeren, S.: Inferring XML schema definitions from XML data. In: VLDB, pp. 998–1009 (2007)Google Scholar
  13. 13.
    Bex, G.J., Neven, F., Vansummeren, S.: Schemascope: a system for inferring and cleaning xml schemas. In: Wang, J.T.-L. (ed.) SIGMOD Conference, pp. 1259–1262. ACM (2008)Google Scholar
  14. 14.
    Björklund, H., Bojańczyk, M.: Bounded Depth Data Trees. In: Arge, L., Cachin, C., Jurdziński, T., Tarlecki, A. (eds.) ICALP 2007. LNCS, vol. 4596, pp. 862–874. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  15. 15.
    Björklund, H., Gelade, W., Martens, W.: Incremental xpath evaluation. ACM Trans. Database Syst. 35(4), 29 (2010)CrossRefGoogle Scholar
  16. 16.
    Björklund, H., Schwentick, T.: On notions of regularity for data languages. Theor. Comput. Sci. 411(4-5), 702–715 (2010)MathSciNetCrossRefzbMATHGoogle Scholar
  17. 17.
    Bojanczyk, M.: Data monoids. In: Schwentick, Dürr [82], pp. 105–116Google Scholar
  18. 18.
    Bojanczyk, M., Colcombet, T.: Tree-walking automata do not recognize all regular languages. SIAM J. Comput. 38(2), 658–701 (2008)MathSciNetCrossRefzbMATHGoogle Scholar
  19. 19.
    Bojanczyk, M., Lasota, S.: An extension of data automata that captures xpath. In: LICS, pp. 243–252. IEEE Computer Society (2010)Google Scholar
  20. 20.
    Bojanczyk, M., Muscholl, A., Schwentick, T., Segoufin, L.: Two-variable logic on data trees and xml reasoning. J. ACM 56(3) (2009)Google Scholar
  21. 21.
    Bojanczyk, M., Parys, P.: XPath evaluation in linear time. In: Lenzerini, M., Lembo, D. (eds.) PODS, pp. 241–250. ACM (2008)Google Scholar
  22. 22.
    Bojanczyk, M., Parys, P.: XPath evaluation in linear time. J. ACM 58(4), 17 (2011)MathSciNetCrossRefzbMATHGoogle Scholar
  23. 23.
    Bojańczyk, M., Samuelides, M., Schwentick, T., Segoufin, L.: Expressive Power of Pebble Automata. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006, Part I. LNCS, vol. 4051, pp. 157–168. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  24. 24.
    Bouajjani, A., Habermehl, P., Jurski, Y., Sighireanu, M.: Rewriting Systems with Data. In: Csuhaj-Varjú, E., Ésik, Z. (eds.) FCT 2007. LNCS, vol. 4639, pp. 1–22. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  25. 25.
    Bouyer, P., Petit, A., Therien, D.: An algebraic approach to data languages and timed languages. Inf. Comput. 182(2), 137–162 (2003)MathSciNetCrossRefzbMATHGoogle Scholar
  26. 26.
    Bray, T., Paoli, J., Sperberg-McQueen, C.M.: Extensible markup language (XML) working draft. Technical report, W3C (1996),
  27. 27.
    Bray, T., Paoli, J., Sperberg-McQueen, C.M., Maler, E., Yergeau, F.: Extensible markup language (XML) 1.0 (fifth edition). Technical report, W3C (2008),
  28. 28.
    Brüggemann-Klein, A., Wood, D.: Deterministic regular languages. Technical Report Bericht 38, Universität Freiburg, Institut für Informatik (1991)Google Scholar
  29. 29.
    Brüggemann-Klein, A., Wood, D.: One-unambiguous regular languages. Information and Computation 140(2), 229–253 (1998)MathSciNetCrossRefzbMATHGoogle Scholar
  30. 30.
    Brüggemann-Klein, A., Murata, M., Wood, D.: Regular tree languages over non-ranked alphabets (1998),
  31. 31.
    Büchi, J.R.: Weak second-order arithmetic and finite automata. Z. Math. Logik Grundl. Math. 6, 66–92 (1960)MathSciNetCrossRefzbMATHGoogle Scholar
  32. 32.
    Chamberlin, D., Clark, J., Florescu, D., Robie, J., Simeon, J., Stefanascu, M.: XQuery 1.0: An XML query language (2002),
  33. 33.
    Clark, J., Murata, M.: Relax ng specification. Technical report, OASIS Committee Specification Document (2001)Google Scholar
  34. 34.
    Clark, J.: XSL transformations version 1.0 (August 1999),
  35. 35.
    Colcombet, T., Ley, C., Puppis, G.: On the Use of Guards for Logics with Data. In: Murlak, F., Sankowski, P. (eds.) MFCS 2011. LNCS, vol. 6907, pp. 243–255. Springer, Heidelberg (2011)CrossRefGoogle Scholar
  36. 36.
    World Wide Web Consortium. XML Path Language (XPath), Version 1.0. W3C Recommendation, November 16 (1999),
  37. 37.
    World Wide Web Consortium. XML Path Language (XPath), Version 2.0. W3C Recommendation,
  38. 38.
    World Wide Web Consortium. XML schema,
  39. 39.
    Demri, S., Lazic, R.: LTL with the freeze quantifier and register automata. ACM Trans. Comput. Log. 10(3) (2009)Google Scholar
  40. 40.
    Demri, S., Sangnier, A.: When Model-Checking Freeze ltl Over Counter Machines Becomes Decidable. In: Ong, L. (ed.) FOSSACS 2010. LNCS, vol. 6014, pp. 176–190. Springer, Heidelberg (2010)CrossRefGoogle Scholar
  41. 41.
    Doner, J.: Tree acceptors and some of their applications. J. Comput. Syst. Sci. 4(5), 406–451 (1970)MathSciNetCrossRefzbMATHGoogle Scholar
  42. 42.
    Elgot, C.C.: Decision problems of finite automata design and related arithmetics. Transactions of The American Mathematical Society 98, 21–21 (1961)MathSciNetCrossRefzbMATHGoogle Scholar
  43. 43.
    Engelfriet, J., Hoogeboom, H.J.: Automata with nested pebbles capture first-order logic with transitive closure. Logical Methods in Computer Science 3(2) (2007)Google Scholar
  44. 44.
    Engelfriet, J., Hoogeboom, H.J., Samwel, B.: Xml transformation by tree-walking transducers with invisible pebbles. In: Libkin, L. (ed.) PODS, pp. 63–72. ACM (2007)Google Scholar
  45. 45.
    Engelfriet, J., Maneth, S.: Macro tree transducers, attribute grammars, and mso definable tree translations. Inf. Comput. 154(1), 34–91 (1999)MathSciNetCrossRefzbMATHGoogle Scholar
  46. 46.
    Figueira, D.: Satisfiability of downward XPath with data equality tests. In: PODS, pp. 197–206 (2009)Google Scholar
  47. 47.
    Figueira, D.: Forward-XPath and extended register automata on data-trees. In: ICDT, pp. 231–241 (2010)Google Scholar
  48. 48.
    Figueira, D., Segoufin, L.: Bottom-up automata on data trees and vertical xpath. In: Schwentick, Dürr [82], pp. 93–104Google Scholar
  49. 49.
    Garofalakis, M.N., Gionis, A., Rastogi, R., Seshadri, S., Shim, K.: Xtract: Learning document type descriptors from xml document collections. Data Min. Knowl. Discov. 7(1), 23–56 (2003)MathSciNetCrossRefGoogle Scholar
  50. 50.
    Van den Bussche, J., Bex, G.J., Neven, F.: DTDs versus XML schema: A practical study. In: International Workshop on the World Wide Web and Databases, pp. 79–84 (2004)Google Scholar
  51. 51.
    Gelade, W., Gyssens, M., Martens, W.: Regular Expressions with Counting: Weak versus Strong Determinism. In: Královič, R., Niwiński, D. (eds.) MFCS 2009. LNCS, vol. 5734, pp. 369–381. Springer, Heidelberg (2009)CrossRefGoogle Scholar
  52. 52.
    Gelade, W., Neven, F.: Succinctness of the complement and intersection of regular expressions. In: Albers, S., Weil, P. (eds.) STACS. LIPIcs, vol. 1, pp. 325–336. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Germany (2008)Google Scholar
  53. 53.
    Glushkov, V.M.: The abstract theory of automata. Russian Math. Surveys 16, 1–53 (1961)MathSciNetCrossRefzbMATHGoogle Scholar
  54. 54.
    Goris, E., Marx, M.: Looping caterpillars. In: LICS 2005 (2005)Google Scholar
  55. 55.
    Gottlob, G., Koch, C., Pichler, R.: Efficient algorithms for processing xpath queries. ACM Trans. Database Syst. 30(2), 444–491 (2005)CrossRefGoogle Scholar
  56. 56.
    Green, T.J., Miklau, G., Onizuka, M., Suciu, D.: Processing XML Streams with Deterministic Automata. In: Calvanese, D., Lenzerini, M., Motwani, R. (eds.) ICDT 2003. LNCS, vol. 2572, pp. 173–189. Springer, Heidelberg (2002)CrossRefGoogle Scholar
  57. 57.
    Kaminski, M., Francez, N.: Finite-memory automata. Theor. Comput. Sci. 134(2), 329–363 (1994)MathSciNetCrossRefzbMATHGoogle Scholar
  58. 58.
    Koch, C., Scherzinger, S., Schweikardt, N., Stegmaier, B.: FluXQuery: An optimizing XQuery processor for streaming XML data. In: Proc. 30th International Conference on Very Large Data Bases (VLDB), Toronto, pp. 1309–1312 (2004)Google Scholar
  59. 59.
    Libkin, L.: Logics for unranked trees: An overview. Logical Methods in Computer Science 2(3) (2006)Google Scholar
  60. 60.
    Maneth, S., Berlea, A., Perst, T., Seidl, H.: XML type checking with macro tree transducers. In: PODS, pp. 283–294 (2005)Google Scholar
  61. 61.
    Maneth, S., Perst, T., Seidl, H.: Exact XML Type Checking in Polynomial Time. In: Schwentick, T., Suciu, D. (eds.) ICDT 2007. LNCS, vol. 4353, pp. 254–268. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  62. 62.
    Martens, W., Neven, F.: Frontiers of tractability for typechecking simple xml transformations. J. Comput. Syst. Sci. 73(3), 362–390 (2007)MathSciNetCrossRefzbMATHGoogle Scholar
  63. 63.
    Martens, W., Neven, F., Schwentick, T.: Simple off the shelf abstractions for XML schema. SIGMOD Record 36(3), 15–22 (2007)CrossRefGoogle Scholar
  64. 64.
    Martens, W., Neven, F., Schwentick, T.: Complexity of decision problems for XML schemas and chain regular expressions. SIAM J. Comput. 39(4), 1486–1530 (2009)MathSciNetCrossRefzbMATHGoogle Scholar
  65. 65.
    Martens, W., Neven, F., Schwentick, T., Bex, G.J.: Expressiveness and complexity of XML schema. ACM Trans. Database Syst. 31(3), 770–813 (2006)CrossRefGoogle Scholar
  66. 66.
    Marx, M., de Rijke, M.: Semantic characterizations of XPath. In: TDM 2004 Workshop on XML Databases and Information Retrieval, Twente, The Netherlands, June 21 (2004)Google Scholar
  67. 67.
    Marx, M.: Conditional xpath. ACM Trans. Database Syst. 30(4), 929–959 (2005)CrossRefGoogle Scholar
  68. 68.
    Marx, M.: Logical Foundations of XML and XQuery. In: Tessaris, S., Franconi, E., Eiter, T., Gutierrez, C., Handschuh, S., Rousset, M.-C., Schmidt, R.A. (eds.) Reasoning Web 2009. LNCS, vol. 5689, pp. 111–157. Springer, Heidelberg (2009)CrossRefGoogle Scholar
  69. 69.
    Marx, M., de Rijke, M.: Semantic characterizations of navigational xpath. SIGMOD Record 34(2), 41–46 (2005)CrossRefGoogle Scholar
  70. 70.
    Milo, T., Suciu, D., Vianu, V.: Typechecking for xml transformers. J. Comput. Syst. Sci. 66(1), 66–97 (2003)MathSciNetCrossRefzbMATHGoogle Scholar
  71. 71.
    Møller, A., Schwartzbach, M.I.: The Design Space of Type Checkers for XML Transformation Languages. In: Eiter, T., Libkin, L. (eds.) ICDT 2005. LNCS, vol. 3363, pp. 17–36. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  72. 72.
    Murata, M.: Forest-regular languages and tree-regular languages. Technical report, Fuji Xerox, Japan. Technical Report (1995)Google Scholar
  73. 73.
    Murata, M., Lee, D., Mani, M., Kawaguchi, K.: Taxonomy of XML schema languages using formal language theory. ACM Trans. Internet Techn. 5(4), 660–704 (2005)CrossRefGoogle Scholar
  74. 74.
    Neven, F., Schwentick, T.: Query automata over finite trees. Theoretical Computer Science 275, 633–674 (2002)MathSciNetCrossRefzbMATHGoogle Scholar
  75. 75.
    Neven, F.: Automata, logic, and XML. In: CSL, pp. 2–26 (2002)Google Scholar
  76. 76.
    Neven, F.: Automata theory for XML researchers. SIGMOD Record 31(3), 39–46 (2002)CrossRefGoogle Scholar
  77. 77.
    Neven, F., Schwentick, T.: Query automata over finite trees. Theoretical Computer Science 275(1-2), 633–674 (2002)MathSciNetCrossRefzbMATHGoogle Scholar
  78. 78.
    Neven, F., Schwentick, T.: On the power of tree-walking automata. Information and Control 183(1), 86–103 (2003)MathSciNetzbMATHGoogle Scholar
  79. 79.
    Neven, F., Schwentick, T.: On the complexity of XPath containment in the presence of disjunction, DTDs, and variables. Logical Methods in Computer Science 2(3) (2006)Google Scholar
  80. 80.
    Potthoff, A.: Logische Klassifizierung regulärer Baumsprachen. Doctor’s thesis, Institut für Informatik u. Prakt. Math., Universität Kiel (1994)Google Scholar
  81. 81.
    Schwentick, T.: Automata for XML - a survey. J. Comput. Syst. Sci. 73(3), 289–315 (2007)MathSciNetCrossRefzbMATHGoogle Scholar
  82. 82.
    Schwentick, T., Dürr, C. (eds.): 28th International Symposium on Theoretical Aspects of Computer Science, STACS 2011, Dortmund, Germany, March 10-12. LIPIcs, vol. 9. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2011)Google Scholar
  83. 83.
    Segoufin, L.: Automata and Logics for Words and Trees over an Infinite Alphabet. In: Ésik, Z. (ed.) CSL 2006. LNCS, vol. 4207, pp. 41–57. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  84. 84.
    Segoufin, L.: Static analysis of xml processing with data values. SIGMOD Record 36(1), 31–38 (2007)CrossRefGoogle Scholar
  85. 85.
    Segoufin, L., Vianu, V.: Validating streaming XML documents. In: PODS 2002, pp. 53–64 (2002)Google Scholar
  86. 86.
    ten Cate, B., Lutz, C.: The complexity of query containment in expressive fragments of xpath 2.0. J. ACM 56(6) (2009)Google Scholar
  87. 87.
    ten Cate, B., Marx, M.: Axiomatizing the logical core of xpath 2.0. Theory Comput. Syst. 44(4), 561–589 (2009)MathSciNetCrossRefzbMATHGoogle Scholar
  88. 88.
    ten Cate, B., Segoufin, L.: Transitive closure logic, nested tree walking automata, and xpath. J. ACM 57(3) (2010)Google Scholar
  89. 89.
    Thatcher, J.W., Wright, J.B.: Generalized finite automata theory with an application to a decision problem of second-order logic. Mathematical Systems Theory 2(1), 57–81 (1968)MathSciNetCrossRefzbMATHGoogle Scholar
  90. 90.
    Thomas, W.: Languages, Automata, and Logic. In: Rozenberg, G., Salomaa, A. (eds.) Handbook of Formal Languages, vol. III, pp. 389–455. Springer, Heidelberg (1997)CrossRefGoogle Scholar
  91. 91.
    Trakhtenbrot, B.A.: Finite automata and logic of monadic predicates. Doklady Akademii Nauk SSSR 140, 326–329 (1961)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Thomas Schwentick
    • 1
  1. 1.Technische Universität DortmundGermany

Personalised recommendations