Abstract
In this contribution we propose a query method for XML documents that balances the expressive power of the query language against query complexity, using methods derived from logic. Since XML documents are basically regular tree languages, it is appealing to use monadic second-order logic (MSO) as a query language. But MSO cannot query the secondary relations that the ID-IDREF mechanism introduces into XML documents. We therefore show how a well-defined subclass of these ID-IDREF pairs can be queried using MSO, signature translations, and MSO-definable transductions. The ID-IDREF pairs are coded by linear context-free tree grammars, and every query result is intersected with this coding so that only those matches are retained that respect the ID-IDREF information contained in the document. The advantage of this method is that it uses regular techniques only. In consequence, every query is computable.
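The final filtering step described in the abstract, where a query result is intersected with the ID-IDREF pairs, can be illustrated with a small sketch. This is not the method of the chapter: it uses neither MSO, nor tree grammars, nor transductions, and simply models both the candidate matches and the ID-IDREF relation as finite sets of node pairs. The sample document, the attribute names `id` and `author`, and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: "id" plays the role of an ID attribute,
# "author" the role of an IDREF attribute.
DOC = """
<library>
  <person id="p1"><name>N1</name></person>
  <person id="p2"><name>N2</name></person>
  <book author="p1"><title>T1</title></book>
  <book author="p2"><title>T2</title></book>
</library>
"""


def id_idref_pairs(root, id_attr="id", ref_attr="author"):
    """Collect (referring node, referenced node) pairs from the document."""
    by_id = {el.get(id_attr): el for el in root.iter()
             if el.get(id_attr) is not None}
    return {(el, by_id[el.get(ref_attr)])
            for el in root.iter()
            if el.get(ref_attr) in by_id}


def candidate_matches(root):
    """Purely structural query: every (book, person) pair in the tree.

    It ignores the references, so it over-generates.
    """
    return {(b, p) for b in root.findall("book")
            for p in root.findall("person")}


def query(root):
    """Keep only the candidates that respect the ID-IDREF relation."""
    return candidate_matches(root) & id_idref_pairs(root)


root = ET.fromstring(DOC)
result = sorted((b.findtext("title"), p.findtext("name"))
                for b, p in query(root))
print(result)  # [('T1', 'N1'), ('T2', 'N2')]
```

The structural query alone yields four pairs here; the intersection keeps the two in which the book actually refers to the person.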
Acknowledgments
This research was supported by a DFG grant (SFB 441).
Copyright information
© 2010 Springer Science+Business Media B.V.
About this chapter
Cite this chapter
Kepser, S., Mönnich, U., Morawietz, F. (2010). Regular Query Techniques for XML-Documents. In: Witt, A., Metzing, D. (eds) Linguistic Modeling of Information and Markup Languages. Text, Speech and Language Technology, vol 41. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-3331-4_13
DOI: https://doi.org/10.1007/978-90-481-3331-4_13
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-3330-7
Online ISBN: 978-90-481-3331-4
eBook Packages: Computer Science (R0)