Regular Query Techniques for XML-Documents

Linguistic Modeling of Information and Markup Languages

Part of the book series: Text, Speech and Language Technology (TLTB, volume 41)

Abstract

In this contribution we propose a query method for XML documents that strikes a well-chosen balance between the expressive power of the query language and query complexity, using methods derived from logic. Since XML documents are basically regular tree languages, it is appealing to use monadic second-order logic (MSO) as a query language. MSO, however, cannot query the secondary relations that the ID-IDREF mechanism introduces into XML documents. We therefore show how a well-defined subclass of these ID-IDREF pairs can be queried using MSO, signature translations, and MSO-definable transductions. The ID-IDREF pairs are coded by linear context-free tree grammars, and every query result is intersected with this coding so that only those matches are retained that respect the ID-IDREF information contained in the document. The advantage of this method is that it uses regular techniques only. Consequently, every query is computable.
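
The final filtering step described in the abstract can be illustrated with a rough sketch. This is not the chapter's construction: it uses plain Python sets instead of MSO formulas, transductions, or tree grammars, and only shows the idea of intersecting structurally matching node pairs with the ID-IDREF pairs found in a document. The sample document, the element names `sec` and `cite`, the attribute names `id` and `ref`, and all function names are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are assumptions.
DOC = (
    '<doc>'
    '<sec id="s1"><cite ref="s2"/></sec>'
    '<sec id="s2"><cite ref="s1"/><cite ref="s3"/></sec>'
    '<sec id="s3"/>'
    '</doc>'
)

def addresses(elem, addr=()):
    """Yield (address, element) pairs; an address is the tuple of child
    indexes leading from the root to the node."""
    yield addr, elem
    for i, child in enumerate(elem):
        yield from addresses(child, addr + (i,))

def idref_pairs(root):
    """Secondary relation: (referring node, referenced node) address pairs."""
    by_id = {e.get("id"): a for a, e in addresses(root)
             if e.get("id") is not None}
    return {(a, by_id[e.get("ref")]) for a, e in addresses(root)
            if e.get("ref") in by_id}

def structural_candidates(root):
    """Purely structural query: each <cite> node paired with every <sec>
    node that follows it in document order. Lexicographic comparison of
    addresses coincides with document order."""
    nodes = list(addresses(root))
    cites = [a for a, e in nodes if e.tag == "cite"]
    secs = [a for a, e in nodes if e.tag == "sec"]
    return {(c, s) for c in cites for s in secs if s > c}

def filter_by_idrefs(root):
    """Keep only structural matches that respect the ID-IDREF pairs."""
    return structural_candidates(root) & idref_pairs(root)

root = ET.fromstring(DOC)
matches = filter_by_idrefs(root)  # forward references only
```

In this toy document the structural query alone returns four candidate pairs, and the intersection keeps the two that are actual references to a later section.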

Acknowledgments

This research was supported by a DFG grant (SFB 441).

Author information

Correspondence to Stephan Kepser.

Copyright information

© 2010 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

Kepser, S., Mönnich, U., Morawietz, F. (2010). Regular Query Techniques for XML-Documents. In: Witt, A., Metzing, D. (eds) Linguistic Modeling of Information and Markup Languages. Text, Speech and Language Technology, vol 41. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-3331-4_13

  • DOI: https://doi.org/10.1007/978-90-481-3331-4_13

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-3330-7

  • Online ISBN: 978-90-481-3331-4

  • eBook Packages: Computer Science (R0)
