Querying Linguistic Trees



Large databases of linguistic annotations are used for testing linguistic hypotheses and for training language processing models. These annotations are often syntactic or prosodic in nature and have a hierarchical structure. Query languages are used to select particular structures of interest, or to project out large slices of a corpus for external analysis. Existing query languages suffer from a variety of problems in expressiveness, efficiency, and naturalness for linguistic query. We describe the domain of linguistic trees and discuss the expressive requirements for a query language. We then present a language that can express a wide range of queries over these trees, and show that the language is first-order complete over trees.


Linguistic databases, Treebank, Tree query, XPath, Annotation, First order logic



Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  1. Department of Linguistics, University of Pennsylvania, Philadelphia, USA
  2. Department of Computer Science and Software Engineering, University of Melbourne, Melbourne, Australia
  3. Linguistic Data Consortium, University of Pennsylvania, Philadelphia, USA
