Querying Linguistic Trees


DOI: 10.1007/s10849-009-9086-9

Cite this article as:
Lai, C. & Bird, S. Journal of Logic, Language and Information (2010) 19: 53. doi:10.1007/s10849-009-9086-9


Large databases of linguistic annotations are used for testing linguistic hypotheses and for training language processing models. These linguistic annotations are often syntactic or prosodic in nature, and have a hierarchical structure. Query languages are used to select particular structures of interest, or to project out large slices of a corpus for external analysis. Existing languages suffer from a variety of problems in the areas of expressiveness, efficiency, and naturalness for linguistic query. We describe the domain of linguistic trees and discuss the expressive requirements for a query language. Then we present a language that can express a wide range of queries over these trees, and show that the language is first-order complete over trees.


Keywords: Linguistic databases, Treebank, Tree query, XPath, Annotation, First order logic

Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  1. Department of Linguistics, University of Pennsylvania, Philadelphia, USA
  2. Department of Computer Science and Software Engineering, University of Melbourne, Melbourne, Australia
  3. Linguistic Data Consortium, University of Pennsylvania, Philadelphia, USA