Abstract
We introduce LTAG-spinal, a novel variant of traditional Lexicalized Tree Adjoining Grammar (LTAG) with desirable linguistic, computational and statistical properties. Unlike in traditional LTAG, subcategorization frames and the argument–adjunct distinction are left underspecified in LTAG-spinal. LTAG-spinal with adjunction constraints is weakly equivalent to LTAG. We use the LTAG-spinal formalism to extract an LTAG-spinal Treebank from the Penn Treebank (PTB) with Propbank annotation. Propbank annotation allows predicate coordination and LTAG adjunction structures to be extracted successfully. The LTAG-spinal Treebank makes explicit semantic relations that are implicit in or absent from the original PTB. The treebank is a desirable resource for statistical LTAG parsing, incremental parsing, dependency parsing, and semantic parsing. It has been successfully used to train an incremental LTAG-spinal parser and a bidirectional LTAG dependency parser.
Notes
By non-predicate adjuncts, we mean those auxiliary trees whose foot node does not subcategorize for the anchor; these are essentially modifier trees. LTAG also uses auxiliary trees to model phenomena other than non-predicate adjuncts. Examples are raising verbs and parentheticals. In going from LTAG to LTAG-spinal, we do not change the analysis of these phenomena. See Sect. 4 for further discussion.
Most recently, subsets of the PTB and Propbank have been reconciled by hand (Babko-Malaya et al. 2006; Yi 2007). Our own extraction process was carried out automatically before that data became available and covers the entire PTB and Propbank. To a certain extent, it has been informed by that ongoing work.
We treat conjoining as if it were a distinct operation. Theoretically, though, conjoining can be seen as a special case of the attachment operation. This is somewhat similar to traditional LTAG, where substitution is a distinct operation but can be seen as a special case of adjunction. Indeed, historically the first definition of TAG does not refer to substitution at all (Joshi et al. 1975).
For a general reference for the use of LTAGs for linguistic description, see Frank (2002).
Coindexation information is not maintained in the trees because Propbank can be used to recover it. We have included these traces in the LTAG-spinal treebank to record the annotation decisions of the PTB. We do not attach any theoretical significance to these traces and provide them for informational purposes only. If this information is not needed, a purely lexicalized version of our treebank can be easily obtained by stripping off the e-trees anchored in traces.
For the sake of convenience, particles are represented as arguments.
Section 23 of our treebank contains 2401 of the 2416 sentences in PTB Section 23.
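The note above on traces says that a purely lexicalized version of the treebank can be obtained by stripping off the e-trees anchored in traces. The following is a minimal sketch of that filtering step, not the LTAG-spinal API. The `ETree` record, its fields, and `strip_trace_etrees` are hypothetical names introduced only for illustration. The sketch assumes that trace anchors carry the PTB empty-element tag `-NONE-` and that trace e-trees have no dependents; the example derivation is a toy and is not taken from the treebank.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ETree:
    """Hypothetical stand-in for one elementary tree (e-tree) in a derivation."""
    node_id: int
    anchor: str            # surface word, or a trace symbol such as "*T*"
    pos: str               # tag of the anchor; "-NONE-" is assumed to mark traces
    parent: Optional[int]  # node_id of the e-tree this one attaches to; None for the root


def strip_trace_etrees(derivation: List[ETree]) -> List[ETree]:
    """Return the derivation without e-trees anchored in traces."""
    trace_ids = {t.node_id for t in derivation if t.pos == "-NONE-"}
    kept = [t for t in derivation if t.node_id not in trace_ids]
    # Assumption of this sketch: trace e-trees are leaves of the derivation,
    # so removing them never leaves another e-tree without a parent.
    for t in kept:
        if t.parent in trace_ids:
            raise ValueError(f"e-tree {t.node_id} attaches to a trace e-tree")
    return kept


# Toy derivation for "What did John eat *T*" (illustrative attachments only).
example = [
    ETree(0, "eat", "VB", None),
    ETree(1, "What", "WP", 0),
    ETree(2, "did", "VBD", 0),
    ETree(3, "John", "NNP", 0),
    ETree(4, "*T*", "-NONE-", 0),
]
lexicalized = strip_trace_etrees(example)
```

Because the filter only drops leaf e-trees, the remaining attachments are unchanged and every surviving e-tree is anchored in an overt word.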
Abbreviations
- LTAG: Lexicalized Tree Adjoining Grammar
References
Abeillé, A., & Rambow, O. (Eds.) (2001). Tree Adjoining Grammars: Formalisms, linguistic analysis and processing. Center for the Study of Language and Information.
Babko-Malaya, O., Bies, A., Taylor, A., Yi, S., Palmer, M., Marcus, M., Kulick, S., & Shen, L. (2006). Issues in synchronizing the English Treebank and PropBank. In Frontiers in Linguistically Annotated Corpora (ACL Workshop).
Charniak, E. (1997). Statistical parsing with a context-free grammar and word statistics. In Proceedings of the Fourteenth National Conference on Artificial Intelligence.
Charniak, E., & Johnson, M. (2005). Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL).
Chen, J., Bangalore, S., & Vijay-Shanker, K. (2006). Automated extraction of Tree Adjoining Grammars from treebanks. Natural Language Engineering, 12(3), 251–299.
Chen, J., & Rambow, O. (2003). Use of deep linguistic features for the recognition and labeling of semantic arguments. In Proceedings of the 2003 Conference of Empirical Methods in Natural Language Processing.
Chiang, D. (2000). Statistical parsing with an automatically-extracted Tree Adjoining Grammar. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL).
Collins, M. (1999). Head-driven statistical models for natural language parsing. PhD thesis, University of Pennsylvania.
Frank, R. (2002). Phrase structure composition and syntactic dependencies. The MIT Press.
Hockenmaier, J., & Steedman, M. (2002). Generative models for statistical parsing with combinatory categorial grammar. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL).
Joshi, A. K., Levy, L. S., & Takahashi, M. (1975). Tree adjunct grammars. Journal of Computer and System Sciences, 10(1), 136–163.
Joshi, A. K., & Schabes, Y. (1997). Tree-Adjoining Grammars. In G. Rozenberg & A. Salomaa (Eds.), Handbook of formal languages (Vol. 3, pp. 69–124). Springer-Verlag.
Joshi, A. K., & Srinivas, B. (1994). Disambiguation of super parts of speech (or Supertags): Almost parsing. In Proceedings of COLING ’94: The 15th Int. Conf. on Computational Linguistics.
Kroch, A., & Joshi, A. K. (1985). The linguistic relevance of Tree Adjoining Grammar. Report MS-CIS-85-16. CIS Department, University of Pennsylvania.
Magerman, D. (1995). Statistical decision-tree models for parsing. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics.
Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1994). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.
Palmer, M., Gildea, D., & Kingsbury, P. (2005). The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1), 71–106.
Pradhan, S., Hacioglu, K., Krugler, V., Ward, W., Martin, J., & Jurafsky, D. (2005). Support vector learning for semantic argument classification. Machine Learning, 60(1–3), 11–39.
Rambow, O., Weir, D., & Vijay-Shanker, K. (2001). D-Tree substitution grammars. Computational Linguistics, 27(1), 89–121.
Sarkar, A., & Joshi, A. K. (1996). Coordination in Tree Adjoining Grammars: Formalization and implementation. In Proceedings of COLING ’96: The 16th Int. Conf. on Computational Linguistics.
Schabes, Y., & Waters, R. C. (1995). A cubic-time, parsable formalism that lexicalizes context-free grammar without changing the trees produced. Computational Linguistics, 21(4), 479–513.
Shen, L. (2006). Statistical LTAG parsing. PhD thesis, University of Pennsylvania.
Shen, L., & Joshi, A. K. (2005). Incremental LTAG parsing. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing.
Shen, L., & Joshi, A. K. (2007). Bidirectional LTAG dependency parsing. Technical Report 07-02, IRCS, University of Pennsylvania.
Steedman, M. (2000). The syntactic process. The MIT Press.
Sturt, P., & Lombardo, V. (2005). Processing coordinated structures: Incrementality and connectedness. Cognitive Science, 29(2), 291–305.
Vadas, D., & Curran, J. (2007). Adding noun phrase structure to the Penn Treebank. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL).
Xia, F. (2001). Automatic grammar generation from two different perspectives. PhD thesis, University of Pennsylvania.
XTAG-Group (2001). A lexicalized tree adjoining grammar for English. Technical Report 01-03, IRCS, University of Pennsylvania.
Yi, S. (2007). Robust semantic role labeling using parsing variations and semantic classes. PhD thesis, University of Pennsylvania.
Acknowledgments
We would like to thank our anonymous reviewers for valuable comments. We are grateful to Ryan Gabbard, who has contributed to the code for the LTAG-spinal API. We also thank Julia Hockenmaier, Mark Johnson, Yudong Liu, Mitch Marcus, Sameer Pradhan, Anoop Sarkar, and the CLRG and XTAG groups at Penn for helpful discussions.
About this article
Cite this article
Shen, L., Champollion, L. & Joshi, A.K. LTAG-spinal and the Treebank. Lang Resources & Evaluation 42, 1–19 (2008). https://doi.org/10.1007/s10579-007-9043-7
DOI: https://doi.org/10.1007/s10579-007-9043-7