LTAG-spinal and the Treebank

A new resource for incremental, dependency and semantic parsing

Language Resources and Evaluation

Abstract

We introduce LTAG-spinal, a novel variant of traditional Lexicalized Tree Adjoining Grammar (LTAG) with desirable linguistic, computational and statistical properties. Unlike in traditional LTAG, subcategorization frames and the argument–adjunct distinction are left underspecified in LTAG-spinal. LTAG-spinal with adjunction constraints is weakly equivalent to LTAG. The LTAG-spinal formalism is used to extract an LTAG-spinal Treebank from the Penn Treebank (PTB) with Propbank annotation. Based on Propbank annotation, predicate coordination and LTAG adjunction structures are successfully extracted. The LTAG-spinal Treebank makes explicit semantic relations that are implicit in, or absent from, the original PTB. LTAG-spinal provides a very desirable resource for statistical LTAG parsing, incremental parsing, dependency parsing, and semantic parsing. This treebank has been successfully used to train an incremental LTAG-spinal parser and a bidirectional LTAG dependency parser.

Notes

  1. By non-predicate adjuncts, we mean those auxiliary trees whose foot node does not subcategorize for the anchor; these are essentially modifier trees. LTAG also uses auxiliary trees to model phenomena other than non-predicate adjuncts. Examples are raising verbs and parentheticals. In going from LTAG to LTAG-spinal, we do not change the analysis of these phenomena. See Sect. 4 for further discussion.

  2. Most recently, subsets of the PTB and Propbank have been reconciled by hand (Babko-Malaya et al. 2006; Yi 2007). Our own extraction process was carried out automatically before that data became available and covers the entire PTB and Propbank. To a certain extent, it has been informed by that ongoing work.

  3. We treat conjoining as if it were a distinct operation. Theoretically, though, conjoining can be seen as a special case of the attachment operation. This is somewhat similar to traditional LTAG, where substitution is a distinct operation but can be seen as a special case of adjunction. Indeed, historically the first definition of TAG does not refer to substitution at all (Joshi et al. 1975).

  4. Detailed operations for tree transformations were described in Shen (2006). Similar work was reported in Babko-Malaya et al. (2006) and Yi (2007).

  5. For a general reference on the use of LTAG for linguistic description, see Frank (2002).

  6. Coindexation information is not maintained in the trees because Propbank can be used to recover it. We have included these traces in the LTAG-spinal treebank to record the annotation decisions of the PTB. We do not attach any theoretical significance to these traces and provide them for informational purposes only. If this information is not needed, a purely lexicalized version of our treebank can be easily obtained by stripping off the e-trees anchored in traces.

  7. Extraposition can be handled by multi-component LTAG (MC-LTAG) (Kroch and Joshi 1985; Frank 2002). Our LTAG-spinal Treebank at present does not support MC-LTAG.

  8. For the sake of convenience, particles are represented as arguments.

  9. Section 23 of our treebank contains 2401 of the 2416 sentences in PTB Section 23.
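As an illustration of note 6, the following is a minimal sketch of how e-trees anchored in traces might be stripped from a derivation to obtain a purely lexicalized version. The `ETree` class, its fields, and the `strip_traces` and `anchors` helpers are hypothetical names introduced here for illustration; they are not the LTAG-spinal API, and the sketch assumes that a trace-anchored e-tree has no lexical material below it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ETree:
    """Hypothetical elementary tree in a derivation (not the LTAG-spinal API)."""
    anchor: str                # lexical anchor, or a trace marker such as "*T*"
    is_trace: bool = False     # True when the e-tree is anchored in a trace
    children: List["ETree"] = field(default_factory=list)  # e-trees attached or conjoined here


def strip_traces(node: ETree) -> ETree:
    """Return a copy of the derivation without trace-anchored e-trees.

    Assumption: a trace-anchored e-tree is dropped together with anything
    below it, and the root itself is not anchored in a trace.
    """
    kept = [strip_traces(child) for child in node.children if not child.is_trace]
    return ETree(node.anchor, node.is_trace, kept)


def anchors(node: ETree) -> List[str]:
    """Collect anchors in pre-order, for inspecting the result."""
    result = [node.anchor]
    for child in node.children:
        result.extend(anchors(child))
    return result


# Toy derivation: "read" with one lexical dependent and one trace.
derivation = ETree("read", children=[ETree("John"), ETree("*T*", is_trace=True)])
lexicalized = strip_traces(derivation)
print(anchors(lexicalized))
```

The function builds a new tree rather than mutating its input, so the version that records the PTB annotation decisions remains available alongside the lexicalized one.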

Abbreviations

LTAG: Lexicalized Tree Adjoining Grammar

References

  • Abeillé, A., & Rambow, O. (Eds.) (2001). Tree Adjoining Grammars: Formalisms, linguistic analysis and processing. Center for the Study of Language and Information.

  • Babko-Malaya, O., Bies, A., Taylor, A., Yi, S., Palmer, M., Marcus, M., Kulick, S., & Shen, L. (2006). Issues in synchronizing the English Treebank and PropBank. In Frontiers in Linguistically Annotated Corpora (ACL Workshop).

  • Charniak, E. (1997). Statistical parsing with a context-free grammar and word statistics. In Proceedings of the Fourteenth National Conference on Artificial Intelligence.

  • Charniak, E., & Johnson, M. (2005). Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL).

  • Chen, J., Bangalore, S., & Vijay-Shanker, K. (2006). Automated extraction of Tree Adjoining Grammars from treebanks. Natural Language Engineering, 12(3), 251–299.

  • Chen, J., & Rambow, O. (2003). Use of deep linguistic features for the recognition and labeling of semantic arguments. In Proceedings of the 2003 Conference of Empirical Methods in Natural Language Processing.

  • Chiang, D. (2000). Statistical parsing with an automatically-extracted Tree Adjoining Grammar. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL).

  • Collins, M. (1999). Head-driven statistical models for natural language parsing. PhD thesis, University of Pennsylvania.

  • Frank, R. (2002). Phrase structure composition and syntactic dependencies. The MIT Press.

  • Hockenmaier, J., & Steedman, M. (2002). Generative models for statistical parsing with combinatory categorial grammar. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL).

  • Joshi, A. K., Levy, L. S., & Takahashi, M. (1975). Tree adjunct grammars. Journal of Computer and System Sciences, 10(1), 136–163.

  • Joshi, A. K., & Schabes, Y. (1997). Tree-Adjoining Grammars. In G. Rozenberg & A. Salomaa (Eds.), Handbook of formal languages (Vol. 3, pp. 69–124). Springer-Verlag.

  • Joshi, A. K., & Srinivas, B. (1994). Disambiguation of super parts of speech (or Supertags): Almost parsing. In Proceedings of COLING ’94: The 15th Int. Conf. on Computational Linguistics.

  • Kroch, A., & Joshi, A. K. (1985). The linguistic relevance of Tree Adjoining Grammar. Report MS-CIS-85-16. CIS Department, University of Pennsylvania.

  • Magerman, D. (1995). Statistical decision-tree models for parsing. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics.

  • Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.

  • Palmer, M., Gildea, D., & Kingsbury, P. (2005). The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1), 71–106.

  • Pradhan, S., Hacioglu, K., Krugler, V., Ward, W., Martin, J., & Jurafsky, D. (2005). Support vector learning for semantic argument classification. Machine Learning, 60(1–3), 11–39.

  • Rambow, O., Weir, D., & Vijay-Shanker, K. (2001). D-Tree substitution grammars. Computational Linguistics, 27(1), 89–121.

  • Sarkar, A., & Joshi, A. K. (1996). Coordination in Tree Adjoining Grammars: Formalization and implementation. In Proceedings of COLING ’96: The 16th Int. Conf. on Computational Linguistics.

  • Schabes, Y., & Waters, R. C. (1995). A cubic-time, parsable formalism that lexicalizes context-free grammar without changing the trees produced. Computational Linguistics, 21(4), 479–513.

  • Shen, L. (2006). Statistical LTAG parsing. PhD Thesis, University of Pennsylvania.

  • Shen, L., & Joshi, A. K. (2005). Incremental LTAG parsing. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing.

  • Shen, L., & Joshi, A. K. (2007). Bidirectional LTAG dependency parsing. Technical Report 07-02, IRCS, University of Pennsylvania.

  • Steedman, M. (2000). The syntactic process. The MIT Press.

  • Sturt, P., & Lombardo, V. (2005). Processing coordinated structures: Incrementality and connectedness. Cognitive Science, 29(2), 291–305.

  • Vadas, D., & Curran, J. (2007). Adding noun phrase structure to the Penn Treebank. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL).

  • Xia, F. (2001). Automatic grammar generation from two different perspectives. PhD thesis, University of Pennsylvania.

  • XTAG-Group (2001). A lexicalized tree adjoining grammar for English. Technical Report 01-03, IRCS, University of Pennsylvania.

  • Yi, S. (2007). Robust semantic role labeling using parsing variations and semantic classes. PhD thesis, University of Pennsylvania.

Acknowledgments

We would like to thank our anonymous reviewers for valuable comments. We are grateful to Ryan Gabbard, who has contributed to the code for the LTAG-spinal API. We also thank Julia Hockenmaier, Mark Johnson, Yudong Liu, Mitch Marcus, Sameer Pradhan, Anoop Sarkar, and the CLRG and XTAG groups at Penn for helpful discussions.

Author information

Corresponding author

Correspondence to Libin Shen.

About this article

Cite this article

Shen, L., Champollion, L. & Joshi, A.K. LTAG-spinal and the Treebank. Lang Resources & Evaluation 42, 1–19 (2008). https://doi.org/10.1007/s10579-007-9043-7

  • DOI: https://doi.org/10.1007/s10579-007-9043-7
