LinGO Redwoods

A Rich and Dynamic Treebank for HPSG

Research on Language and Computation

Abstract

Reflecting an increased need for stochastic parse selection models over hand-built linguistic grammars and a lack of appropriately detailed training material, we present the Linguistic Grammars On-Line (LinGO) Redwoods initiative, a seed activity in the design and development of a new type of treebank. LinGO Redwoods aims to develop a novel treebanking methodology that is (i) rich in nature and dynamic in both (ii) the ways linguistic data can be retrieved from the treebank at varying granularity and (iii) the constant evolution and regular updating of the treebank itself, synchronized to the development of ideas in syntactic theory. Since June 2001, the project has been working to build the foundations for this new type of treebank, develop a basic set of tools required for treebank construction and maintenance, and construct an initial set of 10,000 annotated trees to be distributed together with the tools under an open-source license.

Author information

Corresponding author

Correspondence to Stephan Oepen.

About this article

Cite this article

Oepen, S., Flickinger, D., Toutanova, K. et al. LinGO Redwoods. Res Lang Comput 2, 575–596 (2004). https://doi.org/10.1007/s11168-004-7430-4

  • DOI: https://doi.org/10.1007/s11168-004-7430-4
