Research on Language and Computation, Volume 2, Issue 4, pp 575–596

LinGO Redwoods

A Rich and Dynamic Treebank for HPSG
  • Stephan Oepen
  • Dan Flickinger
  • Kristina Toutanova
  • Christopher D. Manning

Abstract

Reflecting an increased need for stochastic parse selection models over hand-built linguistic grammars and a lack of appropriately detailed training material, we present the Linguistic Grammars On-Line (LinGO) Redwoods initiative, a seed activity in the design and development of a new type of treebank. LinGO Redwoods aims to develop a novel treebanking methodology that is (i) rich in nature and dynamic in both (ii) the ways linguistic data can be retrieved from the treebank at varying granularity and (iii) the constant evolution and regular updating of the treebank itself, synchronized to the development of ideas in syntactic theory. Starting in June 2001, the project has been working to build the foundations for this new type of treebank, develop a basic set of tools required for treebank construction and maintenance, and construct an initial set of 10,000 annotated trees to be distributed together with the tools under an open-source license.

Keywords

HPSG, parse selection, treebank maintenance, treebanks

Copyright information

© Springer 2005

Authors and Affiliations

  • Stephan Oepen (1)
  • Dan Flickinger (1)
  • Kristina Toutanova (2)
  • Christopher D. Manning (2)
  1. Center for the Study of Language and Information, Stanford University, Stanford, USA
  2. Department of Computer Science, Stanford University, Stanford, USA
