Abstract
Reflecting an increased need for stochastic parse selection models over hand-built linguistic grammars and a lack of appropriately detailed training material, we present the Linguistic Grammars On-Line (LinGO) Redwoods initiative, a seed activity in the design and development of a new type of treebank. LinGO Redwoods aims to develop a novel treebanking methodology that is (i) rich in nature and dynamic both in (ii) the ways linguistic data can be retrieved from the treebank at varying granularity and in (iii) the constant evolution and regular updating of the treebank itself, synchronized with the development of ideas in syntactic theory. Since June 2001, the project has been working to build the foundations for this new type of treebank, to develop a basic set of tools required for treebank construction and maintenance, and to construct an initial set of 10,000 annotated trees, to be distributed together with the tools under an open-source license.
About this article
Cite this article
Oepen, S., Flickinger, D., Toutanova, K. et al. LinGO Redwoods. Res Lang Comput 2, 575–596 (2004). https://doi.org/10.1007/s11168-004-7430-4
DOI: https://doi.org/10.1007/s11168-004-7430-4