Abstract
This paper reports on the TIGER Treebank, a corpus that currently comprises 40,000 syntactically annotated German newspaper sentences. We describe the kinds of information encoded in the treebank and introduce the representation formats used for its annotation and exploitation. We explain the methods used for the annotation: interactive annotation with the tool ANNOTATE, and LFG parsing. Furthermore, we give an account of the annotation scheme used for the TIGER Treebank. This scheme is an extended and improved version of the NEGRA annotation scheme, and we illustrate in detail the linguistic extensions made to the annotation in the TIGER project. The main differences concern coordination, verb subcategorization, expletives, and proper nouns. In addition, the paper presents TIGERSearch, the query tool developed in the project to exploit the treebank adequately. We describe the query language, which was designed to make complex queries simple to formulate; we also briefly introduce TIGERin, a graphical user interface for query input. The paper concludes with a summary and some directions for future work.
About this article
Cite this article
Brants, S., Dipper, S., Eisenberg, P. et al. TIGER: Linguistic Interpretation of a German Corpus. Res Lang Comput 2, 597–620 (2004). https://doi.org/10.1007/s11168-004-7431-3
DOI: https://doi.org/10.1007/s11168-004-7431-3