TIGER: Linguistic Interpretation of a German Corpus

Research on Language and Computation

Abstract

This paper reports on the TIGER Treebank, a corpus that currently comprises 40,000 syntactically annotated German newspaper sentences. We describe the kinds of information encoded in the treebank and introduce the representation formats used for annotating and exploiting it. We explain the two annotation methods: interactive annotation with the tool ANNOTATE, and LFG parsing. Furthermore, we give an account of the annotation scheme used for the TIGER Treebank. This scheme is an extended and improved version of the NEGRA annotation scheme, and we illustrate in detail the linguistic extensions made to the annotation in the TIGER project. The main differences concern coordination, verb subcategorization, expletives, and proper nouns. In addition, the paper presents the query tool TIGERSearch, which was developed in the project to make adequate use of the treebank. We describe the query language, which was designed to allow complex queries to be formulated simply; furthermore, we briefly introduce TIGERin, a graphical user interface for query input. The paper concludes with a summary and some directions for future work.

Author information

Corresponding author

Correspondence to Sabine Brants.

About this article

Cite this article

Brants, S., Dipper, S., Eisenberg, P. et al. TIGER: Linguistic Interpretation of a German Corpus. Res Lang Comput 2, 597–620 (2004). https://doi.org/10.1007/s11168-004-7431-3

  • DOI: https://doi.org/10.1007/s11168-004-7431-3
