Research on Language and Computation, Volume 2, Issue 4, pp 597–620

TIGER: Linguistic Interpretation of a German Corpus

  • Sabine Brants
  • Stefanie Dipper
  • Peter Eisenberg
  • Silvia Hansen-Schirra
  • Esther König
  • Wolfgang Lezius
  • Christian Rohrer
  • George Smith
  • Hans Uszkoreit
Article

Abstract

This paper reports on the TIGER Treebank, a corpus that currently comprises 40,000 syntactically annotated German newspaper sentences. We describe what kind of information is encoded in the treebank and introduce the different representation formats that are used for the annotation and exploitation of the treebank. We explain the different methods used for the annotation: interactive annotation, using the tool ANNOTATE, and LFG parsing. Furthermore, we give an account of the annotation scheme used for the TIGER Treebank. This scheme is an extended and improved version of the NEGRA annotation scheme, and we illustrate in detail the linguistic extensions that were made to the annotation in the TIGER project. The main differences concern coordination, verb subcategorization, expletives, and proper nouns. In addition, the paper presents the query tool TIGERSearch, which was developed in the project to exploit the treebank in an adequate way. We describe the query language, which was designed to facilitate a simple formulation of complex queries; furthermore, we briefly introduce TIGERin, a graphical user interface for query input. The paper concludes with a summary and some directions for future work.

Keywords

annotation, German, treebank

Copyright information

© Springer 2005

Authors and Affiliations

  • Sabine Brants (1)
  • Stefanie Dipper (2)
  • Peter Eisenberg (3)
  • Silvia Hansen-Schirra (1)
  • Esther König (2)
  • Wolfgang Lezius (2)
  • Christian Rohrer (2)
  • George Smith (3)
  • Hans Uszkoreit (1)

  1. Computational Linguistics, Saarland University, Germany
  2. Institute of Natural Language Processing (IMS), Stuttgart University, Germany
  3. Institut für Germanistik, Potsdam University, Germany