Experiments in German Treebank Parsing

  • Conference paper
Text, Speech and Dialogue (TSD 2003)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2807)

Abstract

State-of-the-art parsers for English are based on probabilistic grammars induced from treebanks. German, a language with freer word order and richer morphology, is likely to pose new problems for techniques developed specifically for English. This paper describes one of the first parsing experiments using probabilistic context-free grammars extracted from the German treebank NEGRA. In particular, we introduce “partial-parent encoding” for NEGRA, a variant of the parent-encoding technique. Evaluation shows that grammatical functions and partial-parent encoding improve the performance of a parser for German. Moreover, we find evidence that extending the size of the treebank, as well as adding more linguistic information, will improve the results.
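
To make the two techniques named in the abstract concrete, the following is a minimal sketch of standard parent encoding combined with relative-frequency estimation of PCFG rule probabilities. It is an illustration under stated assumptions, not the authors' implementation: the tuple-based tree format, the toy trees, the function names, and the decision to leave part-of-speech nodes unannotated are all hypothetical. The abstract does not define the paper's "partial-parent encoding" variant, so it is not reproduced here.

```python
from collections import Counter

# Assumed tree format (hypothetical): (label, [children]); a word is a plain string.

def is_preterminal(tree):
    """A node whose children are all words, i.e. a part-of-speech tag."""
    _, children = tree
    return all(isinstance(child, str) for child in children)

def parent_encode(tree, parent=None):
    """Annotate each phrasal, non-root label with the label of its parent.

    Leaving the root and the part-of-speech tags unannotated is a choice
    made for this sketch only.
    """
    if isinstance(tree, str):
        return tree
    label, children = tree
    if parent is None or is_preterminal(tree):
        new_label = label
    else:
        new_label = f"{label}^{parent}"
    # Pass the original label down, so only one level of context is encoded.
    return (new_label, [parent_encode(child, label) for child in children])

def rules(tree):
    """Yield every (lhs, rhs) production used in a tree, lexical ones included."""
    if isinstance(tree, str):
        return
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        yield from rules(child)

def estimate_pcfg(trees):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    counts = Counter(rule for tree in trees for rule in rules(tree))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

# Two toy trees with treebank-style labels (invented for illustration).
toy_trees = [
    ("S", [("NP", [("ART", ["die"]), ("NN", ["Katze"])]),
           ("VVFIN", ["schläft"])]),
    ("S", [("NP", [("PPER", ["sie"])]),
           ("VVFIN", ["sieht"]),
           ("NP", [("ART", ["die"]), ("NN", ["Katze"])])]),
]

encoded = [parent_encode(tree) for tree in toy_trees]
pcfg = estimate_pcfg(encoded)
for (lhs, rhs), prob in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{prob:.3f}]")
```

In this sketch every NP under S becomes `NP^S`, so the grammar can give an NP a different expansion distribution depending on where it occurs. The cost is a larger label set, and hence sparser counts per rule.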

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fissaha, S., Olejnik, D., Kornberger, R., Müller, K., Prescher, D. (2003). Experiments in German Treebank Parsing. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2003. Lecture Notes in Computer Science, vol 2807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39398-6_8

  • DOI: https://doi.org/10.1007/978-3-540-39398-6_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20024-6

  • Online ISBN: 978-3-540-39398-6

  • eBook Packages: Springer Book Archive