Abstract
State-of-the-art parsers for English are based on probabilistic grammars induced from treebanks. German, a language with freer word order and richer morphology, is likely to pose new problems for techniques developed specifically for English. This paper describes one of the first parsing experiments using probabilistic context-free grammars extracted from the German treebank NEGRA. In particular, we introduce “partial-parent encoding” for NEGRA, a variant of the parent-encoding technique. Evaluation shows that grammatical functions and partial-parent encoding improve the performance of a parser for German. Moreover, we find evidence that extending the size of the treebank and adding more linguistic information will further improve the results.
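As a rough illustration of the two ingredients named above, the following sketch applies standard parent encoding to a tree and then reads off a PCFG by relative-frequency estimation. It is not the partial-parent variant introduced in the paper, which the abstract does not define. The tuple-based tree representation, the function names, the `^` separator, the choice to leave part-of-speech tags unannotated, and the toy tree (which is not actual NEGRA annotation) are all assumptions made for the example.

```python
from collections import Counter

# Assumed representation: a tree is (label, child, child, ...);
# a preterminal is (tag, word), where the word is a plain string.

def is_preterminal(tree):
    return len(tree) == 2 and isinstance(tree[1], str)

def parent_annotate(tree, parent=None):
    """Return a copy in which each phrasal label carries its parent's label."""
    if is_preterminal(tree):
        return tree  # POS tags stay unannotated in this sketch
    label = tree[0]
    new_label = label if parent is None else f"{label}^{parent}"
    return (new_label,) + tuple(parent_annotate(c, label) for c in tree[1:])

def rules(tree):
    """Yield (lhs, rhs) productions of phrasal nodes; lexical rules are skipped."""
    if is_preterminal(tree):
        return
    yield tree[0], tuple(child[0] for child in tree[1:])
    for child in tree[1:]:
        yield from rules(child)

def estimate_pcfg(trees):
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    counts = Counter(r for t in trees for r in rules(t))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

# Hypothetical toy tree, for illustration only.
toy = ("S",
       ("NP", ("ART", "der"), ("NN", "Hund")),
       ("VP", ("VVFIN", "sieht"),
              ("NP", ("ART", "die"), ("NN", "Katze"))))

annotated = parent_annotate(toy)
plain_pcfg = estimate_pcfg([toy])
encoded_pcfg = estimate_pcfg([annotated])
```

In the plain grammar both noun phrases share the single category `NP`; after parent encoding they are split into `NP^S` and `NP^VP`, so each context gets its own rule distribution. The price is a larger rule set estimated from the same amount of data.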
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fissaha, S., Olejnik, D., Kornberger, R., Müller, K., Prescher, D. (2003). Experiments in German Treebank Parsing. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2003. Lecture Notes in Computer Science, vol 2807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39398-6_8
DOI: https://doi.org/10.1007/978-3-540-39398-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20024-6
Online ISBN: 978-3-540-39398-6
eBook Packages: Springer Book Archive