Abstract
Rule-based and statistical approaches constitute the two leading paradigms in computational linguistics. This paper applies both to the task of assigning morpho-syntactic categories to words in German, a language with rich inflectional morphology. The rule-based approach uses the Xerox Incremental Deep Parsing System and provides a novel constraint-based framework that integrates phrase-internal concord rules and phrase-external syntactic heuristics into one uniform architecture. The statistical approach uses the PCFG parser LoPar, which yields acceptable results even with moderate amounts of manually annotated treebank training data. It is shown that tree transformations are a crucial step both in weakening the independence assumptions inherent in probabilistic context-free grammars and in optimizing performance on the task at hand.
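The abstract does not say which tree transformations are used. As an illustration only, the sketch below shows parent annotation, one well-known transformation of this kind, on a toy tree. The tuple-based tree encoding, the function names, and the example sentence are assumptions made for this sketch and are not taken from the paper or from LoPar.

```python
from collections import Counter

# Toy encoding (an assumption for this sketch):
#   phrase      -> (label, [children])
#   preterminal -> (tag, word)

def annotate_parents(tree, parent=None):
    """Append the parent's label to every phrasal label (tags stay unchanged)."""
    label, rest = tree
    if isinstance(rest, str):  # preterminal: keep tag and word as they are
        return (label, rest)
    new_label = f"{label}^{parent}" if parent else label
    return (new_label, [annotate_parents(child, label) for child in rest])

def count_rules(tree, counts):
    """Count the context-free rules used in a tree (lexical rules are skipped)."""
    label, rest = tree
    if isinstance(rest, str):
        return
    counts[(label, tuple(child[0] for child in rest))] += 1
    for child in rest:
        count_rules(child, counts)

# "der Hund sieht den Mann" with STTS-style tags (hypothetical example)
tree = ("S", [
    ("NP", [("ART", "der"), ("NN", "Hund")]),
    ("VP", [("VVFIN", "sieht"),
            ("NP", [("ART", "den"), ("NN", "Mann")])]),
])

plain = Counter()
count_rules(tree, plain)

annotated = Counter()
count_rules(annotate_parents(tree), annotated)
```

In the plain tree the subject and object noun phrases share a single rule, so a PCFG estimated from it cannot tell them apart. After annotation they become `NP^S` and `NP^VP`, each with its own rule statistics, which is the kind of context a grammar might exploit when the two positions tend to differ in case.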
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hinrichs, E.W., Trushkina, J.S. (2004). Rule-based and Statistical Approaches to Morpho-syntactic Tagging of German. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds) Intelligent Information Processing and Web Mining. Advances in Soft Computing, vol 25. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39985-8_57
DOI: https://doi.org/10.1007/978-3-540-39985-8_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21331-4
Online ISBN: 978-3-540-39985-8
eBook Packages: Springer Book Archive