Rule-based and Statistical Approaches to Morpho-syntactic Tagging of German

  • Conference paper
Intelligent Information Processing and Web Mining

Part of the book series: Advances in Soft Computing (AINSC, volume 25)

Abstract

Rule-based and statistical approaches constitute the two leading paradigms in computational linguistics. This paper applies the two types of approaches to the task of assigning morpho-syntactic categories to words in German, a language with rich inflectional morphology. The rule-based approach uses the Xerox Incremental Deep Parsing System and provides a novel constraint-based framework that integrates phrase-internal concord rules and phrase-external syntactic heuristics into one uniform architecture. The statistical approach utilizes the PCFG parser LoPar, which yields acceptable results even for moderate amounts of manually annotated treebank training data. It is shown that tree transformations constitute a crucial step in weakening the independence assumptions inherent in probabilistic context-free grammars and in optimizing the performance for the task at hand.
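The abstract does not say which tree transformations the paper uses. Purely as an illustration of the general idea, the sketch below applies parent annotation, a widely described transformation that relabels each phrasal node with its parent's category so that a PCFG no longer treats, for example, an NP under S and an NP under VP as the same symbol. The toy tree, the node labels, and the function names `parent_annotate` and `rules` are assumptions made for this example; they are not taken from the paper or from LoPar.

```python
from collections import Counter

# A tree is a tuple (label, child, child, ...); a leaf is a plain string (a word).

def parent_annotate(tree, parent=None):
    """Return a copy of the tree with each phrasal label suffixed by its parent's label."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    # Preterminals (nodes directly above a single word) and the root stay unannotated.
    is_preterminal = len(children) == 1 and isinstance(children[0], str)
    new_label = label if (parent is None or is_preterminal) else f"{label}^{parent}"
    # Children receive the original (unannotated) label of this node as their parent.
    return (new_label, *(parent_annotate(child, label) for child in children))

def rules(tree):
    """Yield every context-free rule (lhs, rhs) used in the tree."""
    if isinstance(tree, str):
        return
    label, *children = tree
    rhs = tuple(child if isinstance(child, str) else child[0] for child in children)
    yield (label, rhs)
    for child in children:
        yield from rules(child)

# Hypothetical toy tree for "der Hund sieht den Mann" (labels are illustrative only).
tree = ("S",
        ("NP", ("ART", "der"), ("NN", "Hund")),
        ("VP", ("VVFIN", "sieht"),
               ("NP", ("ART", "den"), ("NN", "Mann"))))

plain = Counter(rules(tree))
annotated = Counter(rules(parent_annotate(tree)))

print(plain[("NP", ("ART", "NN"))])          # both NPs share one rule
print(annotated[("NP^S", ("ART", "NN"))])    # NP under S has its own rule
print(annotated[("NP^VP", ("ART", "NN"))])   # NP under VP has its own rule
```

In the untransformed tree both noun phrases are counted under the single rule NP -> ART NN, so a PCFG estimated from such counts cannot make the expansion of an NP depend on where it occurs. After annotation the two positions are separate symbols with separate rule counts, which is one way a grammar can become sensitive to context such as the grammatical function of a phrase.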

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hinrichs, E.W., Trushkina, J.S. (2004). Rule-based and Statistical Approaches to Morpho-syntactic Tagging of German. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds) Intelligent Information Processing and Web Mining. Advances in Soft Computing, vol 25. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39985-8_57

  • DOI: https://doi.org/10.1007/978-3-540-39985-8_57

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21331-4

  • Online ISBN: 978-3-540-39985-8

  • eBook Packages: Springer Book Archive
