TAGH: A Complete Morphology for German Based on Weighted Finite State Automata

  • Alexander Geyken
  • Thomas Hanneforth
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4002)

Abstract

TAGH is a system for the automatic recognition of German word forms. It is based on a stem lexicon with allomorphs and a concatenative mechanism for inflection and word formation. Weighted finite-state automata (FSA) and a cost function are used to determine the correct segmentation of complex forms: the correct segmentation of a given compound is taken to be the one with the least cost. The stem lexicon contains almost 80,000 stems and was compiled over five years from large newspaper corpora and literary texts. More than 1,000 rules for derivational and compositional word formation considerably increase the number of analyzable word forms. The recognition rate of TAGH is more than 99% for modern newspaper text and approximately 98.5% for literary texts.
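
The least-cost idea in the abstract can be illustrated with a minimal sketch. It is not TAGH's implementation: the toy lexicon, the cost values, the boundary penalty, and the function name `best_segmentation` are all assumptions made for illustration. The sketch replaces the weighted automaton with a plain dynamic program that adds costs along a path and keeps the minimum, which is the same min-plus (tropical semiring) computation a shortest-path search over a weighted FSA would perform.

```python
from math import inf

# Hypothetical toy lexicon: stem -> cost. These values are invented for the
# example and are not taken from TAGH.
STEM_COST = {
    "stau": 1.0,    # "traffic jam / damming"
    "staub": 2.0,   # "dust"
    "becken": 1.0,  # "basin"
    "ecken": 3.0,   # "corners"
}

# Hypothetical penalty added for every compound boundary.
BOUNDARY_COST = 5.0


def best_segmentation(word):
    """Return (cost, segments) for the cheapest split of `word` into stems.

    Returns (inf, []) if the word cannot be covered by lexicon stems.
    """
    word = word.lower()
    n = len(word)
    # best[i] holds the cheapest analysis found so far for the prefix word[:i].
    best = [(inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece not in STEM_COST or best[start][0] == inf:
                continue
            cost = best[start][0] + STEM_COST[piece]
            if start > 0:
                cost += BOUNDARY_COST  # this piece begins after a boundary
            if cost < best[end][0]:
                best[end] = (cost, best[start][1] + [piece])
    return best[n]


if __name__ == "__main__":
    # "Staubecken" is ambiguous: Stau+becken versus Staub+ecken.
    print(best_segmentation("Staubecken"))
```

With these invented costs, the split into "stau" and "becken" costs 1 + 1 + 5 = 7 and beats "staub" plus "ecken" at 2 + 3 + 5 = 10, so the first analysis is returned. How the costs are actually assigned in TAGH is not stated in the abstract.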

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Alexander Geyken (1)
  • Thomas Hanneforth (2)
  1. Berlin-Brandenburg Academy of Sciences, Germany
  2. University of Potsdam, Germany
