Compounding is a very productive process in German for forming complex nouns and adjectives, which account for about 7% of the words in a newspaper text. Unlike English compounds, German compounds contain no spaces or other word boundaries, so their automatic analysis is often ambiguous. A (non-weighted) finite-state morphological analyzer returns all potential segmentations of a compound without filtering or prioritizing the results.
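The ambiguity can be illustrated with a minimal Python sketch. It is not the finite-state analyzer discussed in the paper: the function `segmentations` and the four-word `TOY_LEXICON` are hypothetical, and the sketch ignores linking elements and capitalization. It only shows how an unweighted lookup returns every possible split with no preference among them.

```python
def segmentations(word, lexicon):
    """Return every way to split `word` into a sequence of lexicon entries."""
    if not word:
        return [[]]  # one way to segment the empty string: no segments
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in lexicon:
            # Keep this prefix and segment the remainder recursively.
            for rest in segmentations(word[i:], lexicon):
                results.append([head] + rest)
    return results


# Hypothetical toy lexicon (lowercased, no linking elements).
TOY_LEXICON = {"wach", "wachs", "stube", "tube"}

if __name__ == "__main__":
    # "Wachstube" can be read as Wach+Stube or as Wachs+Tube.
    for analysis in segmentations("wachstube", TOY_LEXICON):
        print("+".join(analysis))
```

Both analyses come back unranked, which is the situation a weighted model is meant to resolve.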

The paper presents an experiment in analyzing German compounds with the Xerox Weighted Finite-State Compiler (wfsc). The model assigns weights to compound segments and gives priority (a) to analyses with the minimal number of segments and (b) to segments with the highest frequency in a training list. The results obtained with this rather simple model show the advantage of weighted finite-state transducers over simple (unweighted) FSTs.
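The two priorities can be sketched in plain Python. This is an illustration under stated assumptions, not the wfsc model itself: the frequency tables are invented, the negative log relative frequency is one common choice of segment weight and may differ from the paper's, and the lexicographic tuple `(segment count, summed weight)` is simply a direct way to make criterion (a) dominate criterion (b).

```python
import math


def segmentations(word, lexicon):
    """Return every way to split `word` into a sequence of lexicon entries."""
    if not word:
        return [[]]
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in lexicon:
            for rest in segmentations(word[i:], lexicon):
                results.append([head] + rest)
    return results


def weight(segments, freq, total):
    """Lower is better: fewest segments first, then most frequent segments."""
    # Assumed weighting: negative log relative frequency, summed per analysis.
    cost = sum(-math.log(freq[s] / total) for s in segments)
    return (len(segments), cost)


def best_segmentation(word, freq):
    """Pick the lowest-weight analysis, or None if the word cannot be split."""
    total = sum(freq.values())
    candidates = segmentations(word, freq)
    return min(candidates, key=lambda segs: weight(segs, freq, total), default=None)


# Hypothetical training frequencies, invented for illustration only.
FREQ_A = {"wach": 20, "stube": 30, "wachs": 5, "tube": 10}
FREQ_B = {"kinder": 5, "garten": 5, "kind": 500, "er": 900}

if __name__ == "__main__":
    print(best_segmentation("wachstube", FREQ_A))     # tie on length, frequency decides
    print(best_segmentation("kindergarten", FREQ_B))  # fewer segments wins
```

With these toy counts, "wachstube" resolves to wach+stube because both analyses have two segments and that one has the more frequent parts, while "kindergarten" resolves to kinder+garten even though "kind" and "er" are far more frequent, because the two-segment analysis is preferred outright.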







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Anne Schiller, Xerox Research Centre Europe, Meylan, France
