German Compound Analysis with wfsc
Compounding is a highly productive process in German for forming complex nouns and adjectives, which account for about 7% of the words in a newspaper text. Unlike English compounds, German compounds contain no spaces or other word boundaries, so their automatic analysis is often ambiguous. A (non-weighted) finite-state morphological analyzer returns all potential segmentations of a compound, without filtering or prioritizing the results.
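To illustrate the ambiguity, the following minimal Python sketch enumerates every segmentation of a compound against a toy lexicon. It is not the finite-state analyzer described here: the lexicon `TOY_LEXICON` and the function `segmentations` are hypothetical names introduced for illustration, and linking elements and inflection are ignored.

```python
# Toy lexicon of compound segments (illustrative assumption, not real analyzer data).
TOY_LEXICON = {"wach", "wachs", "stube", "tube"}


def segmentations(word, lexicon):
    """Return every way to split `word` into a sequence of lexicon entries."""
    word = word.lower()
    if not word:
        return [[]]  # one way to segment the empty string: no segments
    results = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in lexicon:
            # Extend each segmentation of the remainder with this head.
            for rest in segmentations(word[end:], lexicon):
                results.append([head] + rest)
    return results


if __name__ == "__main__":
    # "Wachstube" can be read as Wach+Stube (guardroom) or Wachs+Tube (wax tube).
    for analysis in segmentations("Wachstube", TOY_LEXICON):
        print("+".join(analysis))
```

Like an unweighted analyzer, the sketch returns both readings with no indication of which is more plausible.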
The paper presents an experiment in analyzing German compounds with the Xerox Weighted Finite-State Compiler (wfsc). The model assigns weights to compound segments and gives priority (a) to analyses with the minimal number of segments and (b) to compound segments with the highest frequency in a training list. The results obtained with this rather simple model show the advantage of weighted finite-state transducers over simple FSTs.
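The two priorities can be sketched in Python as a lexicographic weight: the segment count is compared first, then a frequency-based cost. This is only an illustration under stated assumptions and not the wfsc model itself. The counts in `SEGMENT_COUNTS` are invented, the negative log relative frequency is an assumed cost function, and the names `weight` and `best_analysis` are hypothetical.

```python
import math

# Hypothetical training counts for compound segments (invented values).
SEGMENT_COUNTS = {
    "wach": 80, "stube": 20, "wachs": 15, "tube": 30,
    "gründung": 10, "grün": 200, "dung": 5,
}
TOTAL = sum(SEGMENT_COUNTS.values())


def segmentations(word):
    """Return every split of `word` into segments found in SEGMENT_COUNTS."""
    if not word:
        return [[]]
    results = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in SEGMENT_COUNTS:
            for rest in segmentations(word[end:]):
                results.append([head] + rest)
    return results


def weight(segments):
    """Lexicographic weight: (a) fewer segments, then (b) more frequent segments."""
    # Assumed cost: sum of negative log relative frequencies (lower is better).
    cost = sum(-math.log(SEGMENT_COUNTS[s] / TOTAL) for s in segments)
    return (len(segments), cost)


def best_analysis(word):
    """Return the lowest-weight segmentation, or None if there is none."""
    candidates = segmentations(word.lower())
    return min(candidates, key=weight) if candidates else None


if __name__ == "__main__":
    print(best_analysis("Gründung"))   # one segment beats Grün+Dung
    print(best_analysis("Wachstube"))  # tie on segment count, frequency decides
```

With these invented counts, the one-segment reading of "Gründung" is preferred over "Grün+Dung" by criterion (a), and "Wach+Stube" is preferred over "Wachs+Tube" by criterion (b). A weighted transducer would typically encode such preferences as arc weights and select the best path, rather than enumerate and sort candidates as done here.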
Keywords: Regular Expression, Word Boundary, Human Reader, Feminine Noun, Masculine Noun