German Decompounding in a Difficult Corpus

  • Enrique Alfonseca
  • Slaven Bilac
  • Stefan Pharies
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4919)

Abstract

Splitting compound words has proved useful in areas such as Machine Translation, Speech Recognition, and Information Retrieval (IR). IR systems usually have to cope with noisy data, because user queries are typically written quickly and submitted without review. This work attempts to improve on current approaches to German decompounding when they are applied to query keywords. The results show an increase of more than 10% in accuracy compared to other state-of-the-art methods.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Enrique Alfonseca (Google, Inc.)
  • Slaven Bilac (Google, Inc.)
  • Stefan Pharies (Google, Inc.)
