Stemming and Decompounding for German Text Retrieval

  • Martin Braschler
  • Bärbel Ripplinger
Conference paper

DOI: 10.1007/3-540-36618-0_13

Part of the Lecture Notes in Computer Science book series (LNCS, volume 2633)
Cite this paper as:
Braschler M., Ripplinger B. (2003) Stemming and Decompounding for German Text Retrieval. In: Sebastiani F. (eds) Advances in Information Retrieval. ECIR 2003. Lecture Notes in Computer Science, vol 2633. Springer, Berlin, Heidelberg

Abstract

The stemming problem, i.e. finding a common stem for different forms of a term, has been extensively studied for English, but considerably less is known for other languages. Previously, it has been claimed that stemming is essential for highly declensional languages. We report on our experiments on stemming for German, where an additional issue is the handling of compounds, which are formed by concatenating several words. Rarely do studies on stemming for any language cover more than one or two different approaches. This paper makes a major contribution that transcends its focus on German by investigating a complete spectrum of approaches, ranging from language-independent to elaborate linguistic methods. The main findings are that stemming is beneficial even when using a simple approach, and that carefully designed decompounding, the splitting of compound words, remarkably boosts performance. All findings are based on a thorough analysis using a large reliable test collection.
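To make the notion of decompounding concrete, the following is a minimal sketch of a dictionary-based compound splitter. It is not the method evaluated in the paper, which this abstract does not describe in detail. The function name `split_compound`, the toy `LEXICON`, and the `min_len` parameter are all assumptions introduced for illustration, and linking elements (such as the German "s" between compound parts) are deliberately ignored.

```python
# Toy lexicon of known German word forms (hypothetical; a real system
# would use a large dictionary or a corpus-derived vocabulary).
LEXICON = {"hand", "schuh", "handschuh", "fach", "bahn", "hof"}


def split_compound(word, lexicon, min_len=3):
    """Split `word` into a sequence of lexicon entries, if possible.

    Tries the longest matching prefix first and backtracks when the
    remainder cannot be split. Returns [word] (lowercased) when no
    complete split exists.
    """
    word = word.lower()

    def helper(rest):
        if not rest:
            return []  # everything consumed: successful split
        # Try candidate prefixes from longest to shortest.
        for end in range(len(rest), min_len - 1, -1):
            head = rest[:end]
            if head in lexicon:
                tail = helper(rest[end:])
                if tail is not None:
                    return [head] + tail
        return None  # no prefix leads to a complete split

    parts = helper(word)
    return parts if parts else [word]


print(split_compound("Handschuhfach", LEXICON))
print(split_compound("Bahnhof", LEXICON))
print(split_compound("Computer", LEXICON))
```

Preferring the longest prefix keeps lexicalised compounds such as "handschuh" intact rather than splitting them into "hand" and "schuh". Whether such conservative splitting helps retrieval is exactly the kind of design question the paper investigates.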

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Martin Braschler (1, 2)
  • Bärbel Ripplinger (1)
  1. Eurospider Information Technology AG, Zürich, Switzerland
  2. Institut Interfacultaire d’Informatique, Université de Neuchâtel, Neuchâtel, Switzerland
