Semantic Splitting of German Medical Compounds

  • Conference paper
Text, Speech, and Dialogue (TSD 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Abstract

Compounding is widespread in highly inflectional languages, where a quarter of all nouns are created by composition. In our field of study, German medical language, the share of compounds is far higher, at 64%. Correct compound splitting is therefore a high-impact preprocessing step for any NLP-based application. In this work we address two challenges of medical decomposition. First, we take unknown constituents into account in order to split compounds that have not been recognized as such so far. Second, we build on the corpus-based approach of Koehn and Knight and add semantic knowledge from domain ontologies to increase accuracy when disambiguating between the various split options. Using this first-of-its-kind semantic approach in a study on the decomposition of German medical compounds, we outperform the existing approaches by far.
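
For orientation, the corpus-based baseline of Koehn and Knight that the abstract builds on can be sketched as follows: enumerate the ways a word can be segmented into known corpus words (optionally joined by a linking element) and keep the segmentation whose parts have the highest geometric mean of corpus frequencies. This is a minimal sketch, not the authors' implementation. The frequency table, the list of linking elements, the minimum constituent length, and all function names are assumptions made for illustration.

```python
from math import prod

# Hypothetical toy corpus frequencies (lower-cased); not taken from the paper.
FREQ = {"nierenzyste": 2, "niere": 120, "nieren": 40, "zyste": 90}
# Assumed subset of German linking elements that may sit between constituents.
FILLERS = ("", "s", "es", "n", "en")
# Assumed minimum length of a constituent.
MIN_LEN = 3


def candidate_splits(word, freq):
    """Yield every segmentation of `word` into words present in `freq`."""
    if word in freq:
        yield [word]  # the unsplit word is itself a candidate
    for i in range(MIN_LEN, len(word) - MIN_LEN + 1):
        head = word[:i]
        if head not in freq:
            continue
        rest = word[i:]
        for filler in FILLERS:
            if not rest.startswith(filler):
                continue
            tail = rest[len(filler):]
            if len(tail) < MIN_LEN:
                continue
            for parts in candidate_splits(tail, freq):
                yield [head] + parts


def geometric_mean(values):
    """Geometric mean of a non-empty list of positive numbers."""
    return prod(values) ** (1.0 / len(values))


def best_split(word, freq=FREQ):
    """Return the candidate split with the highest geometric-mean frequency."""
    word = word.lower()
    options = list(candidate_splits(word, freq))
    if not options:
        return [word]  # nothing known: leave the word unsplit
    return max(options, key=lambda parts: geometric_mean([freq[p] for p in parts]))


print(best_split("Nierenzyste"))
```

With the toy frequencies above, the split into "niere" and "zyste" scores higher than both the unsplit word and the split "nieren" + "zyste". The sketch only produces constituents that occur in the corpus. It does not reproduce the handling of unknown constituents or the ontology-based disambiguation described in the abstract.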

References

  1. Langer, S.: Zur Morphologie und Semantik von Nominalkomposita. In: Proceedings of KONVENS (1998)

  2. Monz, C., de Rijke, M.: Shallow morphological analysis in monolingual information retrieval for Dutch, German, and Italian. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds.) CLEF 2001. LNCS, vol. 2406, pp. 262–277. Springer, Heidelberg (2002)

  3. Wilmanns, J.C., Schmitt, G.: Die Medizin und ihre Sprache. Ecomed, Landsberg (2002)

  4. Brown, R.D.: Corpus-driven splitting of compound words. In: Proceedings of TMI (2002)

  5. Braschler, M., Ripplinger, B.: How effective is stemming and decompounding for German text retrieval? Information Retrieval 7, 291–316 (2004)

  6. Koehn, P., Knight, K.: Empirical methods for compound splitting. In: Proceedings of the EACL (2003)

  7. Stymne, S.: German compounds in factored statistical machine translation. In: Nordström, B., Ranta, A. (eds.) GoTAL 2008. LNCS (LNAI), vol. 5221, pp. 464–475. Springer, Heidelberg (2008)

  8. Popović, M., Stein, D., Ney, H.: Statistical machine translation of German compound words. In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds.) FinTAL 2006. LNCS (LNAI), vol. 4139, pp. 616–624. Springer, Heidelberg (2006)

  9. Fritzinger, F., Fraser, A.: How to avoid burning ducks: combining linguistic analysis and corpus statistics for German compound processing. In: Proceedings of WMT and MetricsMATR, pp. 224–234 (2010)

  10. Schmid, H.: Improvements in part-of-speech tagging with an application to German. In: Proceedings of the EACL-SIGDAT Workshop, Dublin, Ireland (1995)

  11. Radiological Society of North America: RadLex (2012). http://rsna.org/RadLex.aspx

  12. Bretschneider, C., Oberkampf, H., Zillner, S., Bauer, B., Hammon, M.: Corpus-based translation of ontologies for improved multilingual semantic annotation. In: Proceedings of the 3rd SWAIE Workshop (2014)

Author information

Corresponding author

Correspondence to Claudia Bretschneider.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Bretschneider, C., Zillner, S. (2015). Semantic Splitting of German Medical Compounds. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_24

  • DOI: https://doi.org/10.1007/978-3-319-24033-6_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24032-9

  • Online ISBN: 978-3-319-24033-6

  • eBook Packages: Computer Science, Computer Science (R0)
