Abstract
Compounding is widespread in highly inflectional languages, where a quarter of all nouns are created by composition. In our field of study, German medical language, the share of compounds is far higher, at 64%. Correct compound splitting is therefore a high-impact preprocessing step for any NLP-based application. In this work we address two challenges of medical decomposition. First, we take unknown constituents into account in order to split compounds that have so far not been recognized as such. Second, we build on the corpus-based approach of Koehn and Knight and add semantic knowledge from domain ontologies to increase accuracy when disambiguating between competing split options. Applying this first-of-a-kind semantic approach in a study on the decomposition of German medical compounds, we outperform existing approaches by far.
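To make the corpus-based baseline concrete, the following is a minimal sketch of frequency-based compound splitting in the spirit of Koehn and Knight: every segmentation into known parts is scored by the geometric mean of the parts' corpus frequencies, and the best-scoring option wins. It is not the semantic method of this paper. The frequency table, the list of linking elements, and all function names are hypothetical and chosen only for illustration.

```python
from math import prod

# Toy corpus frequencies (hypothetical values, for illustration only).
FREQ = {
    "leber": 120,
    "zyste": 80,
    "leberzyste": 5,
    "entzündung": 60,
    "zeichen": 200,
    "niere": 90,
}

# Linking elements that may follow a non-final part (assumed, not exhaustive).
LINKS = ("", "s", "es")


def segmentations(word, freq, min_len=3):
    """Return every way to cover `word` with parts found in `freq`."""
    results = []
    if word in freq:
        results.append([word])  # the unsplit word is itself a candidate
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        for link in LINKS:
            if link and not head.endswith(link):
                continue
            # Strip the linking element to recover the dictionary form.
            stem = head[: len(head) - len(link)] if link else head
            if len(stem) >= min_len and stem in freq:
                for rest in segmentations(tail, freq, min_len):
                    results.append([stem] + rest)
    return results


def geometric_mean(parts, freq):
    """Geometric mean of the corpus frequencies of `parts`."""
    return prod(freq.get(p, 0) for p in parts) ** (1 / len(parts))


def split_compound(word, freq):
    """Pick the segmentation with the highest geometric-mean frequency."""
    word = word.lower()
    options = segmentations(word, freq) or [[word]]  # fall back to no split
    return max(options, key=lambda parts: geometric_mean(parts, freq))


print(split_compound("Leberzyste", FREQ))          # ['leber', 'zyste']
print(split_compound("Entzündungszeichen", FREQ))  # ['entzündung', 'zeichen']
```

In this sketch a word with no known constituent is simply returned unsplit, which is the situation the first contribution above targets; ranking the remaining options by frequency alone is the step to which the second contribution adds ontology knowledge.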
References
Langer, S.: Zur Morphologie und Semantik von Nominalkomposita. In: Proceedings of KONVENS (1998)
Monz, C., de Rijke, M.: Shallow morphological analysis in monolingual information retrieval for Dutch, German, and Italian. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds.) CLEF 2001. LNCS, vol. 2406, pp. 262–277. Springer, Heidelberg (2002)
Wilmanns, J.C., Schmitt, G.: Die Medizin und ihre Sprache. Ecomed, Landsberg (2002)
Brown, R.D.: Corpus-driven splitting of compound words. In: Proceedings of TMI (2002)
Braschler, M., Ripplinger, B.: How effective is stemming and decompounding for German text retrieval? Information Retrieval 7, 291–316 (2004)
Koehn, P., Knight, K.: Empirical methods for compound splitting. In: Proceedings of the EACL (2003)
Stymne, S.: German compounds in factored statistical machine translation. In: Nordström, B., Ranta, A. (eds.) GoTAL 2008. LNCS (LNAI), vol. 5221, pp. 464–475. Springer, Heidelberg (2008)
Popović, M., Stein, D., Ney, H.: Statistical machine translation of German compound words. In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds.) FinTAL 2006. LNCS (LNAI), vol. 4139, pp. 616–624. Springer, Heidelberg (2006)
Fritzinger, F., Fraser, A.: How to avoid burning ducks: combining linguistic analysis and corpus statistics for German compound processing. In: Proceedings of WMT and MetricsMATR, pp. 224–234 (2010)
Schmid, H.: Improvements in part-of-speech tagging with an application to German. In: Proceedings of the EACL-SIGDAT Workshop, Dublin, Ireland (1995)
Radiological Society of North America: RadLex (2012). http://rsna.org/RadLex.aspx
Bretschneider, C., Oberkampf, H., Zillner, S., Bauer, B., Hammon, M.: Corpus-based translation of ontologies for improved multilingual semantic annotation. In: Proceedings of the 3rd SWAIE Workshop (2014)
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Bretschneider, C., Zillner, S. (2015). Semantic Splitting of German Medical Compounds. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_24
DOI: https://doi.org/10.1007/978-3-319-24033-6_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer Science, Computer Science (R0)