Compound Terms and Their Multi-word Variants: Case of German and Russian Languages

  • Elizaveta Clouet
  • Béatrice Daille
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8403)

Abstract

The terminology of any language and any domain evolves continuously, leading to constant term renewal. Terms undergo a wide range of morphological and syntactic variations, which any NLP application has to handle. While the syntactic variations of multi-word terms have been described and tools designed to process them, only a few works have studied the syntagmatic variants of compound terms. This paper is dedicated to the identification of such variants, and more precisely to the detection of synonymic pairs of the form “compound term - multi-word term”. We describe a pipeline for their detection, from compound recognition and splitting, through multi-word term extraction, to the alignment of the variants with the original terms. The experiments are carried out for two compound-producing languages, German and Russian, and two specialised domains: wind energy and breast cancer. We identify variation patterns for these two languages and demonstrate that the transformation of a morphological compound into a syntagmatic compound mainly occurs when the term denomination needs to be enlarged.
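
To make the pipeline stages in the abstract concrete, here is a minimal sketch of two of them: compound splitting and alignment of a compound with multi-word candidates. It is not the authors' implementation. The toy lexicon, the list of German linking elements, and the function names `split_compound` and `align_variants` are all assumptions made for illustration, and the multi-word candidates are assumed to arrive already extracted and reduced to content-word lemmas.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical toy lexicon of lemmas; a real system would rely on a large
# lexicon or on corpus frequencies.
LEXICON = {"wind", "energie", "kraft", "anlage", "leistung", "kurve"}

# A few German linking elements (assumed list, not exhaustive).
LINKERS = ("", "s", "es", "n", "en")


def split_compound(word: str, min_len: int = 3) -> Optional[List[str]]:
    """Return the lexicon components of `word`, or None if no full split exists."""
    word = word.lower()
    if word in LEXICON:
        return [word]
    # Try every prefix that is a known lemma, then an optional linking element.
    for i in range(min_len, len(word) - min_len + 1):
        head = word[:i]
        if head not in LEXICON:
            continue
        for linker in LINKERS:
            if not word[i:].startswith(linker):
                continue
            rest = split_compound(word[i + len(linker):], min_len)
            if rest is not None:
                return [head] + rest
    return None


def align_variants(
    compound: str, candidates: Sequence[Sequence[str]]
) -> List[Tuple[str, ...]]:
    """Keep candidates whose content lemmas match the compound's components.

    Each candidate is assumed to be a multi-word term already reduced to its
    content-word lemmas (function words removed).
    """
    parts = split_compound(compound)
    if parts is None or len(parts) < 2:
        return []  # not recognised as a compound
    target = sorted(parts)
    return [
        tuple(cand)
        for cand in candidates
        if sorted(lemma.lower() for lemma in cand) == target
    ]


if __name__ == "__main__":
    print(split_compound("Leistungskurve"))
    print(align_variants("Windenergie", [["Energie", "Wind"], ["Energie", "Sonne"]]))
```

Exact matching of lemma sets is a deliberate simplification. Variants that add material to the original term would need a looser comparison, such as requiring the compound's components to be a subset of the candidate's lemmas.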

Keywords

Noun Phrase · Wind Energy · Variant Pair · Russian Language · Original Term
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Elizaveta Clouet
    • 1
  • Béatrice Daille
    • 1
  1. LINA, University of Nantes, France