Neoclassical Compound Alignments from Comparable Corpora

  • Rima Harastani
  • Béatrice Daille
  • Emmanuel Morin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7182)


The paper deals with the automatic compilation of bilingual dictionaries from specialized comparable corpora. We concentrate on a method to automatically extract and align neoclassical compounds in two languages from comparable corpora. To do this, we assume that neoclassical compounds translate compositionally into neoclassical compounds from one language to another. The method covers the two main forms of neoclassical compounds and is split into three steps: extraction, generation, and selection. Our program takes as input a list of aligned neoclassical elements and a bilingual dictionary for the two languages. We also align neoclassical compounds through a pivot language, relying on the hypothesis that neoclassical elements remain stable in meaning across languages. We experiment with four languages (English, French, German, and Spanish) using corpora in the domain of renewable energy, and obtain a precision of 96%.
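The three steps named in the abstract (extraction, generation, selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the longest-match segmentation strategy, and the toy English-French resources below are all assumptions made for illustration. It also assumes that a compound is either built only from neoclassical elements or from neoclassical elements plus a word found in the bilingual dictionary.

```python
from itertools import product

# Hypothetical toy resources (NOT taken from the paper).
# Aligned neoclassical elements, English -> French.
NEO_ELEMENTS = {"hydro": ["hydro"], "bio": ["bio"], "logy": ["logie"]}
# Bilingual dictionary for ordinary words, English -> French.
DICTIONARY = {"mass": ["masse"], "gas": ["gaz"]}
# Vocabularies standing in for the two comparable corpora.
SOURCE_VOCAB = {"hydrology", "biomass", "biogas", "turbine"}
TARGET_VOCAB = {"hydrologie", "biomasse", "biogaz", "turbine"}


def segment(word, elements, dictionary):
    """Split a word into known parts, trying the longest prefix first.

    Returns a list of parts, or None when the word cannot be fully covered.
    """
    if not word:
        return []
    for end in range(len(word), 0, -1):
        prefix = word[:end]
        if prefix in elements or prefix in dictionary:
            rest = segment(word[end:], elements, dictionary)
            if rest is not None:
                return [prefix] + rest
    return None


def extract(vocabulary, elements, dictionary):
    """Step 1: keep words made of two or more parts, one of them neoclassical."""
    compounds = {}
    for word in sorted(vocabulary):
        parts = segment(word, elements, dictionary)
        if parts and len(parts) >= 2 and any(p in elements for p in parts):
            compounds[word] = parts
    return compounds


def generate(parts, elements, dictionary):
    """Step 2: translate each part and concatenate every combination."""
    options = [elements[p] if p in elements else dictionary[p] for p in parts]
    return {"".join(combo) for combo in product(*options)}


def select(candidates, target_vocabulary):
    """Step 3: keep only candidates attested in the target corpus."""
    return sorted(candidates & target_vocabulary)


def align(source_vocab, target_vocab, elements, dictionary):
    """Run extraction, generation, and selection in sequence."""
    alignments = {}
    for word, parts in extract(source_vocab, elements, dictionary).items():
        found = select(generate(parts, elements, dictionary), target_vocab)
        if found:
            alignments[word] = found
    return alignments


print(align(SOURCE_VOCAB, TARGET_VOCAB, NEO_ELEMENTS, DICTIONARY))
```

With these toy inputs, `hydrology` is segmented into `hydro` + `logy` and aligned with `hydrologie`, while `turbine` is ignored because it contains no neoclassical element. The selection step is what ties the generated candidates back to the comparable corpus: a candidate that is never attested in the target vocabulary is discarded.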


Machine Translation, Complex Term, Correct Translation, Bilingual Dictionary, Comparable Corpus
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Rima Harastani (1)
  • Béatrice Daille (1)
  • Emmanuel Morin (1)
  1. LINA, University of Nantes, Nantes, France
