Abstract
This paper addresses the automatic compilation of bilingual dictionaries from specialized comparable corpora. We focus on a method that automatically extracts and aligns neoclassical compounds in two languages from comparable corpora. To this end, we assume that neoclassical compounds translate compositionally into neoclassical compounds from one language to another. The method covers the two main forms of neoclassical compounds and proceeds in three steps: extraction, generation, and selection. Our program takes as input a list of aligned neoclassical elements and a bilingual dictionary for the two languages. We also align neoclassical compounds through a pivot-language approach, which relies on the hypothesis that a neoclassical element keeps a stable meaning across languages. We experiment with four languages (English, French, German, and Spanish) using corpora in the domain of renewable energy, and obtain a precision of 96%.
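The three steps named in the abstract (extraction, generation, selection) can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny English-French resources, the greedy-free string segmentation, and the function names `segment`, `extract`, `generate`, `select`, and `align` are all assumptions made for illustration. The sketch handles compounds built only from neoclassical elements as well as compounds that combine an element with a dictionary word; whether this matches the two forms covered in the paper is also an assumption.

```python
from itertools import product

# Hypothetical toy resources (assumptions, not the paper's actual data).
ELEMENTS_EN_FR = {"hydro": ["hydro"], "bio": ["bio"], "logy": ["logie"]}
DICTIONARY_EN_FR = {"electric": ["électrique"], "fuel": ["carburant"]}


def segment(word, elements, dictionary):
    """Return every split of `word` into known elements or dictionary words."""
    if not word:
        return [[]]
    splits = []
    for end in range(1, len(word) + 1):
        prefix = word[:end]
        if prefix in elements or prefix in dictionary:
            for rest in segment(word[end:], elements, dictionary):
                splits.append([prefix] + rest)
    return splits


def extract(vocabulary, elements, dictionary):
    """Extraction: keep words that split fully and contain a neoclassical element."""
    found = {}
    for word in vocabulary:
        splits = [s for s in segment(word, elements, dictionary)
                  if any(part in elements for part in s)]
        if splits:
            found[word] = splits
    return found


def generate(splits, elements, dictionary):
    """Generation: translate each part and concatenate (compositional assumption)."""
    candidates = set()
    for parts in splits:
        options = [elements.get(p) or dictionary[p] for p in parts]
        for combo in product(*options):
            candidates.add("".join(combo))
    return candidates


def select(candidates, target_vocabulary):
    """Selection: keep only candidates attested in the target corpus vocabulary."""
    return sorted(candidates & set(target_vocabulary))


def align(source_vocabulary, target_vocabulary, elements, dictionary):
    """Run extraction, generation, and selection for every source word."""
    extracted = extract(source_vocabulary, elements, dictionary)
    return {word: select(generate(splits, elements, dictionary), target_vocabulary)
            for word, splits in extracted.items()}


if __name__ == "__main__":
    source = ["hydroelectric", "biofuel", "biology", "turbine"]
    target = ["hydroélectrique", "biocarburant", "biologie", "turbine"]
    print(align(source, target, ELEMENTS_EN_FR, DICTIONARY_EN_FR))
```

In this toy setting, `turbine` is skipped at the extraction step because it contains no neoclassical element, and a generated candidate survives only if it appears in the target vocabulary. A real system would need morphological adaptation at element boundaries, which the sketch leaves out.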
An Erratum for this chapter can be found at http://dx.doi.org/10.1007/978-3-642-28601-8_43
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Harastani, R., Daille, B., Morin, E. (2012). Neoclassical Compound Alignments from Comparable Corpora. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2012. Lecture Notes in Computer Science, vol 7182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28601-8_7
DOI: https://doi.org/10.1007/978-3-642-28601-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28600-1
Online ISBN: 978-3-642-28601-8
eBook Packages: Computer Science, Computer Science (R0)