Abstract
In this paper we present a system for automatic terminology extraction and automatic detection of equivalent terms in the target language. The system is designed to work alongside a computer-assisted translation (CAT) tool. Each time the translator moves from one segment to the next, it automatically provides term candidates and their translations. The system draws on several sources of information: the text of the segment being translated and of the whole translation project, the translation memories assigned to the project, and a phrase table from a statistical machine translation system. It also consults the terminological database assigned to the project so that terms that are already known are not presented again. Using phrase tables allows very large parallel corpora to be exploited efficiently. We used Moses to compute and query the phrase tables. The program is written in Python and can be used with any CAT tool. In our experiments we used OmegaT, a well-known open-source CAT tool. Evaluation results are presented for English–Spanish in three subject domains: politics, finance, and medicine.
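The abstract summarizes the lookup pipeline in a few sentences. The following sketch illustrates, in Python, how term candidates from a single segment might be paired with translation candidates from a phrase table while terms already in the termbase are skipped. It is a simplified illustration under stated assumptions and not the author's implementation. The stop-word n-gram filter, the in-memory dictionary (the described system uses Moses to query the tables), the use of the third Moses score as the ranking key, and the names `load_phrase_table`, `candidate_terms` and `suggest` are all hypothetical. The sketch also ignores the project-wide text and the translation memories.

```python
# Hypothetical sketch; not the system described in the paper.
import re
from collections import defaultdict

# Toy stop-word list (assumption); a real system would use a proper resource.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to"}


def load_phrase_table(lines):
    """Parse Moses-style lines 'source ||| target ||| scores ||| ...'.

    Returns {source phrase: [(target phrase, p(target|source)), ...]}.
    In the standard four-score Moses layout the third score is the direct
    phrase translation probability, so that is the one kept here.
    """
    table = defaultdict(list)
    for line in lines:
        fields = [field.strip() for field in line.split("|||")]
        if len(fields) < 3:
            continue  # malformed line: skip it
        scores = fields[2].split()
        if len(scores) < 3:
            continue  # not enough scores to read the direct probability
        table[fields[0]].append((fields[1], float(scores[2])))
    return table


def candidate_terms(segment, max_len=3):
    """Return unique n-grams of 1..max_len tokens, in order of appearance,
    that neither start nor end with a stop word (assumed filter)."""
    tokens = re.findall(r"\w+", segment.lower())
    terms = []
    for i in range(len(tokens)):
        for n in range(1, max_len + 1):
            if i + n > len(tokens):
                break  # n-gram would run past the end of the segment
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            term = " ".join(gram)
            if term not in terms:
                terms.append(term)
    return terms


def suggest(segment, phrase_table, termbase, max_len=3, top_n=2):
    """Map each new term candidate to its top_n most probable translations,
    skipping candidates that are already in the project's termbase."""
    suggestions = {}
    for term in candidate_terms(segment, max_len):
        if term in termbase or term not in phrase_table:
            continue
        ranked = sorted(phrase_table[term], key=lambda pair: pair[1], reverse=True)
        suggestions[term] = [target for target, _ in ranked[:top_n]]
    return suggestions


# Toy data: a four-line phrase table and a termbase with one known term.
PHRASE_TABLE = load_phrase_table([
    "interest rate ||| tipo de interés ||| 0.6 0.5 0.7 0.4 ||| 0-2 1-0",
    "interest rate ||| tasa de interés ||| 0.3 0.4 0.2 0.3 ||| 0-2 1-0",
    "central bank ||| banco central ||| 0.8 0.7 0.9 0.6 ||| 0-1 1-0",
    "rate ||| tasa ||| 0.5 0.5 0.4 0.5 ||| 0-0",
])
TERMBASE = {"central bank"}  # already known, so it is not suggested again

print(suggest("The central bank raised the interest rate.", PHRASE_TABLE, TERMBASE))
# {'interest rate': ['tipo de interés', 'tasa de interés'], 'rate': ['tasa']}
```

Ranking by the direct translation probability alone is the simplest possible choice. A fuller system would likely combine several scores with the other information sources listed in the abstract.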
Oliver, A. A system for terminology extraction and translation equivalent detection in real time. Machine Translation 31, 147–161 (2017). https://doi.org/10.1007/s10590-017-9201-7