Abstract
As basic as bilingual concordancers may appear, they are some of the most widely used computer-assisted translation tools among professional translators. Nevertheless, they still do not benefit from recent breakthroughs in machine translation. This paper describes the improvement of the commercial bilingual concordancer TransSearch in order to embed a word alignment feature. The use of statistical word alignment methods allows the system to spot user query translations, and thus the tool is transformed into a translation search engine. We describe several translation identification and postprocessing algorithms that enhance the application. The excellent results obtained using a large translation memory consisting of 8.3 million sentence pairs are confirmed via human evaluation.
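As a rough illustration of how word alignment can be used to spot a query's translation inside a retrieved sentence pair, the sketch below projects the query's source positions onto the target sentence and returns the covered target span. It is not the TransSearch implementation: the function name `spot_translation`, the example sentence pair, and the hand-written alignment links are all hypothetical, and in practice the links would come from a statistical word aligner.

```python
from typing import List, Set, Tuple


def spot_translation(
    source_tokens: List[str],
    target_tokens: List[str],
    links: Set[Tuple[int, int]],
    query_tokens: List[str],
) -> List[str]:
    """Return the target span aligned to the first occurrence of the query.

    `links` holds (source_index, target_index) word-alignment pairs.
    An empty list is returned when the query is absent or unaligned.
    """
    n = len(query_tokens)
    for start in range(len(source_tokens) - n + 1):
        if source_tokens[start:start + n] == query_tokens:
            query_positions = set(range(start, start + n))
            # Collect every target position linked to a query word.
            target_positions = sorted(
                t for s, t in links if s in query_positions
            )
            if not target_positions:
                return []
            # Return the smallest contiguous span covering those positions.
            first, last = target_positions[0], target_positions[-1]
            return target_tokens[first:last + 1]
    return []


# Hypothetical sentence pair and hand-written alignment links.
source = "it is in keeping with tradition".split()
target = "il est conforme à la tradition".split()
links = {(0, 0), (1, 1), (2, 2), (3, 2), (4, 3), (5, 5)}

print(spot_translation(source, target, links, "in keeping with".split()))
```

Returning the smallest contiguous span is only one possible design choice; a real system would also have to handle noisy or discontiguous alignments, which is where postprocessing of the kind mentioned in the abstract would come in.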
Cite this article
Bourdaillet, J., Huet, S., Langlais, P. et al. TransSearch: from a bilingual concordancer to a translation finder. Machine Translation 24, 241–271 (2010). https://doi.org/10.1007/s10590-011-9089-6
DOI: https://doi.org/10.1007/s10590-011-9089-6