TransSearch: from a bilingual concordancer to a translation finder

Bourdaillet, Julien; Huet, Stéphane; Langlais, Philippe; Lapalme, Guy

doi:10.1007/s10590-011-9089-6

Original Paper
Published: 29 March 2011

Volume 24, pages 241–271 (2010)
Machine Translation

Abstract

As basic as bilingual concordancers may appear, they are some of the most widely used computer-assisted translation tools among professional translators. Nevertheless, they still do not benefit from recent breakthroughs in machine translation. This paper describes the improvement of the commercial bilingual concordancer TransSearch in order to embed a word alignment feature. The use of statistical word alignment methods allows the system to spot user query translations, and thus the tool is transformed into a translation search engine. We describe several translation identification and postprocessing algorithms that enhance the application. The excellent results obtained using a large translation memory consisting of 8.3 million sentence pairs are confirmed via human evaluation.
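The idea of spotting a query's translation through word alignment can be illustrated with a small sketch. This is not the algorithm evaluated in the paper: it is a greedy best-link heuristic, assuming a toy table of lexical translation probabilities. The names `LEXICON` and `spot_translation`, the probability values, and the `floor` threshold are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Toy lexical translation probabilities p(target word | query word).
# These values are invented for the example; a real system would estimate
# them from a large bitext with a statistical alignment model.
LEXICON: Dict[Tuple[str, str], float] = {
    ("keeping", "conforme"): 0.60,
    ("with", "à"): 0.30,
    ("in", "à"): 0.05,
    ("with", "la"): 0.01,
}


def spot_translation(
    query: List[str],
    target: List[str],
    lexicon: Dict[Tuple[str, str], float],
    floor: float = 1e-3,
) -> List[str]:
    """Return the target span covering the best link of each query word.

    Each query word is linked to the target position with the highest
    lexical probability, provided it exceeds `floor`. The answer is the
    contiguous target span from the leftmost to the rightmost linked
    position, or an empty list when no query word could be linked.
    """
    positions: List[int] = []
    for q in query:
        best_pos, best_p = None, floor
        for j, t in enumerate(target):
            p = lexicon.get((q, t), 0.0)
            if p > best_p:
                best_pos, best_p = j, p
        if best_pos is not None:
            positions.append(best_pos)
    if not positions:
        return []
    return target[min(positions): max(positions) + 1]


if __name__ == "__main__":
    target_sentence = ["cela", "est", "conforme", "à", "la", "loi"]
    print(spot_translation(["in", "keeping", "with"], target_sentence, LEXICON))
```

In this sketch the query "in keeping with" is mapped to the target span "conforme à"; a concordancer could then highlight that span in each retrieved sentence pair rather than showing the whole target sentence.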

Author information

Authors and Affiliations

DIRO—Université de Montréal, C.P. 6128 succursale Centre-ville, Montréal, QC, H3C 3J7, Canada
Julien Bourdaillet, Stéphane Huet, Philippe Langlais & Guy Lapalme

Corresponding author

Correspondence to Julien Bourdaillet.

About this article

Cite this article

Bourdaillet, J., Huet, S., Langlais, P. et al. TransSearch: from a bilingual concordancer to a translation finder. Machine Translation 24, 241–271 (2010). https://doi.org/10.1007/s10590-011-9089-6

Received: 27 November 2009
Accepted: 24 February 2011
Published: 29 March 2011
Issue Date: December 2010
DOI: https://doi.org/10.1007/s10590-011-9089-6
