TransSearch: from a bilingual concordancer to a translation finder

  • Original Paper
Machine Translation

Abstract

As basic as bilingual concordancers may appear, they are some of the most widely used computer-assisted translation tools among professional translators. Nevertheless, they still do not benefit from recent breakthroughs in machine translation. This paper describes the improvement of the commercial bilingual concordancer TransSearch in order to embed a word alignment feature. The use of statistical word alignment methods allows the system to spot user query translations, and thus the tool is transformed into a translation search engine. We describe several translation identification and postprocessing algorithms that enhance the application. The excellent results obtained using a large translation memory consisting of 8.3 million sentence pairs are confirmed via human evaluation.
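The idea of spotting a query's translation through word alignment can be illustrated with a minimal sketch. It is not the system described in the paper: the lexical table `LEX`, its probability values, and the function names `find_query` and `spot_translation` are all hypothetical, and the alignment rule used here (link each target word to its single most probable source word, in the spirit of an IBM Model 1 Viterbi alignment) is only one simple choice among the statistical methods the abstract alludes to.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexical probabilities, keyed by (source_word, target_word).
# In a real system these would be estimated from a large parallel corpus.
LEX: Dict[Tuple[str, str], float] = {
    ("the", "le"): 0.7,
    ("black", "noir"): 0.8,
    ("cat", "chat"): 0.9,
    ("sleeps", "dort"): 0.8,
    ("cat", "le"): 0.05,
    ("the", "chat"): 0.02,
}


def find_query(source: List[str], query: List[str]) -> Optional[int]:
    """Return the start index of the first contiguous match of query in source."""
    for start in range(len(source) - len(query) + 1):
        if source[start:start + len(query)] == query:
            return start
    return None


def spot_translation(
    source: List[str],
    target: List[str],
    query: List[str],
    lex: Dict[Tuple[str, str], float],
    floor: float = 1e-6,
) -> List[str]:
    """Return the target words aligned to the query words, in target order."""
    start = find_query(source, query)
    if start is None:
        return []
    query_positions = set(range(start, start + len(query)))

    spotted = []
    for word in target:
        # Link the target word to its most probable source position.
        best = max(
            range(len(source)),
            key=lambda i: lex.get((source[i], word), floor),
        )
        # Keep the word only if it links to the query with real evidence.
        if best in query_positions and lex.get((source[best], word), floor) > floor:
            spotted.append(word)
    return spotted


source = ["the", "black", "cat", "sleeps"]
target = ["le", "chat", "noir", "dort"]
print(spot_translation(source, target, ["black", "cat"], LEX))
```

A concordancer would run such a routine on every sentence pair retrieved for the query and highlight the spotted target words, which is what turns a plain list of matching sentence pairs into a list of candidate translations.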

Author information

Correspondence to Julien Bourdaillet.

About this article

Cite this article

Bourdaillet, J., Huet, S., Langlais, P. et al. TransSearch: from a bilingual concordancer to a translation finder. Machine Translation 24, 241–271 (2010). https://doi.org/10.1007/s10590-011-9089-6

  • DOI: https://doi.org/10.1007/s10590-011-9089-6
