Purest ever example-based machine translation: Detailed presentation and assessment

Machine Translation

Abstract

We have designed, implemented and assessed an EBMT system that can be dubbed the “purest ever built”: it makes no use whatsoever of variables, templates or patterns, has no explicit transfer component, and requires no preprocessing or training of the aligned examples. It relies on a single operation, proportional analogy, which implicitly neutralizes divergences between languages and captures lexical and syntactic variations along the paradigmatic and syntagmatic axes without explicitly decomposing sentences into fragments. Exactly the same implementation of this core engine was evaluated on different tasks and language pairs. First, we ran our system on two tasks of a previous MT evaluation campaign to rank it among current state-of-the-art systems. Then, we illustrated the “universality” of our system by participating in a recent MT evaluation campaign, with exactly the same core engine, for a wide variety of language pairs. Finally, we studied the influence of extra data such as dictionaries and paraphrases on system performance.
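To make the idea of translating by proportional analogy more concrete, here is a minimal Python sketch. It is not the authors' engine: the solver `solve_analogy` is a deliberately restricted stand-in that only handles analogies A : B :: C : x in which A and B differ by one contiguous span of tokens, and the function names, the whitespace tokenization and the toy English-French example base are all assumptions made for illustration. The translation loop follows the general scheme suggested by the abstract: find source examples A, B, C that stand in analogy with the input D, then solve the corresponding analogy over their translations.

```python
from itertools import permutations


def solve_analogy(a, b, c):
    """Solve a : b :: c : x over token lists; return x, or None if unsolved.

    Simplifying assumption (not the general algorithm): a and b differ in
    exactly one contiguous span, and that span of a occurs exactly once in c.
    """
    # Length of the longest common prefix of a and b.
    p = 0
    while p < min(len(a), len(b)) and a[p] == b[p]:
        p += 1
    # Length of the longest common suffix that does not overlap the prefix.
    s = 0
    while s < min(len(a), len(b)) - p and a[len(a) - 1 - s] == b[len(b) - 1 - s]:
        s += 1
    old, new = a[p:len(a) - s], b[p:len(b) - s]
    if not old:
        return None  # pure insertion: where to insert in c is ambiguous here
    hits = [i for i in range(len(c) - len(old) + 1) if c[i:i + len(old)] == old]
    if len(hits) != 1:
        return None
    i = hits[0]
    return c[:i] + new + c[i + len(old):]


def translate(sentence, examples):
    """Return the set of candidate translations of `sentence`.

    `examples` maps source sentences to target sentences (space-tokenized).
    """
    d = sentence.split()
    candidates = set()
    for src_a, src_b in permutations(examples, 2):
        # A : B :: x : D is equivalent to B : A :: D : x.
        x = solve_analogy(src_b.split(), src_a.split(), d)
        if x is None or " ".join(x) not in examples:
            continue
        src_c = " ".join(x)
        # Solve the same analogy on the target side: A' : B' :: C' : y.
        y = solve_analogy(examples[src_a].split(), examples[src_b].split(),
                          examples[src_c].split())
        if y is not None:
            candidates.add(" ".join(y))
    return candidates


# Toy example base (hypothetical data, for illustration only).
examples = {
    "I like tea": "j' aime le thé",
    "I like coffee": "j' aime le café",
    "do you like tea ?": "aimez-vous le thé ?",
}
print(translate("do you like coffee ?", examples))
# {'aimez-vous le café ?'}
```

Note that no alignment, template or transfer rule appears anywhere in the sketch: the only operation applied to the examples is the analogy solver, used once on the source side and once on the target side.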

Author information

Corresponding author

Correspondence to Yves Lepage.

About this article

Cite this article

Lepage, Y., Denoual, E. Purest ever example-based machine translation: Detailed presentation and assessment. Machine Translation 19, 251–282 (2005). https://doi.org/10.1007/s10590-006-9010-x

  • DOI: https://doi.org/10.1007/s10590-006-9010-x
