Machine Translation, Volume 19, Issue 3–4, pp 251–282

Purest ever example-based machine translation: Detailed presentation and assessment

Example-Based MT

Abstract

We have designed, implemented and assessed an EBMT system that can be dubbed the “purest ever built”: it makes strictly no use of variables, templates or patterns, has no explicit transfer component, and requires no preprocessing or training of the aligned examples. It relies on a single operation, proportional analogy, which implicitly neutralizes divergences between languages and captures lexical and syntactic variations along the paradigmatic and syntagmatic axes without explicitly decomposing sentences into fragments. The very same implementation of this core engine was evaluated on different tasks and language pairs. First, we ran our system on two tasks of a previous MT evaluation campaign to rank it among current state-of-the-art systems. Then, we illustrated the “universality” of our system by participating in a recent MT evaluation campaign, with exactly the same core engine, for a wide variety of language pairs. Finally, we studied the influence of extra data, such as dictionaries and paraphrases, on system performance.
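To make the idea of translating by proportional analogy more concrete, here is a minimal sketch. It is not the authors' algorithm: the solver below handles only the special case where the two sentences of an analogy differ in their endings, and the function names and the tiny English–German example list are hypothetical, chosen purely for illustration. A sentence D is translated by finding stored examples A, B, C such that A : B :: C : D holds in the source language, then solving the corresponding analogy A' : B' :: C' : x over their translations.

```python
from typing import List, Optional, Set, Tuple


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def solve_suffix_analogy(a: str, b: str, c: str) -> Optional[str]:
    """Solve a : b :: c : x in the restricted case where a and b
    differ only in their endings (an assumption of this sketch)."""
    k = common_prefix_len(a, b)
    old_suffix, new_suffix = a[k:], b[k:]
    if not c.endswith(old_suffix):
        return None  # no solution under this simplified model
    stem = c[:len(c) - len(old_suffix)]
    return stem + new_suffix


def translate(d: str, examples: List[Tuple[str, str]]) -> Set[str]:
    """Collect candidate translations of d from aligned examples.

    For every triple of examples whose source sides satisfy
    a : b :: c : d, solve the analogy on the target sides.
    """
    candidates: Set[str] = set()
    for a, a_t in examples:
        for b, b_t in examples:
            for c, c_t in examples:
                if solve_suffix_analogy(a, b, c) == d:
                    x = solve_suffix_analogy(a_t, b_t, c_t)
                    if x is not None:
                        candidates.add(x)
    return candidates


# Hypothetical toy examples (not taken from the paper's data).
EXAMPLES = [
    ("a tea", "einen Tee"),
    ("a tea, please", "einen Tee, bitte"),
    ("a coffee", "einen Kaffee"),
]

print(translate("a coffee, please", EXAMPLES))
```

Note that no sentence is segmented, aligned or turned into a template here: the examples are used as raw strings, which is the property the abstract emphasizes. A general solver would also have to handle changes at the beginning and in the middle of sentences, which this sketch deliberately leaves out.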

Keywords

Example-based machine translation, Proportional analogies, Divergences across languages

Copyright information

© Springer Science+Business Media 2006

Authors and Affiliations

  1. ATR Spoken Language Communication Research Labs, Kyōto, Japan
  2. Université de Caen, Caen Cedex, France
  3. GETA-CLIPS-IMAG, Université Joseph Fourier, Grenoble Cedex 9, France
