Machine Translation, 25:87

Deep open-source machine translation

  • Francis Bond
  • Stephan Oepen
  • Eric Nichols
  • Dan Flickinger
  • Erik Velldal
  • Petter Haugereid

Abstract

This paper summarizes ongoing efforts to provide software infrastructure (and methodology) for open-source machine translation that combines a deep semantic transfer approach with advanced stochastic models. The resulting infrastructure brings together precise grammars for parsing and generation, a semantic-transfer-based translation engine, and stochastic controllers. We provide both a qualitative and a quantitative experience report from instantiating our general architecture for Japanese–English MT using only open-source components, including HPSG-based grammars of English and Japanese.

Keywords

Machine translation, Open source, Semantic transfer, HPSG, MRS

Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Francis Bond (1)
  • Stephan Oepen (2)
  • Eric Nichols (3)
  • Dan Flickinger (4)
  • Erik Velldal (2)
  • Petter Haugereid (1)

  1. Division of Linguistics and Multilingual Studies, Nanyang Technological University, Singapore, Singapore
  2. Department of Informatics, University of Oslo, Oslo, Norway
  3. Graduate School of Information Sciences, Tohoku University, Sendai, Japan
  4. Center for the Study of Language and Information, Stanford University, Stanford, USA
