Deep open-source machine translation

Machine Translation

Abstract

This paper summarizes ongoing efforts to provide software infrastructure (and methodology) for open-source machine translation that combines a deep semantic transfer approach with advanced stochastic models. The resulting infrastructure comprises precise grammars for parsing and generation, a semantic-transfer-based translation engine, and stochastic controllers. We provide a qualitative and quantitative experience report from instantiating our general architecture for Japanese–English MT using only open-source components, including HPSG-based grammars of English and Japanese.

Author information

Corresponding author

Correspondence to Francis Bond.

About this article

Cite this article

Bond, F., Oepen, S., Nichols, E. et al. Deep open-source machine translation. Machine Translation 25, 87–105 (2011). https://doi.org/10.1007/s10590-011-9099-4

  • DOI: https://doi.org/10.1007/s10590-011-9099-4
