Machine Translation, Volume 22, Issue 1–2, pp 67–99

METIS-II: low resource machine translation

  • Michael Carl
  • Maite Melero
  • Toni Badia
  • Vincent Vandeghinste
  • Peter Dirix
  • Ineke Schuurman
  • Stella Markantonatou
  • Sokratis Sofianopoulos
  • Marina Vassiliou
  • Olga Yannoutsou

Abstract

METIS-II was an EU-FET MT project running from October 2004 to September 2007, which aimed at translating free text input without resorting to parallel corpora. The idea was to use “basic” linguistic tools and representations and to link them with patterns and statistics from the monolingual target-language corpus. The METIS-II project had four partners, translating from their “home” languages Greek, Dutch, German, and Spanish into English. The paper outlines the basic ideas of the project, their implementation, the resources used, and the results obtained. It also gives examples of how METIS-II has continued beyond its lifetime and the original scope of the project. On the basis of the results and experiences obtained, we believe that the approach is promising and offers the potential for development in various directions.
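
As a rough illustration of the general idea (a bilingual dictionary combined with statistics from a monolingual target-language corpus, with no parallel corpus), the toy sketch below picks among dictionary translation options using bigram counts. It is not the METIS-II implementation: the corpus, the lexicon, the function names, and the count-based score are all assumptions made for this example, and source word order is simply kept.

```python
from collections import Counter
from itertools import product

# Hypothetical toy monolingual target-language (English) corpus.
CORPUS = [
    "the house is big",
    "the big house is old",
    "a big house",
    "the home is big",
]

# Hypothetical toy bilingual dictionary: German token -> English options.
LEXICON = {
    "das": ["the", "that"],
    "grosse": ["big", "tall"],
    "haus": ["house", "home"],
}


def bigram_counts(sentences):
    """Count adjacent token pairs, with sentence boundary markers."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


def score(tokens, counts):
    """Crude fluency score: sum of corpus counts of the candidate's bigrams."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    return sum(counts[pair] for pair in zip(padded, padded[1:]))


def translate(source_tokens, lexicon, counts):
    """Enumerate all dictionary combinations and keep the best-scoring one."""
    options = [lexicon.get(token, [token]) for token in source_tokens]
    best = max(product(*options), key=lambda cand: score(cand, counts))
    return " ".join(best)


if __name__ == "__main__":
    counts = bigram_counts(CORPUS)
    print(translate(["das", "grosse", "haus"], LEXICON, counts))
```

Exhaustive enumeration is only workable for very short inputs; a realistic system would need a search strategy, smoothing of the corpus statistics, and some handling of word reordering.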

Keywords

Low resource MT; Statistical MT; Pattern-based MT; Shallow linguistic processing for MT

Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  • Michael Carl (1)
  • Maite Melero (2)
  • Toni Badia (2)
  • Vincent Vandeghinste (3)
  • Peter Dirix (3)
  • Ineke Schuurman (3)
  • Stella Markantonatou (4)
  • Sokratis Sofianopoulos (4)
  • Marina Vassiliou (4)
  • Olga Yannoutsou (4)

  1. Institut für Angewandte Informationsforschung, Saarbrücken, Germany
  2. GLiCom (Fundació Barcelona Media-UPF), Barcelona, Spain
  3. KU Leuven-Centrum voor Computerlinguïstiek, Leuven, Belgium
  4. Institute for Language and Speech Processing, Athens, Greece
