
METIS-II: low resource machine translation

Machine Translation

Abstract

METIS-II was an EU-FET MT project that ran from October 2004 to September 2007 and aimed to translate free text input without resorting to parallel corpora. The idea was to use "basic" linguistic tools and representations and to link them with patterns and statistics from a monolingual target-language corpus. The METIS-II project had four partners, each translating from its "home" language (Greek, Dutch, German, or Spanish) into English. The paper outlines the basic ideas of the project, their implementation, the resources used, and the results obtained. It also gives examples of how METIS-II has continued beyond its lifetime and beyond the original scope of the project. On the basis of the results and experience obtained, we believe that the approach is promising and offers the potential for development in various directions.
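
The core idea in the abstract, pairing shallow source-side processing with statistics drawn only from a target-language corpus, can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in chosen for illustration: the toy English corpus, the German lemma dictionary `LEXICON`, the add-one-smoothed bigram model, and the exhaustive candidate search. It is not the METIS-II implementation; the partners each used their own analysis tools, pattern resources, and search strategies.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy monolingual English corpus (stands in for a large target-language corpus).
CORPUS = [
    "the big house is old",
    "a big house",
    "the house is big",
    "the large dog barks",
]

# Hypothetical bilingual lemma dictionary: source lemma -> candidate English words.
LEXICON = {
    "das": ["the"],
    "große": ["big", "large", "great"],
    "haus": ["house", "home"],
}


def train_bigrams(sentences):
    """Count unigrams and bigrams, adding sentence-boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a target token sequence."""
    vocab = len(unigrams)
    seq = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(seq, seq[1:])
    )


def translate(source_lemmas, lexicon, unigrams, bigrams):
    """Expand every word-for-word candidate; keep the one the target model scores highest.

    Unknown lemmas are passed through unchanged.
    """
    options = [lexicon.get(w, [w]) for w in source_lemmas]
    candidates = [list(c) for c in product(*options)]
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))


unigrams, bigrams = train_bigrams(CORPUS)
print(" ".join(translate(["das", "große", "haus"], LEXICON, unigrams, bigrams)))
```

With this toy corpus the sketch prints `the big house`, because "big house" is the most frequent target-side bigram among the candidates. Enumerating every combination grows exponentially with sentence length, so a realistic system would prune candidates, for example with a beam search.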

Author information

Correspondence to Michael Carl.

About this article

Cite this article

Carl, M., Melero, M., Badia, T. et al. METIS-II: low resource machine translation. Machine Translation 22, 67–99 (2008). https://doi.org/10.1007/s10590-008-9048-z

  • DOI: https://doi.org/10.1007/s10590-008-9048-z