METIS-II: low resource machine translation

Carl, Michael; Melero, Maite; Badia, Toni; Vandeghinste, Vincent; Dirix, Peter; Schuurman, Ineke; Markantonatou, Stella; Sofianopoulos, Sokratis; Vassiliou, Marina; Yannoutsou, Olga

doi:10.1007/s10590-008-9048-z

Published: 27 November 2008

Volume 22, pages 67–99, (2008)

Machine Translation

Michael Carl¹,
Maite Melero²,
Toni Badia²,
Vincent Vandeghinste³,
Peter Dirix³,
Ineke Schuurman³,
Stella Markantonatou⁴,
Sokratis Sofianopoulos⁴,
Marina Vassiliou⁴ &
Olga Yannoutsou⁴

Abstract

METIS-II was an EU-FET MT project running from October 2004 to September 2007, which aimed at translating free text input without resorting to parallel corpora. The idea was to use “basic” linguistic tools and representations and to link them with patterns and statistics from the monolingual target-language corpus. The METIS-II project had four partners, each translating from its “home” language (Greek, Dutch, German, or Spanish) into English. The paper outlines the basic ideas of the project, their implementation, the resources used, and the results obtained. It also gives examples of how METIS-II has continued beyond its lifetime and the original scope of the project. On the basis of the results and experiences obtained, we believe that the approach is promising and offers the potential for development in various directions.

Author information

Authors and Affiliations

Institut für Angewandte Informationsforschung, Martin-Luther Str. 14, 66121, Saarbrücken, Germany
Michael Carl
GLiCom (Fundació Barcelona Media-UPF), Avinguda Diagonal, 177, Barcelona, 08002, Spain
Maite Melero & Toni Badia
KU Leuven-Centrum voor Computerlinguïstiek, Blijde Inkomststraat 13, 3000, Leuven, Belgium
Vincent Vandeghinste, Peter Dirix & Ineke Schuurman
Institute for Language and Speech Processing, Artemidos 6 & Epidavrou, 15125, Athens, Greece
Stella Markantonatou, Sokratis Sofianopoulos, Marina Vassiliou & Olga Yannoutsou

Corresponding author

Correspondence to Michael Carl.

About this article

Cite this article

Carl, M., Melero, M., Badia, T. et al. METIS-II: low resource machine translation. Machine Translation 22, 67–99 (2008). https://doi.org/10.1007/s10590-008-9048-z

Received: 29 August 2008
Accepted: 04 November 2008
Published: 27 November 2008
Issue Date: March 2008
DOI: https://doi.org/10.1007/s10590-008-9048-z
