Abstract
METIS-II was an EU-FET MT project that ran from October 2004 to September 2007 and aimed to translate free text input without resorting to parallel corpora. The idea was to use “basic” linguistic tools and representations and to link them with patterns and statistics from the monolingual target-language corpus. The METIS-II project had four partners, translating from their “home” languages Greek, Dutch, German, and Spanish into English. The paper outlines the basic ideas of the project, their implementation, the resources used, and the results obtained. It also gives examples of how METIS-II has continued beyond its lifetime and the original scope of the project. On the basis of the results and experiences obtained, we believe that the approach is promising and offers the potential for development in various directions.
Cite this article
Carl, M., Melero, M., Badia, T. et al. METIS-II: low resource machine translation. Machine Translation 22, 67–99 (2008). https://doi.org/10.1007/s10590-008-9048-z
DOI: https://doi.org/10.1007/s10590-008-9048-z