Apertium: a free/open-source platform for rule-based machine translation

Machine Translation

Abstract

Apertium is a free/open-source platform for rule-based machine translation. It is widely used to build machine translation systems for a variety of language pairs, especially those (mainly related-language pairs) for which shallow transfer suffices to produce good-quality translations; it has also proven useful in assimilation scenarios involving more distant pairs. This article summarises the Apertium platform: the translation engine, the encoding of linguistic data, and the tools developed around the platform. It also discusses the platform's present limitations and the challenges for the coming years, and presents evaluation results for some of the most active language pairs. An appendix describes Apertium as a free/open-source project.

Author information

Corresponding author

Correspondence to Felipe Sánchez-Martínez.

About this article

Cite this article

Forcada, M.L., Ginestí-Rosell, M., Nordfalk, J. et al. Apertium: a free/open-source platform for rule-based machine translation. Machine Translation 25, 127–144 (2011). https://doi.org/10.1007/s10590-011-9090-0

  • DOI: https://doi.org/10.1007/s10590-011-9090-0
