Apertium: a free/open-source platform for rule-based machine translation

Abstract

Apertium is a free/open-source platform for rule-based machine translation. It is widely used to build machine translation systems for a variety of language pairs, especially where shallow transfer suffices to produce good-quality translations (mainly between related languages), although it has also proven useful in assimilation scenarios involving more distant language pairs. This article summarises the Apertium platform: the translation engine, the encoding of linguistic data, and the tools developed around the platform. It also discusses the platform's present limitations and the challenges posed for the coming years. Finally, it presents evaluation results for some of the most active language pairs. An appendix describes Apertium as a free/open-source project.

Author information

Corresponding author

Correspondence to Felipe Sánchez-Martínez.

About this article

Cite this article

Forcada, M.L., Ginestí-Rosell, M., Nordfalk, J. et al. Apertium: a free/open-source platform for rule-based machine translation. Machine Translation 25, 127–144 (2011). https://doi.org/10.1007/s10590-011-9090-0

Keywords

  • Free/open-source machine translation
  • Rule-based machine translation
  • Apertium
  • Shallow transfer
  • Finite-state transducers