Machine Translation, Volume 25, Issue 2, pp 127–144

Apertium: a free/open-source platform for rule-based machine translation

  • Mikel L. Forcada
  • Mireia Ginestí-Rosell
  • Jacob Nordfalk
  • Jim O’Regan
  • Sergio Ortiz-Rojas
  • Juan Antonio Pérez-Ortiz
  • Felipe Sánchez-Martínez
  • Gema Ramírez-Sánchez
  • Francis M. Tyers

Abstract

Apertium is a free/open-source platform for rule-based machine translation. It is widely used to build machine translation systems for a variety of language pairs, especially where shallow transfer suffices to produce good-quality translations (mainly related-language pairs), although it has also proven useful in assimilation scenarios involving more distant language pairs. This article summarises the Apertium platform: the translation engine, the encoding of linguistic data, and the tools developed around the platform. The present limitations of the platform and the challenges posed for the coming years are also discussed. Finally, evaluation results for some of the most active language pairs are presented. An appendix describes Apertium as a free/open-source project.

Keywords

  • Free/open-source machine translation
  • Rule-based machine translation
  • Apertium
  • Shallow transfer
  • Finite-state transducers

Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  • Mikel L. Forcada (1)
  • Mireia Ginestí-Rosell (1)
  • Jacob Nordfalk (2)
  • Jim O’Regan (3)
  • Sergio Ortiz-Rojas (4)
  • Juan Antonio Pérez-Ortiz (1)
  • Felipe Sánchez-Martínez (1)
  • Gema Ramírez-Sánchez (4)
  • Francis M. Tyers (1)

  1. Grup Transducens, Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, Alacant, Spain
  2. Copenhagen University College of Engineering, Copenhagen, Denmark
  3. Eolaistriu Technologies, Thurles, Ireland
  4. Prompsit Language Engineering, Elche, Spain
