Open-Source Portuguese–Spanish Machine Translation

  • Carme Armentano-Oller
  • Rafael C. Carrasco
  • Antonio M. Corbí-Bellot
  • Mikel L. Forcada
  • Mireia Ginestí-Rosell
  • Sergio Ortiz-Rojas
  • Juan Antonio Pérez-Ortiz
  • Gema Ramírez-Sánchez
  • Felipe Sánchez-Martínez
  • Miriam A. Scalco
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3960)

Abstract

This paper describes the current status of development of an open-source shallow-transfer machine translation (MT) system for the [European] Portuguese \(\leftrightarrow\) Spanish language pair, developed using the OpenTrad Apertium MT toolbox (www.apertium.org). Apertium uses finite-state transducers for lexical processing, hidden Markov models for part-of-speech tagging, and finite-state-based chunking for structural transfer. It is based on a simple rationale: to produce fast, reasonably intelligible and easily correctable translations between related languages, it suffices to use an MT strategy that refines word-for-word MT with shallow parsing techniques. This paper briefly describes the MT engine, the formats it uses for linguistic data, and the compilers that convert these data into an efficient format used by the engine, and then describes in more detail the pilot Portuguese \(\leftrightarrow\) Spanish linguistic data.
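The rationale in the abstract (word-for-word translation refined by shallow rules) can be illustrated with a minimal Python sketch. It is not Apertium code and does not use Apertium's data formats: the dictionary `BIDIX`, the table `ES_DET`, and all function names are hypothetical, a plain dict stands in for the finite-state transducers, and the hidden-Markov-model tagging step is omitted by assuming every word is unambiguous. The single rule shown makes a determiner agree with the gender of the following noun, which matters for nouns whose gender differs between the two languages.

```python
# Toy bilingual dictionary: Portuguese word -> (Spanish lemma, POS, gender).
# Hypothetical data; a dict stands in for the finite-state transducers.
BIDIX = {
    "o": ("el", "det", "m"),
    "a": ("el", "det", "f"),
    "leite": ("leche", "n", "f"),   # masculine in Portuguese, feminine in Spanish
    "ponte": ("puente", "n", "m"),  # feminine in Portuguese, masculine in Spanish
    "casa": ("casa", "n", "f"),
}

# Spanish surface forms of the definite article, keyed by gender.
ES_DET = {"m": "el", "f": "la"}


def lexical_transfer(words):
    """Word-for-word step: map each source word to a target lexical unit."""
    units = []
    for w in words:
        # Unknown words are passed through unchanged with POS "unk".
        lemma, pos, gender = BIDIX.get(w.lower(), (w, "unk", None))
        units.append({"lemma": lemma, "pos": pos, "gender": gender})
    return units


def structural_transfer(units):
    """Shallow rule: in a det + noun pattern, the determiner takes the noun's gender."""
    for det, noun in zip(units, units[1:]):
        if det["pos"] == "det" and noun["pos"] == "n":
            det["gender"] = noun["gender"]
    return units


def generate(units):
    """Produce target surface forms; only determiners inflect in this sketch."""
    out = []
    for u in units:
        if u["pos"] == "det":
            out.append(ES_DET[u["gender"]])
        else:
            out.append(u["lemma"])
    return " ".join(out)


def translate(sentence):
    """Full toy pipeline: lexical transfer, shallow rule, generation."""
    return generate(structural_transfer(lexical_transfer(sentence.split())))


def translate_word_for_word(sentence):
    """Same pipeline without the shallow rule, for comparison."""
    return generate(lexical_transfer(sentence.split()))
```

With these toy entries, `translate_word_for_word("o leite")` gives the ungrammatical "el leche", while `translate("o leite")` gives "la leche". A real system would also need the tagger to decide, for example, whether Portuguese "a" is an article or a preposition, which this sketch sidesteps.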


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Carme Armentano-Oller
    • 1
  • Rafael C. Carrasco
    • 1
  • Antonio M. Corbí-Bellot
    • 1
  • Mikel L. Forcada
    • 1
  • Mireia Ginestí-Rosell
    • 1
  • Sergio Ortiz-Rojas
    • 1
  • Juan Antonio Pérez-Ortiz
    • 1
  • Gema Ramírez-Sánchez
    • 1
  • Felipe Sánchez-Martínez
    • 1
  • Miriam A. Scalco
    • 1
  1. Transducens Group, Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, Alacant, Spain
