Domain-Specific Hybrid Machine Translation from English to Portuguese

  • João Rodrigues
  • Luís Gomes
  • Steven Neale
  • Andreia Querido
  • Nuno Rendeiro
  • Sanja Štajner
  • João Silva
  • António Branco
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9727)

Abstract

Machine translation (MT) from English to Portuguese has received relatively little attention in existing research. In this paper, we focus on MT from English to Portuguese for the specific domain of information technology (IT). We built a small in-domain parallel corpus to address the lack of publicly available IT-specific parallel corpora, and then adapted an existing hybrid MT system to the new language pair (English to Portuguese). We further improved the initial version of the EN-PT hybrid system by adding modules that address its most frequently occurring errors. To assess the improvement achieved by each of these dedicated modules, we compared all versions of our MT system using automatic evaluation. In addition, we conducted and report a detailed error analysis of the initial and final versions of our system.

Keywords

Hybrid machine translation, TectoMT, Lexical semantics, IT domain, Portuguese

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • João Rodrigues
    • 1
  • Luís Gomes
    • 1
  • Steven Neale
    • 1
  • Andreia Querido
    • 1
  • Nuno Rendeiro
    • 1
  • Sanja Štajner
    • 1
  • João Silva
    • 1
  • António Branco
    • 1
  1. Department of Informatics, Faculty of Sciences, University of Lisbon, Lisbon, Portugal