Domain-Specific Hybrid Machine Translation from English to Portuguese

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2016)

Abstract

Machine translation (MT) from English to Portuguese has typically received little attention in existing research. In this paper, we focus on MT from English to Portuguese for the specific domain of information technology (IT). We built a small in-domain parallel corpus to address the lack of publicly available IT-specific parallel corpora, and then adapted an existing hybrid MT system to the new language pair. We further improved the initial version of the EN-PT hybrid system by adding modules that address its most frequent errors. To assess the improvement achieved by each of these dedicated modules, we compared all versions of our MT system automatically. In addition, we conducted, and report on, a detailed error analysis of the initial and final versions of our system.

Notes

  1. Available from: http://www.meta-share.org/.

  2. http://www.qtleap.eu.

  3. http://downloads.videolan.org/pub/videolan/vlc/2.1.5/vlc-2.1.5.tar.xz.

  4. http://download.documentfoundation.org/libreoffice/src/4.4.0/libreoffice-translations-4.4.0.3.tar.xz.

  5. svn://anonsvn.kde.org/home/kde/branches/stable/l10n-kde4/pt/messages.

  6. Available from: http://www.microsoft.com/Language/en-US/Terminology.aspx.

  7. Available from: https://www.libreoffice.org/community/localization/.

  8. http://www.qt21.eu/launchpad/content/multidimensional-quality-metrics.

Acknowledgements

The results reported in this paper were partially supported by the Portuguese Government’s P2020 program under the grant 08/SI/2015/3279: ASSET-Intelligent Assistance for Everyone Everywhere, and by the EC’s FP7 program under the grant number 610516: QTLeap-Quality Translation by Deep Language Engineering Approaches.

Author information

Corresponding author

Correspondence to João Rodrigues.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Rodrigues, J. et al. (2016). Domain-Specific Hybrid Machine Translation from English to Portuguese. In: Silva, J., Ribeiro, R., Quaresma, P., Adami, A., Branco, A. (eds) Computational Processing of the Portuguese Language. PROPOR 2016. Lecture Notes in Computer Science, vol 9727. Springer, Cham. https://doi.org/10.1007/978-3-319-41552-9_5

  • DOI: https://doi.org/10.1007/978-3-319-41552-9_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41551-2

  • Online ISBN: 978-3-319-41552-9

  • eBook Packages: Computer Science, Computer Science (R0)
