Improving Arabic neural machine translation via n-best list re-ranking

Abstract

Even though the rise of the neural machine translation (NMT) paradigm has brought considerable improvement to the field of machine translation (MT), current translation results are still not perfect. One of the main reasons for this imperfection is the complexity of the decoding task: finding the single best translation in the space of all possible translations was, and remains, a challenging problem. One of the most successful ways to address it is n-best list re-ranking, which attempts to reorder the n best decoder translations according to a set of defined features. In this paper, we propose a set of new re-ranking features that can be extracted directly from the parallel corpus without the need for any external tools. The proposed feature set takes into account lexical, syntactic, and even semantic aspects of the n-best list translations. We also present a feature weight optimization method that uses a quantum-behaved particle swarm optimization (QPSO) algorithm. Our system has been evaluated on multiple English-to-Arabic and Arabic-to-English MT test sets, and the obtained re-ranking results yield noticeable improvements over the baseline NMT systems.
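
As a rough illustration of the scheme described above, the sketch below re-ranks an n-best list by a weighted log-linear combination of feature scores. The Candidate class, feature names, and weight values are hypothetical placeholders, not the paper's actual feature set; in the paper, the features are extracted from the parallel corpus and the weights are tuned with QPSO.

```python
# A minimal n-best re-ranking sketch: each candidate translation carries the
# decoder's log-probability plus extra feature values, and the list is
# reordered by a weighted sum of those scores. Feature names and weights
# below are illustrative only.

from dataclasses import dataclass, field

@dataclass
class Candidate:
    text: str
    decoder_score: float                           # log-probability from the NMT decoder
    features: dict = field(default_factory=dict)   # feature name -> value

def rerank(nbest, weights):
    """Reorder an n-best list by a weighted log-linear feature score."""
    def score(cand):
        total = weights.get("decoder", 1.0) * cand.decoder_score
        for name, value in cand.features.items():
            total += weights.get(name, 0.0) * value
        return total
    return sorted(nbest, key=score, reverse=True)

# Toy usage with two hypothetical candidates.
nbest = [
    Candidate("translation A", -3.2, {"lm": -12.0, "length_ratio": 0.95}),
    Candidate("translation B", -3.5, {"lm": -9.5, "length_ratio": 1.00}),
]
weights = {"decoder": 1.0, "lm": 0.2, "length_ratio": 1.5}
print(rerank(nbest, weights)[0].text)
```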

Notes

  1. Each translation in the n-best list is known as a “translation candidate” or a “translation hypothesis”.

  2. Generally, a padding process is used to bring all the sentences to the same length.

  3. We follow the MBR re-ranking method given by González-Rubio et al. (2011); a minimal sketch of this scheme is given after these notes.

  4. In this work, the n-best list itself is taken as the evidence space.

  5. The value of the contraction-expansion coefficient parameter is generally set to 0.75, as recommended by Sun et al. (2012); see the QPSO sketch after these notes.

  6. https://github.com/moses-smt/giza-pp/tree/master/GIZA%2B%2B-v2.

  7. https://github.com/kpu/kenlm.

  8. http://opennmt.net/.

  9. https://github.com/google/sentencepiece.

  10. https://github.com/Babylonpartners/fastText_multilingual.

  11. https://github.com/Maluuba/nlg-eval.

  12. For the remainder of this section “MET” is used as an abbreviation for “METEOR”.

  13. http://opus.lingfil.uu.se/.

  14. https://cms.unov.org/UNCorpus/.

  15. http://workshop2016.iwslt.org/59.php.

  16. The effectiveness of this preprocessing step has been investigated in the work of Habash and Sadat (2006).

  17. http://www.nltk.org/.
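
The sketch below illustrates the MBR re-ranking referred to in notes 3 and 4, under simplifying assumptions: candidate posteriors are obtained by a softmax over decoder scores, and a token-overlap F1 stands in for a real sentence-level gain metric such as smoothed BLEU. It follows the general MBR scheme, not necessarily González-Rubio et al.'s (2011) exact formulation.

```python
# MBR re-ranking over an n-best list, with the list itself as the evidence
# space: pick the hypothesis that maximizes expected gain against all
# candidates, weighted by their posterior probabilities.

import math

def posteriors(scores):
    """Softmax over decoder log-scores to get candidate probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def overlap_f1(hyp, ref):
    """Illustrative gain function: token-overlap F1 between two sentences."""
    h, r = hyp.split(), ref.split()
    common = len(set(h) & set(r))
    if common == 0:
        return 0.0
    p, q = common / len(h), common / len(r)
    return 2 * p * q / (p + q)

def mbr_select(candidates, scores):
    """Return the candidate with the highest expected gain over the evidence space."""
    probs = posteriors(scores)
    def expected_gain(hyp):
        return sum(p * overlap_f1(hyp, ev) for p, ev in zip(probs, candidates))
    return max(candidates, key=expected_gain)
```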
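
Similarly, the following is a generic sketch of one QPSO position update (after Sun et al. 2004, 2012), the kind of step the feature weight optimization in note 5 relies on. The fitness evaluation (e.g., a translation metric computed on a development set after re-ranking) happens outside this function, and all names here are illustrative.

```python
# One QPSO iteration. Each particle is a vector of feature weights; "pbest"
# holds per-particle best positions and "gbest" the swarm's best position.

import math
import random

def qpso_step(positions, pbest, gbest, beta=0.75):
    """Move every particle once; beta is the contraction-expansion
    coefficient, commonly set to 0.75 (note 5)."""
    dim = len(gbest)
    # Mean of the personal-best positions over the whole swarm.
    mbest = [sum(p[d] for p in pbest) / len(pbest) for d in range(dim)]
    new_positions = []
    for x, pb in zip(positions, pbest):
        new_x = []
        for d in range(dim):
            phi = random.random()
            # Local attractor: random convex combination of personal and global best.
            attractor = phi * pb[d] + (1.0 - phi) * gbest[d]
            u = 1.0 - random.random()  # sample in (0, 1] so log(1/u) is finite
            step = beta * abs(mbest[d] - x[d]) * math.log(1.0 / u)
            new_x.append(attractor + step if random.random() < 0.5 else attractor - step)
        new_positions.append(new_x)
    return new_positions
```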

References

  • Arun A, Koehn P (2007) Online learning methods for discriminative training of phrase based statistical machine translation. In: Proceedings of MT Summit XI. Copenhagen, Denmark, pp 15–20

  • Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. CoRR arXiv:1409.0473

  • Brown PF, Della Pietra VJ, Della Pietra SA, Mercer RL (1993) The mathematics of statistical machine translation: parameter estimation. Comput Linguist 19(2):263–311

  • Carter S, Monz C (2010) Discriminative syntactic reranking for statistical machine translation. In: AMTA 2010: the ninth conference of the association for machine translation in the Americas. Denver, p 10

  • Carter S, Monz C (2011) Syntactic discriminative language model rerankers for statistical machine translation. Mach Transl 25(4):317–339

  • Chen B, Cherry C (2014) A systematic comparison of smoothing techniques for sentence-level BLEU. In: Proceedings of the ninth workshop on statistical machine translation. Baltimore, pp 362–367

  • Cho K, van Merrienboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014a) Learning phrase representations using RNN encoder–decoder for statistical machine translation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). Doha, pp 1724–1734

  • Cho K, Van Merriënboer B, Bahdanau D, Bengio Y (2014b) On the properties of neural machine translation: encoder-decoder approaches. arXiv preprint arXiv:1409.1259

  • Chopra S, Auli M, Rush AM (2016) Abstractive sentence summarization with attentive recurrent neural networks. In: Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies. San Diego, pp 93–98

  • Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12:2493–2537

  • Denkowski M, Lavie A (2014) Meteor universal: language specific translation evaluation for any target language. In: Proceedings of the ninth workshop on statistical machine translation. Baltimore, pp 376–380

  • Duh K, Kirchhoff K (2008) Beyond log-linear models: boosted minimum error rate training for n-best re-ranking. In: Proceedings of the 46th annual meeting of the association for computational linguistics on human language technologies: short papers. Columbus, pp 37–40

  • Duh K, Sudoh K, Tsukada H, Isozaki H, Nagata M (2010) N-best reranking by multitask learning. In: Proceedings of the joint fifth workshop on statistical machine translation and metricsMATR. Uppsala, pp 375–383

  • Farzi S, Faili H (2015) A swarm-inspired re-ranker system for statistical machine translation. Comput Speech Lang 29(1):45–62

  • Freitag M, Al-Onaizan Y (2017) Beam search strategies for neural machine translation. In: Proceedings of the first workshop on neural machine translation. Vancouver, pp 56–60

  • Goller C, Kuchler A (1996) Learning task-dependent distributed representations by backpropagation through structure. In: Proceedings of the IEEE international conference on neural networks, vol. 1. Washington, DC, pp 347–352

  • González-Rubio J, Juan A, Casacuberta F (2011) Minimum Bayes-risk system combination. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, vol 1. Portland, Oregon, pp 1268–1277

  • Habash N, Sadat F (2006) Arabic preprocessing schemes for statistical machine translation. In: Proceedings of the human language technology conference of the NAACL, companion volume: short papers. New York City, pp 49–52

  • Hasan S, Zens R, Ney H (2007) Are very large N-best lists useful for SMT? In: Human language technologies 2007: the conference of the North American chapter of the association for computational linguistics; companion volume: short papers. Rochester, NY, pp 57–60

  • Hassan H, Aue A, Chen C, Chowdhary V, Clark J, Federmann C, Huang X, Junczys-Dowmunt M, Lewis W, Li M, Liu S, Liu T, Luo R, Menezes A, Qin T, Seide F, Tan X, Tian F, Wu L, Wu S, Xia Y, Zhang D, Zhang Z, Zhou M (2018) Achieving human parity on automatic Chinese to English news translation. arXiv preprint arXiv:1803.05567

  • Heafield K (2011) KenLM: Faster and smaller language model queries. In: Proceedings of the sixth workshop on statistical machine translation. Edinburgh, Scotland, pp 187–197

  • Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

  • Kim Y (2014) Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882

  • Kingma D, Ba J (2014) Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980

  • Kirchhoff K, Yang M (2005) Improved language modeling for statistical machine translation. In: Proceedings of the ACL workshop on building and using parallel texts, Ann Arbor, MI, pp 125–128

  • Klein G, Kim Y, Deng Y, Senellart J, Rush AM (2017) OpenNMT: open-source toolkit for neural machine translation. arXiv preprint arXiv:1701.02810

  • Kneser R, Ney H (1995) Improved backing-off for m-gram language modeling. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing, vol 1. Detroit, MI, pp 181–184

  • Koehn P (2009) Statistical machine translation. Cambridge University Press, Cambridge

  • Koehn P, Knowles R (2017) Six challenges for neural machine translation. In: Proceedings of the first workshop on neural machine translation. Vancouver, Canada, pp 28–39

  • Kumar S, Byrne W (2004) Minimum Bayes-risk decoding for statistical machine translation. In: Proceedings of the human language technology conference of the North American chapter of the association for computational linguistics: HLT-NAACL 2004. Boston, MA, pp 169–176

  • Li J, Jurafsky D (2016) Mutual information and diverse decoding improve neural machine translation. arXiv preprint arXiv:1601.00372

  • Liu L, Utiyama M, Finch A, Sumita E (2016) Agreement on target-bidirectional neural machine translation. In: Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies. San Diego, CA, pp 411–416

  • Liu Y, Zhou L, Wang Y, Zhao Y, Zhang J, Zong C (2018) A comparable study on model averaging, ensembling and reranking in NMT. In: Natural language processing and Chinese computing. NLPCC 2018. Lecture Notes in Computer Science, vol 11109. Springer, Cham, pp 299–308

  • Luong NQ, Popescu-Belis A (2016) A contextual language model to improve machine translation of pronouns by re-ranking translation hypotheses. In: Proceedings of the 19th annual conference of the European association for machine translation. Riga, Latvia, pp 292–304

  • Mikolov T, Karafiát M, Burget L, Černocký J, Khudanpur S (2010) Recurrent neural network based language model. In: INTERSPEECH 2010: eleventh annual conference of the international speech communication association. Makuhari, Chiba, Japan, pp 1045–1048

  • Neubig G, Morishita M, Nakamura S (2015) Neural reranking improves subjective quality of machine translation: NAIST at WAT2015. arXiv preprint arXiv:1510.05203

  • Och FJ (2003) Minimum error rate training in statistical machine translation. In: Proceedings of the 41st annual meeting of the association for computational linguistics. Sapporo, Japan, pp 160–167

  • Och FJ, Ney H (2003) A systematic comparison of various statistical alignment models. Comput Linguist 29(1):19–51

  • Och FJ, Gildea D, Khudanpur S, Sarkar A, Yamada K, Fraser A, Kumar S, Shen L, Smith D, Eng K, Jain V, Jin Z, Radev D (2004) A smorgasbord of features for statistical machine translation. In: Proceedings of the human language technology conference of the North American chapter of the association for computational linguistics: HLT-NAACL 2004, Boston, MA, pp 161–168

  • Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting of the association for computational linguistics, Philadelphia, PA, pp 311–318

  • Russell SJ, Norvig P (2016) Artificial intelligence: a modern approach. Pearson Education Limited, Malaysia

  • Schuster M, Paliwal KK (1997) Bidirectional recurrent neural networks. IEEE Trans Signal Process 45(11):2673–2681

  • Sennrich R, Haddow B, Birch A (2015) Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909

  • Sharma S, El Asri L, Schulz H, Zumer J (2017) Relevance of unsupervised metrics in task-oriented dialogue for evaluating natural language generation. CoRR arXiv:1706.09799, URL http://arxiv.org/abs/1706.09799

  • Shu R, Nakayama H (2017) Later-stage minimum Bayes-risk decoding for neural machine translation. arXiv preprint arXiv:1704.03169

  • Smith SL, Turban DH, Hamblin S, Hammerla NY (2017) Offline bilingual word vectors, orthogonal transformations and the inverted softmax. arXiv preprint arXiv:1702.03859

  • Sokolov A, Wisniewski G, Yvon F (2012) Non-linear n-best list reranking with few features. In: AMTA-2012: the tenth biennial conference of the association for machine translation in the Americas. San Diego, CA, p 10

  • Specia L, Sankaran B, das Graças Volpe Nunes M (2008) N-best reranking for the efficient integration of word sense disambiguation and statistical machine translation. In: Computational linguistics and intelligent text processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, pp 399–410

  • Stahlberg F, Hasler E, Waite A, Byrne B (2016) Syntactically guided neural machine translation. arXiv preprint arXiv:1605.04569

  • Sun J, Xu W, Feng B (2004) A global search strategy of quantum-behaved particle swarm optimization. In: IEEE conference on cybernetics and intelligent systems, vol 1. Singapore, pp 111–116

  • Sun J, Fang W, Wu X, Palade V, Xu W (2012) Quantum-behaved particle swarm optimization: analysis of individual particle behavior and parameter selection. Evolut Comput 20(3):349–393

  • Tong Y, Wong DF, Chao LS (2016) Exploiting rich feature representation for SMT n-best reranking. In: International conference on wavelet analysis and pattern recognition (ICWAPR). Jeju Island, South Korea, pp 101–106

  • Tromble RW, Kumar S, Och F, Macherey W (2008) Lattice minimum Bayes-risk decoding for statistical machine translation. In: Proceedings of the conference on empirical methods in natural language processing. Honolulu, HI, pp 620–629

  • Vijayakumar AK, Cogswell M, Selvaraju RR, Sun Q, Lee S, Crandall D, Batra D (2016) Diverse beam search: decoding diverse solutions from neural sequence models. arXiv preprint arXiv:1610.02424

  • Wang D, Nyberg E (2015) A long short-term memory model for answer sentence selection in question answering. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (Short Papers), vol 2. Beijing, China, pp 707–712

  • Watanabe T, Suzuki J, Tsukada H, Isozaki H (2007) Online large margin training for statistical machine translation. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL). Prague, Czech Republic, pp 764–773

  • Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Krikun M, Cao Y, Gao Q, Macherey K, Klingner J, Shah A, Johnson M, Liu X, Kaiser L, Gouws S, Kato Y, Kudo T, Kazawa H, Stevens K, Kurian G, Patil N, Wang W, Young C, Smith J, Riesa J, Rudnick A, Vinyals O, Corrado G, Hughes M, Dean J (2016) Google’s neural machine translation system: bridging the gap between human and machine translation. CoRR arXiv:1609.08144

  • Xiao T, Zhu J, Liu T (2013) Bagging and boosting statistical machine translation systems. Artif Intell 195:496–527

  • Zhang J, Utiyama M, Sumita E, Neubig G, Nakamura S (2017) Improving neural machine translation through phrase-based forced decoding. arXiv preprint arXiv:1711.00309

  • Zhang Z, Wang R, Utiyama M, Sumita E, Zhao H (2018) Exploring recombination for efficient decoding of neural machine translation. arXiv preprint arXiv:1808.08482

  • Ziemski M, Junczys-Dowmunt M, Pouliquen B (2016) The United Nations Parallel Corpus v1.0. In: Proceedings of the tenth international conference on language resources and evaluation (LREC 2016). Portorož, Slovenia, pp 3530–3534

Author information

Corresponding author

Correspondence to Mohamed Seghir Hadj Ameur.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Hadj Ameur, M.S., Guessoum, A. & Meziane, F. Improving Arabic neural machine translation via n-best list re-ranking. Machine Translation 33, 279–314 (2019). https://doi.org/10.1007/s10590-019-09237-6
