
Machine Translation, Volume 33, Issue 4, pp 279–314

Improving Arabic neural machine translation via n-best list re-ranking

  • Mohamed Seghir Hadj Ameur
  • Ahmed Guessoum
  • Farid Meziane

Abstract

Even though the rise of the neural machine translation (NMT) paradigm has brought a great deal of improvement to the field of machine translation (MT), current translation results are still not perfect. One of the main reasons for this imperfection is the complexity of the decoding task: finding the single best translation in the space of all possible translations remains a challenging problem. One of the most successful ways to address it is n-best list re-ranking, which attempts to reorder the n best translations produced by the decoder according to a set of defined features. In this paper, we propose a set of new re-ranking features that can be extracted directly from the parallel corpus without the need for any external tools. The proposed feature set takes into account the lexical, syntactic, and even semantic aspects of the n-best list translations. We also present a feature-weight optimization method that uses a quantum-behaved particle swarm optimization (QPSO) algorithm. Our system has been evaluated on multiple English-to-Arabic and Arabic-to-English MT test sets, and the re-ranking results obtained yield noticeable improvements over the baseline NMT systems.
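The two ideas in the abstract can be sketched in a few lines: re-ranking scores each n-best candidate with a weighted combination of its feature values, and the feature weights can be tuned with a QPSO search. This is a minimal illustrative sketch only, not the authors' feature set, objective, or implementation; all function names, toy feature values, and parameters (swarm size, the contraction coefficient `beta`) are assumptions made for the example.

```python
import math
import random

def rerank(nbest, weights):
    """Reorder an n-best list by a weighted linear combination of features.

    nbest   : list of (translation, feature_vector) pairs
    weights : one weight per feature, same length as each feature vector
    """
    def score(features):
        return sum(w * f for w, f in zip(weights, features))
    # Higher combined score ranks first.
    return sorted(nbest, key=lambda cand: score(cand[1]), reverse=True)

def qpso(objective, dim, n_particles=10, iters=50, beta=0.75, seed=0):
    """Maximize `objective` with quantum-behaved particle swarm optimization.

    Each particle is drawn around an attractor p (a random convex mix of its
    personal best and the global best) with a step scaled by its distance to
    the mean of all personal bests (mbest) -- the standard QPSO update.
    """
    rng = random.Random(seed)
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_val = [objective(x) for x in xs]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        # Mean of all personal-best positions.
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i, x in enumerate(xs):
            for d in range(dim):
                phi = rng.random()
                p = phi * pbest[i][d] + (1.0 - phi) * gbest[d]
                u = 1.0 - rng.random()          # u in (0, 1], keeps log finite
                step = beta * abs(mbest[d] - x[d]) * math.log(1.0 / u)
                x[d] = p + step if rng.random() < 0.5 else p - step
            val = objective(x)
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = x[:], val
                if val > gbest_val:
                    gbest, gbest_val = x[:], val
    return gbest, gbest_val

# Toy example: three candidate translations, two made-up features each
# (e.g. a decoder log-probability and a hypothetical language-model score).
nbest = [
    ("translation A", [-1.2, -3.0]),
    ("translation B", [-0.8, -4.5]),
    ("translation C", [-1.0, -2.5]),
]
ranked = rerank(nbest, weights=[1.0, 0.5])

# Toy weight search: in practice the objective would be a translation-quality
# metric over a development set; here it is a simple quadratic with its
# maximum at weights (1.0, 0.5).
best_w, best_val = qpso(lambda w: -((w[0] - 1.0) ** 2 + (w[1] - 0.5) ** 2), dim=2)
```

In the paper's setting, the QPSO objective would be a sentence-level translation metric computed on re-ranked development-set output rather than the toy quadratic used here.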

Keywords

Natural language processing · Machine translation · Neural machine translation · Quantum-behaved PSO


Copyright information

© Springer Nature B.V. 2019

Authors and Affiliations

  1. Laboratory for Research in Artificial Intelligence, Natural Language Processing and Machine Learning Research Group, Computer Science Department, University of Science and Technology Houari Boumediene (USTHB), Algiers, Algeria
  2. School of Computing, Science and Engineering, University of Salford, Salford, UK
