English–Mizo Machine Translation using neural and statistical approaches

  • Original Article
  • Neural Computing and Applications

Abstract

Machine translation helps overcome language barriers and eases interaction among people from different linguistic backgrounds. Although corpus-based approaches (statistical and neural) offer reasonable translation accuracy when trained on large corpora, their robustness lies in their ability to adapt to low-resource languages, for which large corpora are unavailable. In this paper, the translation quality of the two approaches is explored in the context of Mizo, a low-resource Indian language. The translations predicted by the two approaches are compared and analyzed on a number of grounds to infer their strengths and weaknesses, particularly in low-resource scenarios.

Notes

  1. https://en.wikipedia.org/wiki/Mizo_language.

  2. http://www.cs.cmu.edu/~alavie/METEOR/.

  3. http://opennmt.net/.

  4. https://github.com/pathakamarnath/Parallel-Mizo-Data.

Acknowledgements

The authors would like to thank the Department of Computer Science and Engineering, National Institute of Technology Mizoram, for providing the support and infrastructure required to carry out this work.

Author information

Corresponding author

Correspondence to Partha Pakray.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Pathak, A., Pakray, P. & Bentham, J. English–Mizo Machine Translation using neural and statistical approaches. Neural Comput & Applic 31, 7615–7631 (2019). https://doi.org/10.1007/s00521-018-3601-3

  • DOI: https://doi.org/10.1007/s00521-018-3601-3
