
An empirical analysis on statistical and neural machine translation system for English to Mizo language

  • Original Research
International Journal of Information Technology

Abstract

Machine translation systems for low-resource languages face problems of translation quality and comprehensibility. Our work focuses on statistical and neural approaches to translating English into Mizo in a specific domain. As part of system development, we created an English-to-Mizo parallel dataset drawn from the National Platform of Language Technology (NPLT) domains, the Bible, and other domains. Phrase-Based Statistical Machine Translation (PB-SMT) and Neural Machine Translation (NMT) systems were trained and tested under low-resource, domain-specific conditions, and the quality of their translations was examined thoroughly using automatic and subjective evaluation. In our experiments, PB-SMT achieved better results than the state-of-the-art NMT system on English-to-Mizo translation. Translation quality was assessed with the automatic BLEU metric and with manual evaluation on two criteria, adequacy and fluency, and is illustrated with a suitable example.
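The abstract reports BLEU scores without stating which implementation or settings were used. As a hedged illustration of what the metric computes, the following minimal Python sketch implements corpus-level BLEU as defined by Papineni et al. (2002): clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It assumes whitespace-tokenized text, a single reference translation per sentence, and no smoothing; the function names `ngram_counts` and `corpus_bleu` are hypothetical, and this is not the authors' evaluation code.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per hypothesis.

    hypotheses, references: lists of token lists, aligned by index.
    Assumption: uniform weights over n = 1..max_n and no smoothing,
    so any n-gram order with zero matches makes the score 0.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp, n)
            ref_ngrams = ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the clipped precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than the references are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

For example, scoring the hypothesis "the cat sat on the mat" against the reference "the cat sat on a mat" combines the precisions 5/6, 3/5, 2/4 and 1/3 (no brevity penalty, since both have six tokens) into a score of about 53.7. For reportable numbers, a standard implementation such as sacreBLEU is preferable, because tokenization and smoothing choices change the score.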


Fig. 1


Data availability

Data can be provided upon special request.


Acknowledgements

We are grateful to Ms K. Vanlalruati, Guest Faculty, Mizo Department, Pachhunga University College, College Veng, Aizawl, Mizoram, and to Vanlalhlani Khawlhring, Assam Down Town University, Guwahati, Assam, for their support in developing the corpus and for the manual evaluation of the corpus and the translations.

Author information


Corresponding author

Correspondence to Chanambam Sveta Devi.

Ethics declarations

Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Conflict of Interest

The authors declare no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Devi, C.S., Purkayastha, B.S. An empirical analysis on statistical and neural machine translation system for English to Mizo language. Int. j. inf. tecnol. 15, 4021–4028 (2023). https://doi.org/10.1007/s41870-023-01488-0



  • DOI: https://doi.org/10.1007/s41870-023-01488-0
