Abstract
Machine translation systems for under-resourced languages suffer from quality and comprehension issues. Our research focuses on statistical and neural approaches to translating English into Mizo in a specific domain. As part of system development, we created an English-to-Mizo parallel dataset drawn from the National Platform of Language Technology (NPLT) domains, the Bible, and other domains. Phrase-Based Statistical Machine Translation (PB-SMT) and Neural Machine Translation (NMT) systems were trained and tested under low-resource, domain-specific conditions, and their translations were then examined thoroughly using automatic and subjective evaluation approaches. In our experiments, PB-SMT produced better results than the state-of-the-art NMT on English-to-Mizo translation. Translation quality was evaluated, with a suitable example, using the automatic BLEU metric and manual evaluation on two parameters, namely adequacy and fluency.
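For readers unfamiliar with the automatic metric named above, the following is a minimal sketch of corpus-level BLEU (modified n-gram precision up to 4-grams with a brevity penalty), assuming whitespace-tokenized input and a single reference per sentence. The function names and the English example sentences are hypothetical illustrations; this is not the evaluation script used in the study, and a standard tool may apply different tokenization and smoothing.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU on a 0-100 scale.

    hypotheses, references: parallel lists of token lists,
    one reference per hypothesis (an assumption of this sketch).
    """
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # hypothesis n-grams per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp, n)
            ref_ngrams = ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += sum(hyp_ngrams.values())

    # Without smoothing, any order with zero matches gives a score of zero.
    if min(matches) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


# Hypothetical example: one system output scored against one reference.
hypothesis = "the cat sat on the mat".split()
reference = "the cat sat on a mat".split()
score = corpus_bleu([hypothesis], [reference])
print(f"BLEU = {score:.2f}")
```

Adequacy and fluency, by contrast, are ratings assigned by human evaluators, so they cannot be computed from the text alone in this way.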
Data availability
Data can be provided upon special request.
Acknowledgements
We are grateful to Ms K. Vanlalruati, Guest Faculty, Mizo Department, Pachhunga University College, College Veng, Aizawl, Mizoram, and Vanlalhlani Khawlhring, Assam Downtown University, Guwahati, Assam, for their support in developing the corpus and for their manual evaluation of the corpus and the translations.
Ethics declarations
Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Conflict of Interest
The authors declare no conflict of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Devi, C.S., Purkayastha, B.S. An empirical analysis on statistical and neural machine translation system for English to Mizo language. Int. j. inf. tecnol. 15, 4021–4028 (2023). https://doi.org/10.1007/s41870-023-01488-0
DOI: https://doi.org/10.1007/s41870-023-01488-0