Abstract
Neural machine translation handles sequential data with variable-length input and output sentences and achieves state-of-the-art performance on the machine translation task. Although neural machine translation performs well on both low- and high-resource language pairs, it requires adequate parallel training data. For low-resource language pairs, preparing a corpus is strenuous and time-consuming. Automatic translation systems such as Google and Bing cover under-resourced Indian languages but do not support the Nyishi language, owing to the lack of a suitable dataset. In this work, we contribute a parallel corpus for the low-resource language pair English-Nyishi and report comparative experiments on baseline neural machine translation systems. The results are evaluated for English-to-Nyishi translation and vice versa using well-known automatic evaluation metrics and manual evaluation.
Notes
https://www.bing.com/translator
https://www.bible.com/
https://github.com/OpenNMT/OpenNMT-py
Cite this article
Kakum, N., Laskar, S.R., Sambyo, K. et al. Neural machine translation for limited resources English-Nyishi pair. Sādhanā 48, 237 (2023). https://doi.org/10.1007/s12046-023-02308-8
DOI: https://doi.org/10.1007/s12046-023-02308-8