Neural machine translation for limited resources English-Nyishi pair

Sādhanā

Abstract

Neural machine translation handles variable-length input and output sentences and is the state-of-the-art approach to machine translation. Although it performs well on both low-resource and high-resource language pairs, it requires adequate parallel training data. For low-resource languages, preparing such a corpus is strenuous and time-consuming. Automatic translation systems such as Google and Bing cover under-resourced Indian languages but do not support Nyishi, owing to the lack of a suitable dataset. In this work, we contribute a parallel corpus for the low-resource English-Nyishi language pair and report comparative experiments on baseline neural machine translation systems. The results are evaluated in both directions, English to Nyishi and Nyishi to English, using well-known automatic evaluation metrics and manual evaluation.

Notes

  1. https://censusindia.gov.in/nada/index.php/catalog/42458/download/46089/C-16_25062018.pdf

  2. https://bit.ly/36pOsyO

  3. https://www.bing.com/translator

  4. https://translate.google.co.in/

  5. https://www.bible.com/

  6. https://scrapy.org/

  7. https://github.com/OpenNMT/OpenNMT-py

Author information

Corresponding author

Correspondence to Partha Pakray.

About this article

Cite this article

Kakum, N., Laskar, S.R., Sambyo, K. et al. Neural machine translation for limited resources English-Nyishi pair. Sādhanā 48, 237 (2023). https://doi.org/10.1007/s12046-023-02308-8

  • DOI: https://doi.org/10.1007/s12046-023-02308-8
