
A Hybrid Machine Transliteration Model Based on Multi-source Encoder–Decoder Framework: English to Manipuri

  • Original Research
  • Published in: SN Computer Science

Abstract

In this paper, we propose a neural hybrid machine transliteration model that captures the characteristics of both grapheme and phoneme representations. Unlike previous hybrid models, which rely on linear interpolation or on statistical correspondences between grapheme and phoneme sequences, the proposed model builds on the popular neural encoder–decoder transliteration model. We extend the traditional encoder–decoder transliteration model to a multi-source framework that takes advantage of both grapheme and phoneme sequences. This study investigates how various encoder–decoder neural models respond when integrated into the proposed hybrid model. The performance of the proposed models is evaluated on the English-to-Manipuri transliteration task for named entities and loanwords. Manipuri is a low-resource language spoken in the state of Manipur in northeastern India. Across various experimental observations, the proposed framework effectively combines the grapheme and phoneme sequences of the source word and significantly outperforms its phoneme-only and grapheme-only counterparts. We further evaluate the proposed models on four other language pairs and observe similar improvements.


Fig. 1 … source and target grapheme sequences of the word. P, Ps and Pt represent the intermediate phoneme, source phoneme and target phoneme, respectively

Fig. 2 Multi-source RNN-based encoder–decoder hybrid transliteration model. It receives two different source sequences through two encoders (ENCODER-1 and ENCODER-2); an aggregation method combines the outputs of the two encoders, and the result is passed to the decoder layer
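The two-encoder aggregation described in the Fig. 2 caption can be sketched as follows. This is a minimal toy illustration only, not the paper's implementation: the hidden size, vocabulary sizes, the tanh-RNN encoders, and concatenation as the aggregation method are all our own assumptions for the sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(seq_ids, emb, W):
    # Toy recurrent encoder: a plain tanh RNN over embedded symbols;
    # the final hidden state serves as the sequence summary.
    h = np.zeros(W.shape[0])
    for i in seq_ids:
        h = np.tanh(W @ h + emb[i])
    return h

H = 8                               # hidden size (illustrative)
emb_g = rng.normal(size=(30, H))    # grapheme embeddings (hypothetical vocab)
emb_p = rng.normal(size=(50, H))    # phoneme embeddings (hypothetical vocab)
W1 = rng.normal(size=(H, H)) * 0.1  # ENCODER-1 recurrence (graphemes)
W2 = rng.normal(size=(H, H)) * 0.1  # ENCODER-2 recurrence (phonemes)
W_agg = rng.normal(size=(H, 2 * H)) * 0.1  # aggregation projection

grapheme_ids = [3, 1, 4, 1]   # characters of a source word (toy ids)
phoneme_ids = [12, 7, 9]      # its phoneme sequence (toy ids)

h_g = encode(grapheme_ids, emb_g, W1)  # ENCODER-1 output
h_p = encode(phoneme_ids, emb_p, W2)   # ENCODER-2 output

# Aggregation: concatenate the two encoder summaries and project back
# to the decoder's hidden size; this vector initializes the decoder.
decoder_init = np.tanh(W_agg @ np.concatenate([h_g, h_p]))
print(decoder_init.shape)  # (8,)
```

The key point the sketch shows is structural: each source representation (grapheme and phoneme) gets its own encoder, and only the aggregation step decides how the two summaries are merged before decoding.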

Fig. 3

Fig. 4 Multi-source transformer-based encoder–decoder hybrid transliteration model with serial attention combination
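The "serial attention combination" named in the Fig. 4 caption can be illustrated with a small sketch: the decoder query attends over the first encoder's states, and the resulting context conditions the attention over the second encoder's states. Everything below (single-query attention, keys doubling as values, how the first context feeds the second query) is a simplified assumption for illustration, not the paper's exact formulation.

```python
import numpy as np

def attend(query, keys):
    # Scaled dot-product attention of a single query over a set of
    # key vectors (keys double as values here for brevity).
    scores = keys @ query / np.sqrt(len(query))
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ keys

rng = np.random.default_rng(1)
d = 8
H_g = rng.normal(size=(5, d))  # encoder states for the grapheme sequence
H_p = rng.normal(size=(4, d))  # encoder states for the phoneme sequence
q = rng.normal(size=d)         # one decoder query vector

# Serial combination: attend over the first source, then use the
# updated representation as the query over the second source.
c1 = attend(q, H_g)        # context from the grapheme encoder
c2 = attend(q + c1, H_p)   # phoneme context, conditioned on c1
context = c2               # handed to the decoder's next sublayer
print(context.shape)  # (8,)
```

Serial combination contrasts with parallel strategies that attend over both sources independently and sum or concatenate the contexts; in the serial form, the second attention step sees what the first one found.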

Fig. 5
Fig. 6
Fig. 7
Fig. 8


Notes

  1. A word adopted from a foreign language with little or no modification.

  2. https://en.wikipedia.org/wiki/Meitei_language.

  3. https://en.wikipedia.org/wiki/Meitei_language.

  4. http://www.speech.cs.cmu.edu/cgi-bin/cmudict.

  5. Transliteration of a word from its original language to a foreign language.

  6. https://github.com/cmusphinx/g2p-seq2seq.

  7. https://github.com/AdolfVonKleist/Phonetisaurus.

  8. https://www.tensorflow.org/tutorials/text/nmt_with_attention.

  9. https://www.tensorflow.org/tutorials/text/transformer.

  10. http://workshop.colips.org/news2018/.



Author information


Corresponding author

Correspondence to Lenin Laitonjam.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Laitonjam, L., Singh, S.R. A Hybrid Machine Transliteration Model Based on Multi-source Encoder–Decoder Framework: English to Manipuri. SN COMPUT. SCI. 3, 125 (2022). https://doi.org/10.1007/s42979-021-01005-9

