Sanskrit to universal networking language EnConverter system based on deep learning and context-free grammar

Sitender; Bawa, Seema

doi:10.1007/s00530-020-00692-3

Sanskrit to universal networking language EnConverter system based on deep learning and context-free grammar

Special Issue Paper
Published: 11 October 2020

Volume 28, pages 2105–2121, (2022)
Cite this article

Multimedia Systems Aims and scope Submit manuscript

381 Accesses
5 Citations
Explore all metrics

Abstract

Machine Translation is a mechanism of transforming text from one language to another with the help of computer technology. Earlier in 2018, a machine translation system had been developed by the authors that translate Sanskrit text to Universal Networking Language expressions and was named as SANSUNL. The work presented in this paper is an extension of SANSUNL system by enhancing POS tagging, Sanskrit language processing and parsing. A Sanskrit stemmer having 23 prefixes and 774 suffixes with grammar rules are used for stemming the Sanskrit sentence in the proposed system. Bidirectional long short-term memory (Bi-LSTM) and stacked LSTM deep neural network models have been used for part of speech tagging of the input Sanskrit text. A tagged dataset of around 400 k entries for Sanskrit have been used for training and testing the neural network models. Proposed Sanskrit context-free grammar has been used with CYK parser to perform the parsing of the input sentence. Size of the Sanskrit-Universal Word dictionary has been increased from 15000 to 25000 entries. Approximately 1500 UNL generation rules have been used to resolve the 46 UNL relations. Four datasets UC-A1, UC-A2, Spanish server gold standard dataset, and 500 Sanskrit sentences taken from the general domain have been used for validating the system. The proposed system is evaluated on BLEU and Fluency score metrics and has reported an efficiency of 95.375%.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

LSTM-Based Model for Sanskrit to English Translation

Machine translation using deep learning for universal networking language based on their structure

Article 27 April 2021

Implementation of Neural Machine Translation for Nahuatl as a Web Platform: A Focus on Text Translation

Article 28 December 2021

Abbreviations

AI:: Artificial intelligence
BiLSTM:: Bi-directional long short-term memory
SLSTM:: Stacked long short-term memory
BLEU:: Bilingual evaluation understudy
CFG:: Context-free grammar
CNF:: Chomsky normal form
CYK:: Cocke–Younger–Kasami
MT:: Machine translation
MTS:: Machine translation system
POS:: Part of speech
TLGR:: Target language generation rule
UNL:: Universal networking Language

References

Sitender, Bawa, Seema: Sansunl: A sanskrit to unl enconverter system. IETE Journal of Research 1–12 (2018)
Uchida, Hiroshi, Zhu, Meiying: The universal networking language beyond machine translation. In: International Symposium on Language in Cyberspace, Seoul, pp. 26–27 (2001)
Hochreiter, Sepp, Schmidhuber, Jürgen: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Article Google Scholar
Cho, Kyunghyun, Van Merriënboer, Bart, Gulcehre, Caglar, Bahdanau, Dzmitry, Bougares, Fethi, Schwenk, Holger, Bengio, Yoshua: Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014)
Schuster, Mike, Paliwal, Kuldip K.: Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45(11), 2673–2681 (1997)
Article Google Scholar
Graves, Alex, Mohamed, Abdel-rahman, Hinton, Geoffrey: Speech recognition with deep recurrent neural networks. In: 2013 IEEE international conference on acoustics, speech and signal processing, pp. 6645–6649. IEEE (2013)
Lewis, Paul M., Simons, Gary F., Fennig, Charles D.: Ethnologue: languages of ecuador. SIL International, Texas (2015)
Google Scholar
Mallikarjun, B.: Patterns of indian multilingualism. Strength for Today and Bright Hope for Tomorrow Volume 10: 6 June 2010 (2010)
Dorr, Bonnie J., Hovy, Eduard H., Levin, Lori S.: Natural language processing and machine translation encyclopedia of language and linguistics, (ell2). machine translation: Interlingual methods. In: Proceeding International Conference of the World Congress on Engineering (2004)
Singh, Smriti, Dalal, Mrugank, Vachhani, Vishal, Bhattacharyya, Pushpak, Damani, Om P.: Hindi generation from interlingua (unl). Machine Translation Summit XI (2007)
Kumar, Parteek, Sharma, R.K.: Punjabi to unl enconversion system. Sadhana 37(2), 299–318 (2012)
Article Google Scholar
Kumar, Parteek, Sharma, Rajendra Kumar: Punjabi deconverter for generating punjabi from universal networking language. J. Zhejiang Univ. SCIENCE C 14(3), 179–196 (2013)
Article Google Scholar
Dhanabalan, T., Saravanan, K., Geetha, T.V.: Tamil to unl enconverter. In: Proc. Int. Conf. on Universal Knowledge and Language, pp. 1–16 (2002)
Dhanabalan, T., Geetha, T.V.: Unl deconverter for tamil. In: International Conference on the Convergences of Knowledge, Culture, Language and Information Technologies (2003)
Nair, Biji, Rajeev, R.R., Sherly, Elizabeth: Language dependent features for unl-malayalam deconversion. Int. J. Comput. Appl. 975, 8887 (2014)
Google Scholar
Ali, Md, Yousuf, Nawab, Ripon, Shamim, Allayear, Shaikh Muhammad: Unl based bangla natural text conversion-predicate preserving parser approach. arXiv preprint arXiv:1206.0381 (2012)
Choudhury, Md Ershadul H., Ali, Md Nawab Yousuf, Sarkar, Mohammad Zakir Hussain, Ahsan, R.: Bridging bangla to universal networking language-a human language neutral meta-language. In: International Conference on Computer and Information Technology (ICCIT), Dhaka, pp. 104–109 (2005)
Dave, Shachi, Parikh, Jignashu, Bhattacharyya, Pushpak: Interlingua-based english-hindi machine translation and language divergence. Mach. Trans. 16(4), 251–304 (2001)
Article Google Scholar
Jain, Manoj, Damani, Om P.: English to unl (interlingua) enconversion. In: Proc. Second Conference on Language and Technology,(CLT) (2009)
Dey, Kuntal, Bhattacharyya, Pushpak, et al.: Universal networking language based analysis and generation of bengali case structure constructs. Res. Comput. Sci 12, 215–229 (2005)
Google Scholar
Shi, Xiaodong, Chen, Yidong: A Unl Deconverter for Chinese. UNL Book, Lincoln (2005)
Google Scholar
Hung, VO Trung, Fafiotte, Georges: Uvdict-a machine translation dictionary for vietnamese language in unl system. In: 2011 International Conference on Complex, Intelligent, and Software Intensive Systems, pp. 310–314. IEEE (2011)
Sérasset, Gilles, Boitet, Christian: Unl-french deconversion as transfer & generation from an interlingua with possible quality enhancement through offline human interaction. In: MACHINE TRANSLATION SUMMIT VII, pp. 220–228 (1999)
Martins, Ronaldo, Hasegawa, Ricardo, Graças, M., Nunes, V.: Hermeto: a nl-unl enconverting environment. Univ. Netw. Lang 12, 254–260 (2003)
Google Scholar
Dikonov, Viacheslav: English/russian to unl enconverter1. Igor Boguslavsky and Leo Wanner (eds.), p. 48 (2011)
Kalchbrenner, Nal, Blunsom, Phil: Recurrent continuous translation models. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1700–1709 (2013)
Sutskever, Ilya, Vinyals, Oriol, Le, Quoc V.: Sequence to sequence learning with neural networks. In: Advances in neural information processing systems, pp. 3104–3112 (2014)
Cho, Kyunghyun, Van Merriënboer, Bart, Bahdanau, Dzmitry, Bengio, Yoshua: On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259 (2014)
Bahdanau, Dzmitry, Cho, Kyunghyun, Bengio, Yoshua: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)
Microsoft. Microsoft translator accelerates use of neural networks across its offerings. https://blogs.msdn.microsoft.com/translation/2017/11/15/microsoft-translator-accelerates-use-of-neural-networks-across-its-offerings/, November 2017
Wu, Yonghui, Schuster, Mike, Chen, Zhifeng, Le, Quoc V., Norouzi, Mohammad, Macherey, Wolfgang, Krikun, Maxim, Cao, Yuan, Gao, Qin, Macherey, Klaus, et al.: Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 (2016)
Gehring, Jonas, Auli, Michael, Grangier, David, Yarats, Denis, Dauphin, Yann N.: Convolutional sequence to sequence learning. arXiv preprint arXiv:1705.03122 (2017)
Gehring, Jonas, Auli, Michael, Grangier, David, Dauphin, Yann N.: A convolutional encoder model for neural machine translation. arXiv preprint arXiv:1611.02344, (2016)
Faes, Florian: Amazon and lionbridge share stage to market neural machine translation. https://slator.com/technology/amazon-and-lionbridge-share-stage-to-market-neural-machine-translation/, April 2018
Singh, Shivkaran, Kumar, M.Anand, Soman, K.P.: Attention based english to punjabi neural machine translation. J. Intell. Fuzzy Syst. 34(3), 1551–1559 (2018)
Article Google Scholar
Goyal, Vikrant, Mishra, Pruthwik, Sharma, Dipti Misra: Linguistically informed hindi-english neural machine translation. In: Proceedings of The 12th Language Resources and Evaluation Conference, pp. 3698–3703 (2020)
Feng, Xiaocheng: Feng, Zhangyin, Zhao, Wanlong, Qin, Bing, Liu, Ting: Enhanced neural machine translation by joint decoding with word and pos-tagging sequences. Mobile Networks and Applications 1–7 (2020)
Dahl, George E., Dong, Yu., Deng, Li, Acero, Alex: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio Speech language Process 20(1), 30–42 (2011)
Article Google Scholar
Hinton, Geoffrey, Deng, Li, Yu, Dong, Dahl, George, Mohamed, Abdel-rahman, Jaitly, Navdeep, Senior, Andrew, Vanhoucke, Vincent, Nguyen, Patrick, Kingsbury, Brian, et al.: Deep neural networks for acoustic modeling in speech recognition. IEEE Signal processing magazine 29 (2012)
Cireşan, Dan, Meier, Ueli, Schmidhuber, Jürgen: Multi-column deep neural networks for image classification. arXiv preprint arXiv:1202.2745 (2012)
Deshmukh, Rushali Dhumal, Kiwelekar, Arvind: Deep learning techniques for part of speech tagging by natural language processing. In: 2020 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), pp. 76–81. IEEE (2020)
Huang, Zhiheng, Xu, Wei, Yu, Kai: Bidirectional lstm-crf models for sequence tagging. arXiv preprint arXiv:1508.01991 (2015)
Wang, Peilu, Qian, Yao, Soong, Frank K., He, Lei, Zhao, Hai: A unified tagging solution: bidirectional lstm recurrent neural network with word embedding. arXiv preprint arXiv:1511.00215 (2015)
Zhou, Chunting, Sun, Chonglin, Liu, Zhiyuan, Lau, Francis: A c-lstm neural network for text classification. arXiv preprint arXiv:1511.08630 (2015)
Huang, Po-Yao, Liu, Frederick, Shiang, Sz-Rung, Oh, Jean, Dyer, Chris: Attention-based multimodal neural machine translation. In: Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, pp. 639–645 (2016)
Luong, Minh-Thang, Manning, Christopher D.: Stanford neural machine translation systems for spoken language domains. In: Proceedings of the International Workshop on Spoken Language Translation, pp. 76–79 (2015)
Yao, Kaisheng, Cohn, Trevor, Vylomova, Katerina, Duh, Kevin, Dyer, Chris: Depth-gated lstm. arXiv preprint arXiv:1508.03790 (2015)
Mahata, Sainik Kumar, Das, Dipankar, Bandyopadhyay, Sivaji: Mtil 2017: Machine translation using recurrent neural network on statistical machine translation. J. Intell. Syst. 28(3), 447–453 (2019)
Article Google Scholar
Banik, Debajyoty, Ekbal, Asif, Bhattacharyya, Pushpak, Bhattacharyya, Siddhartha: Assembling translations from multi-engine machine translation outputs. Appl. Soft Comput. 78, 230–239 (2019)
Article Google Scholar
Kumar, S., Kumar, M.Anand, Soman, K.P.: Deep learning based part-of-speech tagging for malayalam twitter data (special issue: deep learning techniques for natural language processing). J. Intell. Syst. 28(3), 423–435 (2019)
Article Google Scholar
Akhil, K.K., Rajimol, R., Anoop, V.S.: Parts-of-speech tagging for malayalam using deep learning techniques. Int. J. Inform. Technol. 1–8 (2020)
Gopalakrishnan, Athira, Soman, K.P., Premjith, B.: Part-of-speech tagger for biomedical domain using deep neural network architecture. In: 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–5. IEEE (2019)
Wang, Shirui, Zhou, Wenan, Jiang, Chao: A survey of word embeddings based on deep learning. Computing 102(3), 717–740 (2020)
Article MathSciNet MATH Google Scholar
Knapp, Steven K.: Accelerate fpga macros with one-hot approach. Electron. Design 38(17), 71–78 (1990)
Google Scholar
Mikolov, Tomas, Chen, Kai, Corrado, Greg, Dean, Jeffrey: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Mikolov, Tomas, Sutskever, Ilya, Chen, Kai, Corrado, Greg S., Dean, Jeff: Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp. 3111–3119 (2013)
Pennington, Jeffrey, Socher, Richard, Manning, Christopher: Glove: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar, October 2014. Association for Computational Linguistics
Joulin, Armand, Grave, Edouard, Bojanowski, Piotr, Mikolov, Tomas: Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759 (2016)
Peters, Matthew E., Neumann, Mark, Iyyer, Mohit, Gardner, Matt, Clark, Christopher, Lee, Kenton, Zettlemoyer, Luke: Deep contextualized word representations. arXiv preprint arXiv:1802.05365 (2018)
Vaswani, Ashish, Shazeer, Noam, Parmar, Niki, Uszkoreit, Jakob, Jones, Llion, Gomez, Aidan N., Kaiser, Łukasz, Polosukhin, Illia: Attention is all you need. In: Advances in neural information processing systems, pp. 5998–6008 (2017)
Devlin, Jacob, Chang, Ming-Wei, Lee, Kenton, Toutanova, Kristina: Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Zheng, Xiaoqing, Chen, Hanyang, Xu, Tianyu: Deep learning for chinese word segmentation and pos tagging. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 647–657 (2013)
Ng, Hwee Tou, Low, Jin Kiat: Chinese part-of-speech tagging: One-at-a-time or all-at-once? word-based or character-based? In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 277–284 (2004)
Plank, Barbara, Søgaard, Anders, Goldberg, Yoav: Multilingual part-of-speech tagging with bidirectional long short-term memory models and auxiliary loss. arXiv preprint arXiv:1604.05529 (2016)
Labeau, Matthieu, Löser, Kevin, Allauzen, Alexandre: Non-lexical neural architecture for fine-grained pos tagging. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 232–237 (2015)
Ballesteros, Miguel, Dyer, Chris, Smith, Noah A.: Improved transition-based parsing by modeling characters instead of words with lstms. arXiv preprint arXiv:1508.00657 (2015)
Ling, Wang, Luís, Tiago, Marujo, Luís, Astudillo, Ramón Fernandez, Amir, Silvio, Dyer, Chris, Black, Alan W., Trancoso, Isabel: Finding function in form: Compositional character models for open vocabulary word representation. arXiv preprint arXiv:1508.02096 (2015)
Ling, Wang, Trancoso, Isabel, Dyer, Chris, Black, Alan W.: Character-based neural machine translation. arXiv preprint arXiv:1511.04586 (2015)
Costa-Jussa, Marta R., Fonollosa, José A.R.: Character-based neural machine translation. arXiv preprint arXiv:1603.00810 (2016)
Bhatia, M.P.S., Sangwan, Saurabh Raj: Debunking online reputation rumours using hybrid of lexicon-based and machine learning techniques. In: Proceedings of First International Conference on Computing, Communications, and Cyber-Security (IC4S 2019), pp. 317–327. Springer (2020)
Sangwan, Saurabh Raj, Bhatia, M.P.S.: D-bullyrumbler: a safety rumble strip to resolve online denigration bullying using a hybrid filter-wrapper approach. Multimedia Systems 1–17 (2020)
Sharma, Kapil, Bala, Manju, et al.: An ecological space based hybrid swarm-evolutionary algorithm for software reliability model parameter estimation. Int. J. Syst. Assurance Eng. Manag. 11(1), 77–92 (2020)
Article Google Scholar
Gopal, Madhav, Mishra, Diwakar, Singh, Devi Priyanka: Evaluating tagsets for sanskrit. In: Sanskrit Computational Linguistics, pp. 150–161. Springer (2010)
Gopal, Madhav, Jha, Girish Nath: Tagging sanskrit corpus using bis pos tagset. In: Information Systems for Indian Languages, pp. 191–194. Springer (2011)
Gopal, Madhav, Jha, Grish Nath: Indian language part of speech tagger (il-post), 2007. http://sanskrit.jnu.ac.in/corpora/tagset.jsp
Chandershekhar, R., Jha, Girish Nath: Part-of-Speech Tagging for Sanskrit. PhD thesis, Special Centre for Sanskrit Studies, JNU Delhi, http://sanskrit.jnu.ac.in/corpora/JNU-Sanskrit-Tagset.htm (2007)
Sarkar, Sandipan, Bandyopadhyay, Sivaji: Design of a rule-based stemmer for natural language text in bengali. In: Proceedings of the IJCNLP-08 workshop on NLP for Less Privileged Languages (2008)
Zhang, Weinan, Du, Tianming, Wang, Jun: Deep learning over multi-field categorical data. In: European conference on information retrieval, pp. 45–57. Springer (2016)
Chappelier, Jean-Cédric, Rajman, Martin, et al.: A generalized cyk algorithm for parsing stochastic cfg. TAPD 98(133–137), 5 (1998)
Google Scholar
Younger, Daniel H.: Recognition and parsing of context-free languages in time n3. Inform. Control 10(2), 189–208 (1967)
Article MATH Google Scholar
Li, Te, Alagappan, Devi: a comparison of cyk and earley parsing algorithms. icar.cnr.it (2006)
Chen, Xinxiong, Xu, Lei, Liu, Zhiyuan, Sun, Maosong, Luan, Huanbo: Joint learning of character and word embeddings. In: Twenty-Fourth International Joint Conference on Artificial Intelligence (2015)

Download references

Author information

Authors and Affiliations

Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, India
Sitender & Seema Bawa
MSIT, New Delhi, 110058, India
Sitender

Authors

Sitender
View author publications
You can also search for this author in PubMed Google Scholar
Seema Bawa
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Sitender.

Ethics declarations

conflict of interest

The authors have no conflicts of interest to disclose. This article does not contain any studies with animals performed by any of the authors. This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Sitender, Bawa, S. Sanskrit to universal networking language EnConverter system based on deep learning and context-free grammar. Multimedia Systems 28, 2105–2121 (2022). https://doi.org/10.1007/s00530-020-00692-3

Download citation

Published: 11 October 2020
Issue Date: December 2022
DOI: https://doi.org/10.1007/s00530-020-00692-3

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Sanskrit to universal networking language EnConverter system based on deep learning and context-free grammar

Abstract

Access this article

Similar content being viewed by others

LSTM-Based Model for Sanskrit to English Translation

Machine translation using deep learning for universal networking language based on their structure

Implementation of Neural Machine Translation for Nahuatl as a Web Platform: A Focus on Text Translation

Abbreviations

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Sanskrit to universal networking language EnConverter system based on deep learning and context-free grammar

Abstract

Access this article

Similar content being viewed by others

LSTM-Based Model for Sanskrit to English Translation

Machine translation using deep learning for universal networking language based on their structure

Implementation of Neural Machine Translation for Nahuatl as a Web Platform: A Focus on Text Translation

Abbreviations

References

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation