Skip to main content
Log in

Sanskrit to universal networking language EnConverter system based on deep learning and context-free grammar

  • Special Issue Paper
  • Published:
Multimedia Systems Aims and scope Submit manuscript

Abstract

Machine Translation is a mechanism of transforming text from one language to another with the help of computer technology. Earlier in 2018, a machine translation system had been developed by the authors that translate Sanskrit text to Universal Networking Language expressions and was named as SANSUNL. The work presented in this paper is an extension of SANSUNL system by enhancing POS tagging, Sanskrit language processing and parsing. A Sanskrit stemmer having 23 prefixes and 774 suffixes with grammar rules are used for stemming the Sanskrit sentence in the proposed system. Bidirectional long short-term memory (Bi-LSTM) and stacked LSTM deep neural network models have been used for part of speech tagging of the input Sanskrit text. A tagged dataset of around 400 k entries for Sanskrit have been used for training and testing the neural network models. Proposed Sanskrit context-free grammar has been used with CYK parser to perform the parsing of the input sentence. Size of the Sanskrit-Universal Word dictionary has been increased from 15000 to 25000 entries. Approximately 1500 UNL generation rules have been used to resolve the 46 UNL relations. Four datasets UC-A1, UC-A2, Spanish server gold standard dataset, and 500 Sanskrit sentences taken from the general domain have been used for validating the system. The proposed system is evaluated on BLEU and Fluency score metrics and has reported an efficiency of 95.375%.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14

Similar content being viewed by others

Abbreviations

AI:

Artificial intelligence

BiLSTM:

Bi-directional long short-term memory

SLSTM:

Stacked long short-term memory

BLEU:

Bilingual evaluation understudy

CFG:

Context-free grammar

CNF:

Chomsky normal form

CYK:

Cocke–Younger–Kasami

MT:

Machine translation

MTS:

Machine translation system

POS:

Part of speech

TLGR:

Target language generation rule

UNL:

Universal networking Language

References

  1. Sitender, Bawa, Seema: Sansunl: A sanskrit to unl enconverter system. IETE Journal of Research 1–12 (2018)

  2. Uchida, Hiroshi, Zhu, Meiying: The universal networking language beyond machine translation. In: International Symposium on Language in Cyberspace, Seoul, pp. 26–27 (2001)

  3. Hochreiter, Sepp, Schmidhuber, Jürgen: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

    Article  Google Scholar 

  4. Cho, Kyunghyun, Van Merriënboer, Bart, Gulcehre, Caglar, Bahdanau, Dzmitry, Bougares, Fethi, Schwenk, Holger, Bengio, Yoshua: Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014)

  5. Schuster, Mike, Paliwal, Kuldip K.: Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45(11), 2673–2681 (1997)

    Article  Google Scholar 

  6. Graves, Alex, Mohamed, Abdel-rahman, Hinton, Geoffrey: Speech recognition with deep recurrent neural networks. In: 2013 IEEE international conference on acoustics, speech and signal processing, pp. 6645–6649. IEEE (2013)

  7. Lewis, Paul M., Simons, Gary F., Fennig, Charles D.: Ethnologue: languages of ecuador. SIL International, Texas (2015)

    Google Scholar 

  8. Mallikarjun, B.: Patterns of indian multilingualism. Strength for Today and Bright Hope for Tomorrow Volume 10: 6 June 2010 (2010)

  9. Dorr, Bonnie J., Hovy, Eduard H., Levin, Lori S.: Natural language processing and machine translation encyclopedia of language and linguistics, (ell2). machine translation: Interlingual methods. In: Proceeding International Conference of the World Congress on Engineering (2004)

  10. Singh, Smriti, Dalal, Mrugank, Vachhani, Vishal, Bhattacharyya, Pushpak, Damani, Om P.: Hindi generation from interlingua (unl). Machine Translation Summit XI (2007)

  11. Kumar, Parteek, Sharma, R.K.: Punjabi to unl enconversion system. Sadhana 37(2), 299–318 (2012)

    Article  Google Scholar 

  12. Kumar, Parteek, Sharma, Rajendra Kumar: Punjabi deconverter for generating punjabi from universal networking language. J. Zhejiang Univ. SCIENCE C 14(3), 179–196 (2013)

    Article  Google Scholar 

  13. Dhanabalan, T., Saravanan, K., Geetha, T.V.: Tamil to unl enconverter. In: Proc. Int. Conf. on Universal Knowledge and Language, pp. 1–16 (2002)

  14. Dhanabalan, T., Geetha, T.V.: Unl deconverter for tamil. In: International Conference on the Convergences of Knowledge, Culture, Language and Information Technologies (2003)

  15. Nair, Biji, Rajeev, R.R., Sherly, Elizabeth: Language dependent features for unl-malayalam deconversion. Int. J. Comput. Appl. 975, 8887 (2014)

    Google Scholar 

  16. Ali, Md, Yousuf, Nawab, Ripon, Shamim, Allayear, Shaikh Muhammad: Unl based bangla natural text conversion-predicate preserving parser approach. arXiv preprint arXiv:1206.0381 (2012)

  17. Choudhury, Md Ershadul H., Ali, Md Nawab Yousuf, Sarkar, Mohammad Zakir Hussain, Ahsan, R.: Bridging bangla to universal networking language-a human language neutral meta-language. In: International Conference on Computer and Information Technology (ICCIT), Dhaka, pp. 104–109 (2005)

  18. Dave, Shachi, Parikh, Jignashu, Bhattacharyya, Pushpak: Interlingua-based english-hindi machine translation and language divergence. Mach. Trans. 16(4), 251–304 (2001)

    Article  Google Scholar 

  19. Jain, Manoj, Damani, Om P.: English to unl (interlingua) enconversion. In: Proc. Second Conference on Language and Technology,(CLT) (2009)

  20. Dey, Kuntal, Bhattacharyya, Pushpak, et al.: Universal networking language based analysis and generation of bengali case structure constructs. Res. Comput. Sci 12, 215–229 (2005)

    Google Scholar 

  21. Shi, Xiaodong, Chen, Yidong: A Unl Deconverter for Chinese. UNL Book, Lincoln (2005)

    Google Scholar 

  22. Hung, VO Trung, Fafiotte, Georges: Uvdict-a machine translation dictionary for vietnamese language in unl system. In: 2011 International Conference on Complex, Intelligent, and Software Intensive Systems, pp. 310–314. IEEE (2011)

  23. Sérasset, Gilles, Boitet, Christian: Unl-french deconversion as transfer & generation from an interlingua with possible quality enhancement through offline human interaction. In: MACHINE TRANSLATION SUMMIT VII, pp. 220–228 (1999)

  24. Martins, Ronaldo, Hasegawa, Ricardo, Graças, M., Nunes, V.: Hermeto: a nl-unl enconverting environment. Univ. Netw. Lang 12, 254–260 (2003)

    Google Scholar 

  25. Dikonov, Viacheslav: English/russian to unl enconverter1. Igor Boguslavsky and Leo Wanner (eds.), p. 48 (2011)

  26. Kalchbrenner, Nal, Blunsom, Phil: Recurrent continuous translation models. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1700–1709 (2013)

  27. Sutskever, Ilya, Vinyals, Oriol, Le, Quoc V.: Sequence to sequence learning with neural networks. In: Advances in neural information processing systems, pp. 3104–3112 (2014)

  28. Cho, Kyunghyun, Van Merriënboer, Bart, Bahdanau, Dzmitry, Bengio, Yoshua: On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259 (2014)

  29. Bahdanau, Dzmitry, Cho, Kyunghyun, Bengio, Yoshua: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)

  30. Microsoft. Microsoft translator accelerates use of neural networks across its offerings. https://blogs.msdn.microsoft.com/translation/2017/11/15/microsoft-translator-accelerates-use-of-neural-networks-across-its-offerings/, November 2017

  31. Wu, Yonghui, Schuster, Mike, Chen, Zhifeng, Le, Quoc V., Norouzi, Mohammad, Macherey, Wolfgang, Krikun, Maxim, Cao, Yuan, Gao, Qin, Macherey, Klaus, et al.: Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 (2016)

  32. Gehring, Jonas, Auli, Michael, Grangier, David, Yarats, Denis, Dauphin, Yann N.: Convolutional sequence to sequence learning. arXiv preprint arXiv:1705.03122 (2017)

  33. Gehring, Jonas, Auli, Michael, Grangier, David, Dauphin, Yann N.: A convolutional encoder model for neural machine translation. arXiv preprint arXiv:1611.02344, (2016)

  34. Faes, Florian: Amazon and lionbridge share stage to market neural machine translation. https://slator.com/technology/amazon-and-lionbridge-share-stage-to-market-neural-machine-translation/, April 2018

  35. Singh, Shivkaran, Kumar, M.Anand, Soman, K.P.: Attention based english to punjabi neural machine translation. J. Intell. Fuzzy Syst. 34(3), 1551–1559 (2018)

    Article  Google Scholar 

  36. Goyal, Vikrant, Mishra, Pruthwik, Sharma, Dipti Misra: Linguistically informed hindi-english neural machine translation. In: Proceedings of The 12th Language Resources and Evaluation Conference, pp. 3698–3703 (2020)

  37. Feng, Xiaocheng: Feng, Zhangyin, Zhao, Wanlong, Qin, Bing, Liu, Ting: Enhanced neural machine translation by joint decoding with word and pos-tagging sequences. Mobile Networks and Applications 1–7 (2020)

  38. Dahl, George E., Dong, Yu., Deng, Li, Acero, Alex: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio Speech language Process 20(1), 30–42 (2011)

    Article  Google Scholar 

  39. Hinton, Geoffrey, Deng, Li, Yu, Dong, Dahl, George, Mohamed, Abdel-rahman, Jaitly, Navdeep, Senior, Andrew, Vanhoucke, Vincent, Nguyen, Patrick, Kingsbury, Brian, et al.: Deep neural networks for acoustic modeling in speech recognition. IEEE Signal processing magazine 29 (2012)

  40. Cireşan, Dan, Meier, Ueli, Schmidhuber, Jürgen: Multi-column deep neural networks for image classification. arXiv preprint arXiv:1202.2745 (2012)

  41. Deshmukh, Rushali Dhumal, Kiwelekar, Arvind: Deep learning techniques for part of speech tagging by natural language processing. In: 2020 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), pp. 76–81. IEEE (2020)

  42. Huang, Zhiheng, Xu, Wei, Yu, Kai: Bidirectional lstm-crf models for sequence tagging. arXiv preprint arXiv:1508.01991 (2015)

  43. Wang, Peilu, Qian, Yao, Soong, Frank K., He, Lei, Zhao, Hai: A unified tagging solution: bidirectional lstm recurrent neural network with word embedding. arXiv preprint arXiv:1511.00215 (2015)

  44. Zhou, Chunting, Sun, Chonglin, Liu, Zhiyuan, Lau, Francis: A c-lstm neural network for text classification. arXiv preprint arXiv:1511.08630 (2015)

  45. Huang, Po-Yao, Liu, Frederick, Shiang, Sz-Rung, Oh, Jean, Dyer, Chris: Attention-based multimodal neural machine translation. In: Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers, pp. 639–645 (2016)

  46. Luong, Minh-Thang, Manning, Christopher D.: Stanford neural machine translation systems for spoken language domains. In: Proceedings of the International Workshop on Spoken Language Translation, pp. 76–79 (2015)

  47. Yao, Kaisheng, Cohn, Trevor, Vylomova, Katerina, Duh, Kevin, Dyer, Chris: Depth-gated lstm. arXiv preprint arXiv:1508.03790 (2015)

  48. Mahata, Sainik Kumar, Das, Dipankar, Bandyopadhyay, Sivaji: Mtil 2017: Machine translation using recurrent neural network on statistical machine translation. J. Intell. Syst. 28(3), 447–453 (2019)

    Article  Google Scholar 

  49. Banik, Debajyoty, Ekbal, Asif, Bhattacharyya, Pushpak, Bhattacharyya, Siddhartha: Assembling translations from multi-engine machine translation outputs. Appl. Soft Comput. 78, 230–239 (2019)

    Article  Google Scholar 

  50. Kumar, S., Kumar, M.Anand, Soman, K.P.: Deep learning based part-of-speech tagging for malayalam twitter data (special issue: deep learning techniques for natural language processing). J. Intell. Syst. 28(3), 423–435 (2019)

    Article  Google Scholar 

  51. Akhil, K.K., Rajimol, R., Anoop, V.S.: Parts-of-speech tagging for malayalam using deep learning techniques. Int. J. Inform. Technol. 1–8 (2020)

  52. Gopalakrishnan, Athira, Soman, K.P., Premjith, B.: Part-of-speech tagger for biomedical domain using deep neural network architecture. In: 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pp. 1–5. IEEE (2019)

  53. Wang, Shirui, Zhou, Wenan, Jiang, Chao: A survey of word embeddings based on deep learning. Computing 102(3), 717–740 (2020)

    Article  MathSciNet  MATH  Google Scholar 

  54. Knapp, Steven K.: Accelerate fpga macros with one-hot approach. Electron. Design 38(17), 71–78 (1990)

    Google Scholar 

  55. Mikolov, Tomas, Chen, Kai, Corrado, Greg, Dean, Jeffrey: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)

  56. Mikolov, Tomas, Sutskever, Ilya, Chen, Kai, Corrado, Greg S., Dean, Jeff: Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp. 3111–3119 (2013)

  57. Pennington, Jeffrey, Socher, Richard, Manning, Christopher: Glove: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543, Doha, Qatar, October 2014. Association for Computational Linguistics

  58. Joulin, Armand, Grave, Edouard, Bojanowski, Piotr, Mikolov, Tomas: Bag of tricks for efficient text classification. arXiv preprint arXiv:1607.01759 (2016)

  59. Peters, Matthew E., Neumann, Mark, Iyyer, Mohit, Gardner, Matt, Clark, Christopher, Lee, Kenton, Zettlemoyer, Luke: Deep contextualized word representations. arXiv preprint arXiv:1802.05365 (2018)

  60. Vaswani, Ashish, Shazeer, Noam, Parmar, Niki, Uszkoreit, Jakob, Jones, Llion, Gomez, Aidan N., Kaiser, Łukasz, Polosukhin, Illia: Attention is all you need. In: Advances in neural information processing systems, pp. 5998–6008 (2017)

  61. Devlin, Jacob, Chang, Ming-Wei, Lee, Kenton, Toutanova, Kristina: Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  62. Zheng, Xiaoqing, Chen, Hanyang, Xu, Tianyu: Deep learning for chinese word segmentation and pos tagging. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 647–657 (2013)

  63. Ng, Hwee Tou, Low, Jin Kiat: Chinese part-of-speech tagging: One-at-a-time or all-at-once? word-based or character-based? In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 277–284 (2004)

  64. Plank, Barbara, Søgaard, Anders, Goldberg, Yoav: Multilingual part-of-speech tagging with bidirectional long short-term memory models and auxiliary loss. arXiv preprint arXiv:1604.05529 (2016)

  65. Labeau, Matthieu, Löser, Kevin, Allauzen, Alexandre: Non-lexical neural architecture for fine-grained pos tagging. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 232–237 (2015)

  66. Ballesteros, Miguel, Dyer, Chris, Smith, Noah A.: Improved transition-based parsing by modeling characters instead of words with lstms. arXiv preprint arXiv:1508.00657 (2015)

  67. Ling, Wang, Luís, Tiago, Marujo, Luís, Astudillo, Ramón Fernandez, Amir, Silvio, Dyer, Chris, Black, Alan W., Trancoso, Isabel: Finding function in form: Compositional character models for open vocabulary word representation. arXiv preprint arXiv:1508.02096 (2015)

  68. Ling, Wang, Trancoso, Isabel, Dyer, Chris, Black, Alan W.: Character-based neural machine translation. arXiv preprint arXiv:1511.04586 (2015)

  69. Costa-Jussa, Marta R., Fonollosa, José A.R.: Character-based neural machine translation. arXiv preprint arXiv:1603.00810 (2016)

  70. Bhatia, M.P.S., Sangwan, Saurabh Raj: Debunking online reputation rumours using hybrid of lexicon-based and machine learning techniques. In: Proceedings of First International Conference on Computing, Communications, and Cyber-Security (IC4S 2019), pp. 317–327. Springer (2020)

  71. Sangwan, Saurabh Raj, Bhatia, M.P.S.: D-bullyrumbler: a safety rumble strip to resolve online denigration bullying using a hybrid filter-wrapper approach. Multimedia Systems 1–17 (2020)

  72. Sharma, Kapil, Bala, Manju, et al.: An ecological space based hybrid swarm-evolutionary algorithm for software reliability model parameter estimation. Int. J. Syst. Assurance Eng. Manag. 11(1), 77–92 (2020)

    Article  Google Scholar 

  73. Gopal, Madhav, Mishra, Diwakar, Singh, Devi Priyanka: Evaluating tagsets for sanskrit. In: Sanskrit Computational Linguistics, pp. 150–161. Springer (2010)

  74. Gopal, Madhav, Jha, Girish Nath: Tagging sanskrit corpus using bis pos tagset. In: Information Systems for Indian Languages, pp. 191–194. Springer (2011)

  75. Gopal, Madhav, Jha, Grish Nath: Indian language part of speech tagger (il-post), 2007. http://sanskrit.jnu.ac.in/corpora/tagset.jsp

  76. Chandershekhar, R., Jha, Girish Nath: Part-of-Speech Tagging for Sanskrit. PhD thesis, Special Centre for Sanskrit Studies, JNU Delhi, http://sanskrit.jnu.ac.in/corpora/JNU-Sanskrit-Tagset.htm (2007)

  77. Sarkar, Sandipan, Bandyopadhyay, Sivaji: Design of a rule-based stemmer for natural language text in bengali. In: Proceedings of the IJCNLP-08 workshop on NLP for Less Privileged Languages (2008)

  78. Zhang, Weinan, Du, Tianming, Wang, Jun: Deep learning over multi-field categorical data. In: European conference on information retrieval, pp. 45–57. Springer (2016)

  79. Chappelier, Jean-Cédric, Rajman, Martin, et al.: A generalized cyk algorithm for parsing stochastic cfg. TAPD 98(133–137), 5 (1998)

    Google Scholar 

  80. Younger, Daniel H.: Recognition and parsing of context-free languages in time n3. Inform. Control 10(2), 189–208 (1967)

    Article  MATH  Google Scholar 

  81. Li, Te, Alagappan, Devi: a comparison of cyk and earley parsing algorithms. icar.cnr.it (2006)

  82. Chen, Xinxiong, Xu, Lei, Liu, Zhiyuan, Sun, Maosong, Luan, Huanbo: Joint learning of character and word embeddings. In: Twenty-Fourth International Joint Conference on Artificial Intelligence (2015)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Sitender.

Ethics declarations

conflict of interest

The authors have no conflicts of interest to disclose. This article does not contain any studies with animals performed by any of the authors. This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Sitender, Bawa, S. Sanskrit to universal networking language EnConverter system based on deep learning and context-free grammar. Multimedia Systems 28, 2105–2121 (2022). https://doi.org/10.1007/s00530-020-00692-3

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00530-020-00692-3

Keywords

Navigation