A Sanskrit-to-English machine translation using hybridization of direct and rule-based approach

Original Article
Neural Computing and Applications

Abstract

This paper presents a machine translation system (MTS) from Sanskrit to English that hybridizes the direct and rule-based machine translation techniques. It also discusses the language divergence between Sanskrit and English and recommends a solution for handling it. The proposed system uses two bilingual dictionaries (Sanskrit–English and Sanskrit–UNL), a tagged Sanskrit corpus, a Sanskrit analysis rule base and an ELGR base. Elasticsearch speeds up translation by accelerating access to the dictionaries and rule bases used by the system. The system models Sanskrit with a CFG in CNF and parses the input Sanskrit sentence with the CYK technique. This work also presents a novel algorithm that creates a parse tree from the parsing table. The ELGR base and the bilingual dictionaries generate the target-language sentence. The proposed system is evaluated using the Natural Language Toolkit (NLTK) API in Python and achieves a BLEU score of 0.7606, a fluency score of 3.63 and an adequacy score of 3.72. A comparison with state-of-the-art systems shows that the proposed system outperforms existing systems.

Abbreviations

AI: Artificial intelligence
API: Application programming interface
BLEU: Bilingual evaluation understudy
CBMT: Corpus-based machine translation
CFG: Context-free grammar
CLR: Canonical syntactic realization
CNF: Chomsky normal form
CYK: Cocke–Younger–Kasami
DMT: Direct machine translation
GLR: Generalized linking routine
HBMT: Hybrid-based machine translation
LCS: Lexical conceptual structure
MT: Machine translation
MTS: Machine translation system
POS: Part of speech
RBMT: Rule-based machine translation
TLGR: Target language generation rule
UNL: Universal networking language

Author information

Corresponding author

Correspondence to Sitender.

Ethics declarations

Conflict of interest

We have no conflicts of interest to disclose.

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Generating parsing table and parse tree using CYK parser

Table 4 shows how the CYK parser processes Sanskrit text, using the sentence shown in figure h as an example. Figures 6 and 7 depict the process of parse tree generation from the parsing table.

Fig. 6 Sanskrit parse tree

Fig. 7 English parse tree

Table 4 CYK processing of Sanskrit text
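
As a rough illustration of the technique described in this appendix, the following minimal Python sketch fills a CYK parsing table for a grammar in CNF and then reads a parse tree back out of the table. It is not the authors' algorithm: the tree is recovered with standard back-pointers, and the grammar, the POS tags (`N`, `V`), the romanized example sentence and all function names are assumptions made for this example only.

```python
# Toy CNF grammar (hypothetical; not the paper's Sanskrit analysis rule base).
# Binary rules: (left child, right child) -> set of parent nonterminals.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("NP", "V"): {"VP"},
}
# Lexical rules: POS tag of a word -> set of nonterminals that derive it.
LEXICAL_RULES = {
    "N": {"NP"},
    "V": {"V"},
}


def cyk_table(tags):
    """Fill the CYK table; table[i][j] maps each nonterminal that derives
    tags[i..j] (inclusive) to a back-pointer describing how it was built."""
    n = len(tags)
    table = [[{} for _ in range(n)] for _ in range(n)]
    # Spans of length 1: apply the lexical rules to each POS tag.
    for i, tag in enumerate(tags):
        for head in LEXICAL_RULES.get(tag, ()):
            table[i][i][head] = tag
    # Longer spans: combine two adjacent sub-spans split after position k.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for left in table[i][k]:
                    for right in table[k + 1][j]:
                        for head in BINARY_RULES.get((left, right), ()):
                            # Keep the first derivation found for each head.
                            table[i][j].setdefault(head, (k, left, right))
    return table


def build_tree(table, words, i, j, symbol):
    """Follow back-pointers to rebuild a tree as nested tuples."""
    if i == j:
        return (symbol, words[i])
    k, left, right = table[i][j][symbol]
    return (
        symbol,
        build_tree(table, words, i, k, left),
        build_tree(table, words, k + 1, j, right),
    )


def parse(words, tags, start="S"):
    """Return a parse tree, or None if the grammar rejects the sentence."""
    n = len(tags)
    if n == 0:
        return None
    table = cyk_table(tags)
    if start not in table[0][n - 1]:
        return None
    return build_tree(table, words, 0, n - 1, start)


# Romanized example with subject-object-verb order: "Rama eats a fruit".
tree = parse(["ramah", "phalam", "khadati"], ["N", "N", "V"])
print(tree)
```

The table is filled bottom-up over span lengths, which gives the cubic running time usually associated with CYK. Storing one back-pointer per nonterminal per cell keeps a single derivation; an ambiguous grammar would need a list of back-pointers instead.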

Appendix 2: Target language generation rule base

This section provides the TLGR base, covering the three voices of Sanskrit and their English equivalents. Table 5 lists the three voices and ten tenses of Sanskrit together with the rules for generating the equivalent English translation.

Table 5 Target language generation rule base
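
To make the idea of a generation rule base keyed by voice and tense more concrete, here is a minimal Python sketch of such a lookup. The three rules, the slot names and the function `generate_english` are hypothetical and chosen only for illustration; they are not the entries of Table 5, and the sketch assumes the English words have already been retrieved from a bilingual dictionary.

```python
# Hypothetical generation rules keyed by (voice, tense). Each rule is an
# English word-order template; these are NOT the actual rules of Table 5.
GENERATION_RULES = {
    ("active", "present"): "{subject} {verb_present} {object}",
    ("active", "future"): "{subject} will {verb_base} {object}",
    # Assumes a singular object, so the auxiliary is fixed to "is".
    ("passive", "present"): "{object} is {verb_participle} by {subject}",
}


def generate_english(voice, tense, slots):
    """Look up the rule for (voice, tense) and fill it with English words.

    Returns None when the rule base has no entry for the combination.
    """
    template = GENERATION_RULES.get((voice, tense))
    if template is None:
        return None
    # str.format ignores slots that the chosen template does not use.
    return template.format(**slots)


# English equivalents assumed to come from a dictionary lookup.
slots = {
    "subject": "Rama",
    "object": "a fruit",
    "verb_present": "eats",
    "verb_base": "eat",
    "verb_participle": "eaten",
}

print(generate_english("active", "present", slots))
print(generate_english("passive", "present", slots))
```

Keeping the rules in a table rather than in code means new voice and tense combinations can be added without touching the generator, which is the usual motivation for a separate rule base.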

Appendix 3: Implementation of the proposed Sanskrit-to-English MTS

This section shows the software implementation of the proposed Sanskrit-to-English translator using an example.

Fig. 8 POS tagging of Sanskrit sentence

Fig. 9 Sanskrit parse tree

Fig. 10 English parse tree

About this article

Cite this article

Sitender, Bawa, S. A Sanskrit-to-English machine translation using hybridization of direct and rule-based approach. Neural Comput & Applic 33, 2819–2838 (2021). https://doi.org/10.1007/s00521-020-05156-3
