Formal Structure of Sanskrit Text: Requirements Analysis for a Mechanical Sanskrit Processor

  • Gérard Huet
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5402)


We discuss the mathematical structure of various levels of representation of Sanskrit text in order to guide the design of computer aids aiming at useful processing of the digitised Sanskrit corpus. Two main levels are identified, called the linear level and the functional level. The design space of these two levels is sketched, and the computational implications of the main design choices are discussed. Current solutions to the problems of mechanical segmentation, tagging, and parsing of Sanskrit text are briefly surveyed in this light. An analysis of the requirements of relevant linguistic resources is provided, with a view to justifying standards that allow interoperability of computer tools.

This paper does not attempt to provide definitive solutions to the representation of Sanskrit at the various levels. It should rather be considered as a survey of various choices, allowing an open discussion of such issues in a formally precise general framework.


Keywords: Sanskrit, computational linguistics, finite-state machines, morphophonemics, dependency grammars, constraint satisfaction





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Gérard Huet, INRIA Rocquencourt, Le Chesnay Cedex, France
