Abstract
In this chapter, we first present a general picture of the current state of corpus generation in the Indian context, covering both completed work and work still in progress. Alongside text corpora, we also discuss the speech corpora developed so far in a few Indian languages. We further suggest generating annotated text and speech corpora in all major Indian languages, keeping in view the applicational relevance of these corpora in various domains of general linguistics, applied linguistics, and language technology. We also argue for generating special corpora of written and spoken texts to explore their special linguistic features, and we propose generating dialect corpora in all local and regional varieties for their protection and promotion. Finally, we propose forming a national archive or digital data center for the preservation and distribution of Indian text and speech corpora for the benefit of the languages and their speakers.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Dash, N.S., Ramamoorthy, L. (2019). Corpus and Future Indian Needs. In: Utility and Application of Language Corpora. Springer, Singapore. https://doi.org/10.1007/978-981-13-1801-6_15
DOI: https://doi.org/10.1007/978-981-13-1801-6_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1800-9
Online ISBN: 978-981-13-1801-6
eBook Packages: Social Sciences (R0)