Corpus and Future Indian Needs

Dash, Niladri Sekhar; Ramamoorthy, L.

doi:10.1007/978-981-13-1801-6_15

Abstract

In this chapter, we first present a general picture of the current state of corpus generation in the Indian context, focusing on work already completed and giving due attention to work still in progress. Along with text corpora, we also discuss the speech corpora developed so far in a few Indian languages. Moreover, we suggest generating annotated text and speech corpora in all major Indian languages, keeping in view the relevance of these corpora to applications in various domains of general linguistics, applied linguistics, and language technology. We also argue for generating special corpora of written and spoken texts to explore their special linguistic features, and we propose generating dialect corpora in all local and regional varieties for their protection and promotion. Finally, we propose forming a national archive or digital data center to preserve and distribute Indian text and speech corpora for the benefit of the languages and their speakers.

Author information

Authors and Affiliations

Linguistic Research Unit, Indian Statistical Institute, Kolkata, West Bengal, India
Niladri Sekhar Dash
Linguistic Data Consortium-Indian Languages, Central Institute of Indian Languages, Mysore, Karnataka, India
L. Ramamoorthy

Corresponding author

Correspondence to Niladri Sekhar Dash.

About this chapter

Cite this chapter

Dash, N.S., Ramamoorthy, L. (2019). Corpus and Future Indian Needs. In: Utility and Application of Language Corpora. Springer, Singapore. https://doi.org/10.1007/978-981-13-1801-6_15

DOI: https://doi.org/10.1007/978-981-13-1801-6_15
Published: 14 August 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1800-9
Online ISBN: 978-981-13-1801-6
eBook Packages: Social Sciences, Social Sciences (R0)
