
Corpus and Future Indian Needs

Chapter in: Utility and Application of Language Corpora

Abstract

In this chapter, we first present an overview of the current state of corpus generation in the Indian context. We focus both on work already completed and on work still in progress. Along with text corpora, we also discuss the speech corpora developed so far in a few Indian languages. We then suggest generating annotated text and speech corpora in all major Indian languages, given the relevance of such corpora to applications in general linguistics, applied linguistics, and language technology. We also argue for generating special corpora of written and spoken texts to explore their particular linguistic features. In addition, we propose generating dialect corpora for all local and regional varieties to protect and promote them. Finally, we propose forming a national archive or digital data center to preserve and distribute Indian text and speech corpora for the benefit of the languages and their speakers.




Author information

Correspondence to Niladri Sekhar Dash.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Dash, N.S., Ramamoorthy, L. (2019). Corpus and Future Indian Needs. In: Utility and Application of Language Corpora. Springer, Singapore. https://doi.org/10.1007/978-981-13-1801-6_15


  • DOI: https://doi.org/10.1007/978-981-13-1801-6_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1800-9

  • Online ISBN: 978-981-13-1801-6

  • eBook Packages: Social Sciences, Social Sciences (R0)
