Development of Odia Language Corpus from Modern News Paper Texts: Some Problems and Issues

Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 309)

Abstract

In this paper, we describe the strategies and methods we have adopted to design and develop a digital Odia corpus of newspaper texts. We also identify the scope for its use in different domains of Odia language technology and applied linguistics. The corpus is built from sample news reports published by some major Odia newspapers from Bhubaneswar and neighboring places. We have addressed several issues relating to text corpus design, development, and management, such as the size of the corpus in terms of the number of sentences and words, coverage of domains and sub-domains of news texts, text representation, the question of nativity, determination of target users, selection of time span, selection of texts, the amount of sample for each text type, the method of data sampling, the manner of data input, corpus sanitation, corpus file management, and the problem of copyright. The digital corpus is stored in machine-readable format so that the text can be processed easily and quickly. We presume that the corpus will be of great help in examining the present texture of the language and in retrieving the linguistic data and information required for writing a modern grammar of Odia with close reference to its empirical identity, usage, and status. The electronic Odia corpus can also be used in various fields of research and development for Odia.

Keywords

  • Corpus
  • Odia
  • Newspaper
  • Sentence
  • Word
  • Text representation
  • Time span
  • File management
  • Copyright

Author information

Corresponding author

Correspondence to Bishwa Ranjan Das.

Copyright information

© 2015 Springer India

About this paper

Cite this paper

Das, B.R., Patnaik, S., Dash, N.S. (2015). Development of Odia Language Corpus from Modern News Paper Texts: Some Problems and Issues. In: Jain, L., Patnaik, S., Ichalkaranje, N. (eds) Intelligent Computing, Communication and Devices. Advances in Intelligent Systems and Computing, vol 309. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2009-1_58

  • DOI: https://doi.org/10.1007/978-81-322-2009-1_58

  • Published:

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-2008-4

  • Online ISBN: 978-81-322-2009-1

  • eBook Packages: Engineering, Engineering (R0)