  • Book
  • © 2021

Language Corpora Annotation and Processing

  • Is informative and based on actual language data

  • Shares valuable insights on the linguistic challenges involved in computation

  • Addresses issues involved in language processing, corpus linguistics, language technology, and machine learning

Buying options

eBook USD 149.00
Price excludes VAT (USA)
  • ISBN: 978-981-16-2960-0
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Softcover Book USD 199.99
Price excludes VAT (USA)
Hardcover Book USD 199.99
Price excludes VAT (USA)


Table of contents (10 chapters)

  1. Front Matter
  2. Corpus Text Annotation

    • Niladri Sekhar Dash
  3. Principles and Rules of Part-of-Speech Annotation

    • Niladri Sekhar Dash
  4. Part-of-Speech Annotation

    • Niladri Sekhar Dash
  5. Extratextual Annotation

    • Niladri Sekhar Dash
  6. Etymological Annotation

    • Niladri Sekhar Dash
  7. More Types of Corpus Annotation

    • Niladri Sekhar Dash
  8. Morphological Processing of Words

    • Niladri Sekhar Dash
  9. Lemmatization of Inflected Nouns

    • Niladri Sekhar Dash
  10. Decomposition of Inflected Verbs

    • Niladri Sekhar Dash
  11. Syntactic Annotation

    • Niladri Sekhar Dash
  12. Back Matter

About this book

This book addresses the research, analysis, and description of the methods and processes that are used in the annotation and processing of language corpora in advanced, semi-advanced, and non-advanced languages. It provides the background information and empirical data needed to understand the nature and depth of problems related to corpus annotation and text processing and shows readers how the linguistic elements found in texts are analyzed and applied to develop language technology systems and devices. As such, it offers valuable insights for researchers, educators, and students of linguistics and language technology.


Keywords

  • Corpora Annotation
  • Lexical Collocation
  • Morphological Processing
  • Sentence Parsing
  • Language Processing
  • Computational Linguistics

Authors and Affiliations

  • Linguistic Research Unit, Indian Statistical Institute, Kolkata, India

    Niladri Sekhar Dash

About the author

Dr. Niladri Sekhar Dash is Professor and Head of the Linguistic Research Unit, Indian Statistical Institute, Kolkata (an Institute of National Importance, Govt. of India). For the last 28 years, he has been working in corpus linguistics, language technology, computational lexicography, computer-assisted language teaching, language documentation, translation, clinical linguistics, and digital ethnography. He has published 18 research monographs and more than 285 research papers in indexed and peer-reviewed research journals, anthologies, and conference proceedings. As an invited speaker, he has delivered lectures at more than 50 universities and institutes in India and abroad. He acts as a Research Advisor for several multinational organizations that work on language technology, artificial intelligence, lexicography, digital humanities, and language resource development, and as Principal Investigator for several LangTech projects funded by the Govt. of India and corporate houses. He is the Chief Editor of the Journal of Advanced Linguistic Studies, a reviewed international journal of linguistics, an Editorial Board Member for several international journals, and a member of several linguistic associations across the world. He is a British Academy International Visiting Fellow (2018), Visiting Research Fellow of the School of Psychology & Clinical Language Sciences, University of Reading, UK (2018-2021), and Visiting Scholar of the Language and Brain Laboratory, University of Oxford, UK (2019). At present, he is heading five projects: (a) ‘Upgradation of Bengali WordNet’, funded by the Ministry of Statistics and Programme Implementation (MoSPI), Govt. of India; (b) ‘Sound Imitative Words in Bengali’, in collaboration with the Dept. of British and American Studies, Faculty of Arts, P.J. Šafárik University, Slovakia; (c) ‘Bilingual Dementia of Patients with Broca’s Aphasia’, in collaboration with the School of Psychology and Clinical Language Sciences, University of Reading, UK; (d) ‘Public Announcement System at Airports and Railway Stations in Indian Sign Language with Animation’, a consortium-mode project headed by the Dept. of Computer Science, Punjabi University, Patiala, India; and (e) ‘Dictionary for Sabar Speech Community’, for an endangered tribe of West Bengal, India.

Bibliographic Information

  • Book Title: Language Corpora Annotation and Processing

  • Authors: Niladri Sekhar Dash

  • DOI: https://doi.org/10.1007/978-981-16-2960-0

  • Publisher: Springer Singapore

  • eBook Packages: Education, Education (R0)

  • Copyright Information: The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021

  • Hardcover ISBN: 978-981-16-2959-4

  • Softcover ISBN: 978-981-16-2962-4

  • eBook ISBN: 978-981-16-2960-0

  • Edition Number: 1

  • Number of Pages: XXX, 272

  • Number of Illustrations: 45 b/w illustrations, 2 illustrations in colour

  • Topics: Computational Linguistics, Research Methods in Language and Linguistics
