  • Book
  • © 2017

Handbook of Linguistic Annotation

  • Leading scientists guide the reader through the process of modeling a phenomenon, creating an annotation language, building a corpus, and evaluating it for correctness

  • Offers a thorough treatment of the science of annotation with clearly defined methodology

  • Aimed at and accessible for both computer scientists and linguistic researchers

  • Includes supplementary material: sn.pub/extras

Buying options

eBook USD 389.00
Price excludes VAT (USA)
  • ISBN: 978-94-024-0881-2
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Softcover Book USD 499.99
Price excludes VAT (USA)
Hardcover Book USD 499.99
Price excludes VAT (USA)


Table of contents (55 chapters)

  1. Front Matter

    Pages i-ix
  2. The Science of Annotation

    1. Front Matter

      Pages 19-19
    2. Designing Annotation Schemes: From Theory to Model

      • James Pustejovsky, Harry Bunt, Annie Zaenen
      Pages 21-72
    3. Designing Annotation Schemes: From Model to Representation

      • Nancy Ide, Christian Chiarcos, Manfred Stede, Steve Cassidy
      Pages 73-111
    4. Community Standards for Linguistically-Annotated Resources

      • Nancy Ide, Nicoletta Calzolari, Judith Eckle-Kohler, Dafydd Gibbon, Sebastian Hellmann, Kiyong Lee et al.
      Pages 113-165
    5. Overview of Annotation Creation: Processes and Tools

      • Mark A. Finlayson, Tomaž Erjavec
      Pages 167-191
    6. The Evolution of Text Annotation Frameworks

      • Graham Wilcock
      Pages 193-207
    7. Tools for Multimodal Annotation

      • Steve Cassidy, Thomas Schmidt
      Pages 209-227
    8. Collaborative Web-Based Tools for Multi-layer Text Annotation

      • Chris Biemann, Kalina Bontcheva, Richard Eckart de Castilho, Iryna Gurevych, Seid Muhie Yimam
      Pages 229-256
    9. Iterative Enhancement

      • Markus Dickinson, Dan Tufiş
      Pages 257-276
    10. Crowdsourcing

      • Massimo Poesio, Jon Chamberlain, Udo Kruschwitz
      Pages 277-295
    11. Inter-annotator Agreement

      • Ron Artstein
      Pages 297-313
    12. Machine Learning for Higher-Level Linguistic Tasks

      • Anna Rumshisky, Amber Stubbs
      Pages 333-351
    13. Sustainable Development and Refinement of Complex Linguistic Annotations at Scale

      • Dan Flickinger, Stephan Oepen, Emily M. Bender
      Pages 353-377
    14. Linguistic Annotation in/for Corpus Linguistics

      • Stefan Th. Gries, Andrea L. Berez
      Pages 379-409
    15. Developing Linguistic Theories Using Annotated Corpora

      • Marie-Catherine de Marneffe, Christopher Potts
      Pages 411-438
  3. Case Studies

    1. Front Matter

      Pages 439-439
    2. MULTEXT-East

      • Tomaž Erjavec
      Pages 441-462

About this book

This handbook offers a thorough treatment of the science of linguistic annotation. Leaders in the field guide the reader through the process of modeling a phenomenon, creating an annotation language, building a corpus, and evaluating it for correctness. It is essential reading for both computer scientists and linguistic researchers.
Linguistic annotation is an increasingly important activity in computational linguistics because of its critical role in developing language models for natural language processing applications. Part one of this book covers all phases of the linguistic annotation process, from annotation scheme design and choice of representation format, through manual and automatic annotation, to evaluation and iterative improvement of annotation accuracy. The second part presents case studies of annotation projects across the spectrum of linguistic annotation types: morpho-syntactic tagging, syntactic analyses, a range of semantic analyses (semantic roles, named entities, sentiment and opinion), time, event, and spatial analyses, and discourse-level analyses such as discourse structure and co-reference. Each case study addresses the phases and processes discussed in the chapters of part one.

Keywords

  • Corpus linguistics
  • Evaluating annotations
  • Integration of annotations
  • Language models for natural language processing applications
  • Linguistic annotation
  • Morphosyntactic tagging

Reviews

“In this context, this book is an important effort towards giving linguistic annotation full attention. … Indeed, this handbook will give you all you need to conceive your annotation scheme and assess its quality … . this book undoubtedly finds its place in every linguistics department library as a major reference on linguistic annotation.” (Emmanuel Schang, The Linguist List, linguistlist.org, August, 2018)

“Handbook of Linguistic Annotation is worth reading in that this volume presents a spate of annotation projects … . This book includes a detailed introduction to a wealth of linguistic annotated resources and is worthy of recommendation for researchers of Quantitative Linguistics because these resources can either be used as direct sources for future quantitative studies or offer various choices on the annotation patterns.” (Peng Bi, Journal of Quantitative Linguistics, January, 2018)

Editors and Affiliations

  • Department of Computer Science, Vassar College, Poughkeepsie, USA

    Nancy Ide

  • Department of Computer Science, Volen Center for Complex Systems, Brandeis University, Waltham, USA

    James Pustejovsky

About the editors

Nancy Ide is Professor of Computer Science at Vassar College in Poughkeepsie, New York, USA. She has been in the field of computational linguistics for over 30 years and made significant contributions to research in word sense disambiguation, computational lexicography, discourse analysis, and the use of semantic web technologies for language data. She is founder of the Text Encoding Initiative (TEI), the first major standard for representing electronic language data, and later developed the XML Corpus Encoding Standard (XCES). More recently, she co-developed the ISO LAF/GrAF representation format for linguistically annotated data. She has also developed major corpora for American English, including the Open American National Corpus (OANC) and the Manually Annotated Sub-Corpus (MASC), and has been a pioneer in efforts to foster open data and resources. Professor Ide is Co-Editor-in-Chief of the journal Language Resources and Evaluation and Editor of the Springer book series Text, Speech, and Language Technology.
 
James Pustejovsky is the TJX Feldberg Professor of Computer Science at Brandeis University in Waltham, Massachusetts, United States. His expertise includes theoretical and computational modeling of language, specifically computational linguistics, lexical semantics, knowledge representation, temporal and spatial reasoning, and extraction. His main research topics are natural language processing in general and, in particular, the computational analysis of linguistic meaning. He proposed Generative Lexicon theory in lexical semantics. His other interests include temporal reasoning, event semantics, spatial language, language annotation, computational linguistics, and machine learning.
  
