© 2014

Computational Methods for Corpus Annotation and Analysis


Table of contents

  1. Front Matter
    Pages i-xi
  2. Xiaofei Lu
    Pages 1-8
  3. Xiaofei Lu
    Pages 39-65
  4. Xiaofei Lu
    Pages 67-93
  5. Xiaofei Lu
    Pages 95-114
  6. Xiaofei Lu
    Pages 115-145
  7. Xiaofei Lu
    Pages 175-184
  8. Back Matter
    Pages 185-186

About this book


In the past few decades, the use of increasingly large text corpora has grown rapidly in language and linguistics research. This growth was made possible by remarkable strides in natural language processing (NLP) technology, which enables computers to automatically and efficiently process, annotate and analyze large amounts of spoken and written text in linguistically and/or pragmatically meaningful ways. It has therefore become more desirable than ever for language and linguistics researchers who use corpora to gain an adequate understanding of the relevant NLP technology, so that they can take full advantage of its capabilities.

This volume provides language and linguistics researchers with an accessible introduction to state-of-the-art NLP technology that facilitates automatic annotation and analysis of large text corpora at both shallow and deep linguistic levels. The book covers a wide range of computational tools for lexical, syntactic, semantic, pragmatic and discourse analysis, together with detailed instructions on how to obtain, install and use each tool on different operating systems and platforms. It also illustrates how NLP technology has been applied in recent corpus-based language studies and suggests effective ways to better integrate such technology into future corpus linguistics research.

This book provides language and linguistics researchers with a valuable reference for corpus annotation and analysis.



Analysis of large text corpora; Collins' parser; Corpus annotation; Creating, editing text files with UTF-8 encoding; Lexical annotation; Natural language processing; NLP; Phrase structure grammars; Task decomposition and pipes; Text processing with the command line interface; The UCREL semantic analysis system

Authors and affiliations

  1. Department of Applied Linguistics, The Pennsylvania State University, University Park, USA


From the book reviews:

“‘Computational Methods for Corpus Annotation and Analysis’ is an excellent book for corpus linguists who are interested in using advanced corpus queries. It presents the latest computational tools for corpus annotation and analysis in a very accessible manner. The advice and resources in the book are also very practical and useful. It is highly recommended to researchers and students of corpus linguistics.” (Phoebe M. S. Lin, The Linguist List, March, 2015)