Encyclopedia of Systems Biology

2013 Edition
Editors: Werner Dubitzky, Olaf Wolkenhauer, Kwang-Hyun Cho, Hiroki Yokota

Part-of-Speech Tagging

  • Sampo Pyysalo
Reference work entry
DOI: https://doi.org/10.1007/978-1-4419-9863-7_162



In natural language processing, Part-of-Speech (POS) tagging refers to the process of assigning to each word (or nonword token) in a text a tag identifying its part of speech, drawn from a fixed set of tags. A number of different tagsets are in use; one of the most frequently applied is the Penn Treebank tagset, which contains 36 POS tags and 12 punctuation and other tags (Marcus et al. 1993). POS tagging is often approached as a sequential labeling task addressed with machine learning methods such as Hidden Markov Models and Conditional Random Fields (Manning and Schütze 1999). To train accurate POS taggers for biomedical domain texts, domain corpora manually annotated for POS tags, such as the GENIA corpus, are typically applied. The parts of speech of words provide useful information for a number of tasks ranging from word sense disambiguation to named entity recognition and information extraction, and POS tagging is a frequently applied...
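The sequential labeling view can be illustrated with a minimal Hidden Markov Model tagger. The sketch below is illustrative only and rests on several assumptions not stated in this entry: the three training sentences are a hypothetical toy corpus tagged with Penn Treebank-style tags (a real biomedical tagger would be trained on a manually annotated corpus such as GENIA), probabilities are estimated by relative frequency with add-one (Laplace) smoothing, and the helper names `train_hmm`, `log_prob`, and `viterbi` are hypothetical. Decoding uses the Viterbi algorithm to find the single most probable tag sequence under the model.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus with Penn Treebank-style tags (illustration only).
TRAIN = [
    [("the", "DT"), ("protein", "NN"), ("binds", "VBZ"), ("DNA", "NN"), (".", ".")],
    [("the", "DT"), ("gene", "NN"), ("encodes", "VBZ"), ("a", "DT"), ("protein", "NN"), (".", ".")],
    [("a", "DT"), ("kinase", "NN"), ("binds", "VBZ"), ("the", "DT"), ("receptor", "NN"), (".", ".")],
]

START = "<s>"  # pseudo-tag marking the beginning of a sentence


def train_hmm(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag]
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word]
    vocab = set()
    for sent in sentences:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            vocab.add(word.lower())
            prev = tag
    tags = sorted(emit)
    return trans, emit, tags, vocab


def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability of key under a count table."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))


def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for words under the HMM."""
    n_tags = len(tags)
    n_words = len(vocab) + 1  # +1 reserves probability mass for unknown words

    def t_score(prev, tag):
        return log_prob(trans[prev], tag, n_tags)

    def e_score(tag, word):
        return log_prob(emit[tag], word.lower(), n_words)

    # best[i][t]: best log score of any tag path over words[:i+1] ending in t
    best = [{t: t_score(START, t) + e_score(t, words[0]) for t in tags}]
    back = [{t: None for t in tags}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev_t = max(tags, key=lambda p: best[i - 1][p] + t_score(p, t))
            best[i][t] = best[i - 1][prev_t] + t_score(prev_t, t) + e_score(t, words[i])
            back[i][t] = prev_t

    # Follow back-pointers from the highest-scoring final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))


if __name__ == "__main__":
    model = train_hmm(TRAIN)
    print(viterbi(["The", "kinase", "encodes", "a", "gene", "."], *model))
    # → ['DT', 'NN', 'VBZ', 'DT', 'NN', '.']
```

Working in log space avoids numerical underflow on long sentences. A Conditional Random Field, the other method named above, would instead score whole tag sequences with weighted features of the input and be trained discriminatively, but it can be decoded with the same Viterbi recursion.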



  1. Manning CD, Schütze H (1999) Foundations of statistical natural language processing. MIT Press, Cambridge
  2. Marcus MP, Santorini B, Marcinkiewicz MA (1993) Building a large annotated corpus of English: the Penn Treebank. Comput Linguist 19(2):313–330

Copyright information

© Springer Science+Business Media, LLC 2013

Authors and Affiliations

  1. Department of Computer Science, University of Tokyo, Tokyo, Japan