Definition
Part-of-speech tagging (POS tagging) is the process of assigning each word in a text its appropriate morphosyntactic category (for example noun-singular, verb-past, adjective, or pronoun-personal). It therefore provides information about both morphology (the structure of words) and syntax (the structure of sentences). This disambiguation process is governed both by constraints from the lexicon (what are the possible categories for a word?) and by constraints from the context in which the word occurs (which of the possible categories is the right one in this context?). For example, a word like "table" can be a noun-singular, but also a verb-present (as in "I table this motion"). This is lexical knowledge. It is the context of the word that should be used to decide which of the possible categories is the correct one. In a sentence like "Put it on the table", the fact that "table"...
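The interplay of lexical and contextual constraints described above can be illustrated with a minimal Python sketch. It is a toy, not any of the taggers listed under Recommended Reading: the lexicon entries, the preference scores, and the names LEXICON, CONTEXT_PREFERENCE, and tag are all assumptions made up for this illustration, and the greedy left-to-right strategy is only one of many possible ways to use context.

```python
# Lexical constraint: which categories can each word take?
# (Hypothetical toy entries, not taken from a real lexicon.)
LEXICON = {
    "i": ["pronoun-personal"],
    "it": ["pronoun-personal"],
    "put": ["verb-present"],
    "on": ["preposition"],
    "the": ["determiner"],
    "this": ["determiner"],
    "table": ["noun-singular", "verb-present"],
    "motion": ["noun-singular", "verb-present"],
}

# Contextual constraint: how plausible is a category given the previous one?
# (Hand-picked scores for illustration; a real tagger would learn these.)
CONTEXT_PREFERENCE = {
    ("determiner", "noun-singular"): 2,
    ("pronoun-personal", "verb-present"): 2,
}


def tag(sentence):
    """Greedily tag a whitespace-tokenized sentence from left to right."""
    tagged = []
    previous = "<start>"
    for word in sentence.lower().split():
        # Step 1: the lexicon proposes the possible categories.
        candidates = LEXICON.get(word, ["unknown"])
        # Step 2: the context picks the most plausible candidate
        # (ties fall back to the first category listed in the lexicon).
        best = max(
            candidates,
            key=lambda c: CONTEXT_PREFERENCE.get((previous, c), 0),
        )
        tagged.append((word, best))
        previous = best
    return tagged


if __name__ == "__main__":
    for sentence in ("Put it on the table", "I table this motion"):
        print(tag(sentence))
```

In this sketch the same word "table" receives noun-singular after a determiner and verb-present after a personal pronoun, which mirrors the two example sentences in the definition.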
Recommended Reading
Brants, T. (2000). TnT – A statistical part-of-speech tagger. In Proceedings of the sixth applied natural language processing conference ANLP-2000. Seattle, WA.
Brill, E. (1995a). Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. Computational Linguistics, 21(4), 543–565.
Brill, E. (1995b). Unsupervised learning of disambiguation rules for part of speech tagging. In Proceedings of the third workshop on very large corpora (pp. 1–13). Ohio State University, Ohio.
Cussens, J. (1997). Part-of-speech tagging using Progol. In N. Lavrač & S. Džeroski (Eds.), Proceedings of the seventh international workshop on inductive logic programming, Lecture Notes in Computer Science (Vol. 1297, pp. 93–108). London: Springer.
Daelemans, W., Zavrel, J., Berck, P., & Gillis, S. (1996). MBT: A memory-based part of speech tagger generator. In Proceedings of the fourth workshop on very large corpora (pp. 14–27). Copenhagen, Denmark.
Garside, R., & Smith, N. (1997). A hybrid grammatical tagger: CLAWS4. In R. Garside, G. Leech, & A. McEnery (Eds.), Corpus annotation: Linguistic information from computer text corpora (pp. 102–121). London: Longman.
Jurafsky, D., & Martin, J. (2008). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition (2nd ed.). Upper Saddle River, NJ: Prentice Hall.
Karlsson, F., Voutilainen, A., Heikkilä, J., & Anttila, A. (1995). Constraint grammar. A language-independent system for parsing unrestricted text (p. 430). Berlin and New York: Mouton de Gruyter.
Marcus, M., Santorini, B., & Marcinkiewicz, M. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.
Ratnaparkhi, A. (1996). A maximum entropy part of speech tagger. In Proceedings of the ACL-SIGDAT conference on empirical methods in natural language processing (pp. 17–18). Philadelphia, PA.
Schmid, H. (1994a). Part-of-speech tagging with neural networks. In Proceedings of COLING-94 (pp. 172–176). Kyoto, Japan.
Schmid, H. (1994b). Probabilistic part-of-speech tagging using decision trees. In Proceedings of the international conference on new methods in language processing (NeMLaP), (pp. 44–49). Manchester, UK.
Schütze, H. (1995). Distributional part-of-speech tagging. In Proceedings of EACL 7 (pp. 141–148). Dublin, Ireland.
Shen, L., Satta, G., & Joshi, A. (2007). Guided learning for bidirectional sequence classification. In Proceedings of the 45th annual meeting of the association for computational linguistics (ACL 2007) (pp. 760–767). Prague, Czech Republic.
Ushioda, A. (1996). Hierarchical clustering of words and applications to NLP tasks. In Proceedings of the fourth workshop on very large corpora (pp. 28–41). Somerset, NJ.
van Halteren, H. (Ed.). (1999). Syntactic wordclass tagging. Boston: Kluwer Academic Publishers.
van Halteren, H., Zavrel, J., & Daelemans, W. (2001). Improving accuracy in NLP through combination of machine learning systems. Computational Linguistics, 27(2), 199–229.
Copyright information
© 2011 Springer Science+Business Media, LLC
About this entry
Cite this entry
Daelemans, W. (2011). POS Tagging. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_643
DOI: https://doi.org/10.1007/978-0-387-30164-8_643
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30768-8
Online ISBN: 978-0-387-30164-8
eBook Packages: Computer Science; Reference Module Computer Science and Engineering