Part-of-speech tagging using Progol

Cussens, James

doi:10.1007/3540635149_38

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1297)

Included in the following conference series:

International Conference on Inductive Logic Programming

Abstract

A system for ‘tagging’ words with their part-of-speech (POS) tags is constructed. The system has two components: a lexicon containing the set of possible POS tags for a given word, and rules that use a word's context to eliminate some of its possible tags. The Inductive Logic Programming (ILP) system Progol is used to induce these rules in the form of definite clauses. The final theory contained 885 clauses. For background knowledge, Progol uses a simple grammar in which the tags are terminals and predicates such as nounp (noun phrase) are non-terminals. Progol was modified to cache information about clauses generated during the induction process, which greatly increased efficiency. The system achieved a per-word accuracy of 96.4% on known words drawn from sentences without quotation marks. This is on a par with other tagging systems induced from the same data [5, 2, 4], all of which have accuracies in the range 96–97%. The per-sentence accuracy was 49.5%.
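The two-component architecture described in the abstract can be illustrated with a small Python sketch. A lexicon proposes candidate tags, and context rules only ever eliminate tags. This is not Progol and not the paper's induced theory. The toy lexicon, the tag names, and the two hand-written rules (`after_determiner_not_verb`, `after_singular_noun_not_plural`) are hypothetical stand-ins for induced definite clauses. The sketch also assumes a word counts as correct only when exactly its gold tag survives, and a sentence counts as correct only when every word in it does.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toy lexicon: each word maps to its set of possible POS tags.
LEXICON: Dict[str, Set[str]] = {
    "the": {"dt"},
    "can": {"md", "nn", "vb"},
    "rusts": {"nns", "vbz"},
}

# A rule sees the candidate tag sets to the left, one candidate tag, and the
# candidate tag sets to the right; it returns True if that tag should be removed.
Rule = Callable[[List[Set[str]], str, List[Set[str]]], bool]


def after_determiner_not_verb(left: List[Set[str]], tag: str, right: List[Set[str]]) -> bool:
    # Hypothetical rule: no modal/verb reading directly after an unambiguous determiner.
    return tag in {"md", "vb"} and bool(left) and left[-1] == {"dt"}


def after_singular_noun_not_plural(left: List[Set[str]], tag: str, right: List[Set[str]]) -> bool:
    # Hypothetical rule: no plural-noun reading directly after an unambiguous singular noun.
    return tag == "nns" and bool(left) and left[-1] == {"nn"}


def tag_sentence(words: List[str], rules: List[Rule]) -> List[Set[str]]:
    """Start from the lexicon's tag sets and apply elimination rules until nothing changes."""
    candidates = [set(LEXICON[w]) for w in words]  # copy so LEXICON is not mutated
    changed = True
    while changed:
        changed = False
        for i in range(len(candidates)):
            for tag in sorted(candidates[i]):  # sorted() iterates over a copy
                if len(candidates[i]) <= 1:
                    break  # never eliminate a word's last remaining tag
                left, right = candidates[:i], candidates[i + 1:]
                if any(rule(left, tag, right) for rule in rules):
                    candidates[i].discard(tag)
                    changed = True
    return candidates


def accuracy(tagged: List[List[Set[str]]], gold: List[List[str]]) -> Tuple[float, float]:
    """Per-word and per-sentence accuracy; a word is correct only if exactly its gold tag remains."""
    n_words = n_correct_words = n_correct_sents = 0
    for cand_seq, gold_seq in zip(tagged, gold):
        ok = [cands == {g} for cands, g in zip(cand_seq, gold_seq)]
        n_words += len(ok)
        n_correct_words += sum(ok)
        n_correct_sents += all(ok)
    return n_correct_words / n_words, n_correct_sents / len(gold)


if __name__ == "__main__":
    rules = [after_determiner_not_verb, after_singular_noun_not_plural]
    sentences = [["the", "can", "rusts"], ["rusts"]]
    gold = [["dt", "nn", "vbz"], ["vbz"]]
    tagged = [tag_sentence(s, rules) for s in sentences]
    print(accuracy(tagged, gold))  # (0.75, 0.5)
```

On this toy data, the rules fully disambiguate "the can rusts". The one-word sentence "rusts" has no left context, so it stays ambiguous. That gives 3 of 4 words but only 1 of 2 sentences correct. The example shows why per-sentence accuracy can sit far below per-word accuracy: a single residual error spoils a whole sentence.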

References

[1] Steven Abney. Part-of-speech tagging and partial parsing. In Ken Church, Steve Young, and Gerrit Bloothooft, editors, Corpus-Based Methods in Language and Speech. Kluwer, Dordrecht, 1996.
[2] Eric Brill. Some advances in transformation-based part of speech tagging. In AAAI-94, 1994.
[3] J. Cussens. Part-of-speech disambiguation using ILP. Technical Report PRG-TR-25-96, Oxford University Computing Laboratory, 1996.
[4] Doug Cutting, Julian Kupiec, Jan Pedersen, and Penelope Sibun. A practical part-of-speech tagger. In Third Conference on Applied Natural Language Processing (ANLP-92), pages 133–140, 1992.
[5] W. Daelemans, J. Zavrel, P. Berck, and S. Gillis. MBT: A memory-based part of speech tagger-generator. In Proceedings of the Fourth Workshop on Very Large Corpora, pages 14–27, Copenhagen, 1996.
[6] S. Muggleton. Inverse entailment and Progol. New Generation Computing Journal, 13:245–286, 1995.
[7] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–285, February 1989.
[8] Christer Samuelsson, Pasi Tapanainen, and Atro Voutilainen. Inducing constraint grammars. In Laurent Miclet and Colin de la Higuera, editors, Grammatical Inference: Learning Syntax from Sentences, volume 1147 of Lecture Notes in Artificial Intelligence, pages 146–155. Springer, 1996.
[9] Pasi Tapanainen and Atro Voutilainen. Tagging accurately – Don't guess if you know. In Proc. ANLP-94, 1994.

Author information

Authors and Affiliations

Oxford University Computing Laboratory, Wolfson Building, Parks Road, OX1 3QD, Oxford, UK
James Cussens

Editor information

Nada Lavrač, Sašo Džeroski

About this paper

Cite this paper

Cussens, J. (1997). Part-of-speech tagging using Progol. In: Lavrač, N., Džeroski, S. (eds) Inductive Logic Programming. ILP 1997. Lecture Notes in Computer Science, vol 1297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3540635149_38

Published: 10 July 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63514-7
Online ISBN: 978-3-540-69587-5
eBook Packages: Springer Book Archive