Tagging Unknown Words

Brill, Eric

doi:10.1007/978-94-015-9273-4_13

Part of the book series: Text, Speech and Language Technology (TLTB, volume 9)

Abstract

The previous chapters assumed that information was available for each word, or at least for its underlying lemma. Most taggers rely on such lexical information: a lexicon lists the allowable tags for each word, perhaps along with probabilistic information such as the probability of a particular word and tag co-occurring. In practice, however, a wordclass tagger must also deal with unknown words, for which the lexicon holds no such information. Except in a truly closed-vocabulary system, such as a voice dictation system with a preset vocabulary, unknown words cannot be ignored. Figure 13.1 shows the number of unrecognized words in a test set of Wall Street Journal text as a function of the number of words in the training set from which the lexicon was built.
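The lexicon and the unknown-word count described in the abstract can be illustrated with a minimal Python sketch. This is not the chapter's implementation: the function names (build_lexicon, tag_probabilities, count_unknown, unknown_curve) and the tiny tagged sample are hypothetical and chosen only for illustration. The sketch assumes the lexicon is built by counting (word, tag) pairs in tagged training text and that a test word is unknown when it has no lexicon entry.

```python
from collections import Counter, defaultdict


def build_lexicon(tagged_tokens):
    """Map each word to a Counter of the tags it occurred with."""
    lexicon = defaultdict(Counter)
    for word, tag in tagged_tokens:
        lexicon[word][tag] += 1
    return lexicon


def tag_probabilities(lexicon, word):
    """Relative frequency of each tag for a word; empty dict if the word is unknown."""
    counts = lexicon.get(word)
    if not counts:
        return {}
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def count_unknown(lexicon, test_words):
    """Number of test tokens that have no entry in the lexicon."""
    return sum(1 for word in test_words if word not in lexicon)


def unknown_curve(train, test_words, sizes):
    """Unknown-word count as a function of training-set size (in tokens)."""
    return [(n, count_unknown(build_lexicon(train[:n]), test_words)) for n in sizes]


# Hypothetical toy data, not taken from the Wall Street Journal corpus.
train = [
    ("the", "DT"), ("dog", "NN"), ("barks", "VBZ"), ("the", "DT"),
    ("bark", "NN"), ("can", "MD"), ("can", "NN"),
]
test_words = ["the", "cat", "barks", "can"]

curve = unknown_curve(train, test_words, sizes=[2, 4, 7])
can_probs = tag_probabilities(build_lexicon(train), "can")
```

On this toy sample the unknown-word count falls as more training tokens are added, but "cat" stays unknown even when the whole sample is used, which is the situation the chapter addresses.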

Authors

Eric Brill

Editor information

Editors and Affiliations

University of Nijmegen, The Netherlands
Hans van Halteren

About this chapter

Cite this chapter

Brill, E. (1999). Tagging Unknown Words. In: van Halteren, H. (ed.) Syntactic Wordclass Tagging. Text, Speech and Language Technology, vol 9. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-9273-4_13

DOI: https://doi.org/10.1007/978-94-015-9273-4_13
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5296-4
Online ISBN: 978-94-015-9273-4
eBook Packages: Springer Book Archive