Abstract
In the previous chapters we have assumed that information was available for each word, or at least for its underlying lemma. Most taggers rely on such lexical information: a lexicon lists the allowable tags for each word, perhaps along with probabilistic information such as the probability of a particular word and tag co-occurring. In practice, however, a wordclass tagger must also deal with unknown words, for which no such information is available in the lexicon. Except in a truly closed-vocabulary system, such as a voice dictation system with a preset vocabulary, unknown words cannot be ignored. Figure 13.1 shows the number of unrecognized words in a test set of Wall Street Journal text as a function of the number of words in the training set from which the lexicon was built.
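As an illustration only, the kind of lexicon and unknown-word count described above might be sketched as follows. This is a minimal sketch, not the chapter's method: the function names (`build_lexicon`, `tag_probabilities`, `unknown_count`, `unknown_curve`) are hypothetical, and the toy sentences are invented stand-ins for tagged Wall Street Journal text.

```python
from collections import Counter, defaultdict


def build_lexicon(tagged_tokens):
    """Map each word to a Counter of the tags it occurred with in training."""
    lexicon = defaultdict(Counter)
    for word, tag in tagged_tokens:
        lexicon[word][tag] += 1
    return lexicon


def tag_probabilities(lexicon, word):
    """Relative frequency of each tag for a known word; empty dict if unknown."""
    counts = lexicon.get(word)  # .get() does not create an entry for unknown words
    if not counts:
        return {}
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def unknown_count(lexicon, test_words):
    """Number of test tokens that have no entry in the lexicon."""
    return sum(1 for word in test_words if word not in lexicon)


def unknown_curve(train, test_words, sizes):
    """Unknown-token count on the test set for growing training prefixes."""
    return [(n, unknown_count(build_lexicon(train[:n]), test_words)) for n in sizes]


# Toy data (assumption: invented for illustration, not taken from the chapter).
train = [
    ("the", "DT"), ("bank", "NN"), ("can", "MD"), ("bank", "VB"),
    ("on", "IN"), ("the", "DT"), ("deal", "NN"),
]
test_words = ["the", "bank", "will", "deal", "quickly"]

lexicon = build_lexicon(train)
print(tag_probabilities(lexicon, "bank"))
print(unknown_curve(train, test_words, sizes=[2, 4, 7]))
```

On real data one would expect the count to fall as the training set grows without reaching zero, which is the trend the abstract attributes to Figure 13.1.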
Copyright information
© 1999 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Brill, E. (1999). Tagging Unknown Words. In: van Halteren, H. (ed.) Syntactic Wordclass Tagging. Text, Speech and Language Technology, vol 9. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-9273-4_13
DOI: https://doi.org/10.1007/978-94-015-9273-4_13
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5296-4
Online ISBN: 978-94-015-9273-4
eBook Packages: Springer Book Archive