
Tagging Unknown Words

  • Chapter
Syntactic Wordclass Tagging

Part of the book series: Text, Speech and Language Technology (TLTB, volume 9)


Abstract

In the previous chapters we have assumed that there was information present on each word, or at least on its underlying lemma. Most taggers rely on such lexical information, listing the allowable tags for each word, perhaps along with some probabilistic information such as the probability of the particular word and tag co-occurring. In practice, however, a wordclass tagger will also have to be able to deal with unknown words, for which such information is not available in the lexicon. Except in the case of a truly closed vocabulary system, such as a voice dictation system with a preset vocabulary, unknown words will always be a phenomenon that cannot be ignored. In figure 13.1, we show the number of unrecognized words in a test set from Wall Street Journal text as a function of the number of words in the training set, from which the lexicon has been built.
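
The measurement described in the abstract can be sketched roughly as follows: build a lexicon from the first n tokens of a tagged training set, then count how many test tokens have no lexicon entry. This is only an illustrative sketch, not the chapter's actual procedure; the function names (`build_lexicon`, `tag_probs`, `unknown_count`, `unknown_curve`) and the toy data are assumptions made for the example, and the tags merely follow Penn Treebank conventions.

```python
from collections import Counter, defaultdict

def build_lexicon(tagged_words):
    """Map each word to a Counter of the tags it was seen with."""
    lexicon = defaultdict(Counter)
    for word, tag in tagged_words:
        lexicon[word][tag] += 1
    return lexicon

def tag_probs(lexicon, word):
    """Relative frequency of each tag for a known word (hypothetical helper)."""
    counts = lexicon[word]
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

def unknown_count(lexicon, test_words):
    """Count test tokens that have no entry in the lexicon."""
    return sum(1 for w in test_words if w not in lexicon)

def unknown_curve(train, test, sizes):
    """Unknown-token count in `test` for lexicons built from train[:n]."""
    return [(n, unknown_count(build_lexicon(train[:n]), test)) for n in sizes]

# Toy data invented for illustration; not taken from the Wall Street Journal corpus.
train = [("the", "DT"), ("stock", "NN"), ("fell", "VBD"),
         ("the", "DT"), ("stock", "VB"), ("rose", "VBD")]
test = ["the", "bond", "rose", "sharply"]

curve = unknown_curve(train, test, [2, 4, 6])
full_lexicon = build_lexicon(train)
print(curve)
print(tag_probs(full_lexicon, "stock"))
```

On real data the unknown-word count would be expected to fall as the training set grows without reaching zero, which is the point the abstract makes about open vocabularies.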

Copyright information

© 1999 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Brill, E. (1999). Tagging Unknown Words. In: van Halteren, H. (ed.) Syntactic Wordclass Tagging. Text, Speech and Language Technology, vol 9. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-9273-4_13

  • DOI: https://doi.org/10.1007/978-94-015-9273-4_13

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-5296-4

  • Online ISBN: 978-94-015-9273-4

  • eBook Packages: Springer Book Archive
