Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

Research in automatic word sense disambiguation has a history as long as that of computational linguistics itself. In this chapter, we take a two-dimensional approach to review the development and state of the art of the field: on the one hand, the knowledge sources used for disambiguation, and on the other, the algorithmic mechanisms by which those knowledge sources are actually deployed. The trend for the latter is relatively clear and correlates closely with the historical development of many other natural language processing subtasks, where conventional knowledge-based methods gradually give way to scalable, corpus-based statistical and supervised methods. While the importance of multiple knowledge sources was recognised from the outset, their effective use in disambiguation systems has nevertheless been constrained by the notorious knowledge acquisition bottleneck, and therefore depends very much on the availability of suitable lexical resources.


Notes

  1. The subject is also known by other names, e.g. word sense discrimination (McRoy 1992; Schütze 1998), lexical ambiguity resolution (Hirst 1987), automatic sense disambiguation (Lesk 1986), sense tagging (Wilks and Stevenson 1997), sense clustering (Chen and Chang 1998), word sense classification or supersense tagging (Ciaramita and Johnson 2003), word sense induction (Navigli and Crisafulli 2010), etc. Stevenson (2003) distinguishes sense disambiguation and sense tagging as different levels of WSD. However, as he also pointed out, the delimitation is not always clear-cut.

  2. Thus, in the past, some studies considered only a reduced set of senses (e.g. Leacock et al. 1993; Bruce and Wiebe 1994; Leacock et al. 1998; Towell and Voorhees 1998), or even only two distinct senses (e.g. Brown et al. 1991; Gale et al. 1992a; Yarowsky 1995), usually because of data availability and the methods used. The general understanding in recent years is that homonymy is more reasonably and usefully handled by WSD systems, and that fine-grained senses in existing resources are better merged into more distinct groups of senses (e.g. Ide and Wilks 2006; McCarthy 2006).

  3. In fact, some of the knowledge sources, such as selectional restrictions and subcategorisation frames, may have a mutually beneficial relationship with WSD. On the one hand, these knowledge sources are useful for word sense disambiguation; on the other hand, the ability to disambiguate word senses has also been found to help the acquisition of selectional restrictions and verb subcategorisation patterns (e.g. McCarthy 1997; Korhonen and Preiss 2003).

  4. See Landes et al. (1998) for the associated project on building semantic concordances.

  5. http://wordnet.princeton.edu/.

  6. Some of the methods discussed in this section make use of knowledge acquired from large corpora and therefore involve statistical techniques. Since the probabilistic models are applied to the acquisition of particular knowledge sources like selectional preferences and subcategorisation patterns, we group them under knowledge-based methods and distinguish them from other corpus-based WSD methods that are trained directly on corpus examples annotated with sense information. The latter are considered supervised methods, as discussed in the next section.

  7. Wilks and Stevenson (1996) remarked that a 12-word sentence could give rise to more than 10⁹ sense combinations to evaluate.
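     For illustration only (the per-word sense counts below are assumed, not taken from Wilks and Stevenson), a few lines of Python show how quickly the combinations multiply:

        # Illustrative only: assume each of the 12 words has about 6 candidate senses.
        sense_counts = [6] * 12

        combinations = 1
        for n in sense_counts:
            combinations *= n   # each word's senses multiply the total

        print(combinations)     # 6**12 = 2176782336, i.e. over 10**9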

  8. The human judgement scores were from Miller and Charles (1991).

  9. Precision and recall are common performance measures in NLP. See Sect. 3.2.3 for their definitions.
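     As a quick reference only (the formal definitions appear in Sect. 3.2.3), WSD evaluation conventionally computes them over the instances a system attempts:

        precision = correct sense assignments / instances attempted
        recall    = correct sense assignments / total instances to be disambiguated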

  10. See Gale et al. (1992a) for the “one sense per discourse” property in WSD.

  11. Slightly different is Pedersen (2000), who used one type of information, namely co-occurring words, but combined evidence from various window sizes.
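     A minimal sketch of the idea (the function and window sizes below are hypothetical, not Pedersen's actual configuration): collect the words co-occurring with the target within several window sizes and pool them as features.

        # Illustrative sketch: co-occurrence features from several window sizes.
        def cooccurrence_features(tokens, target_index, window_sizes=(2, 5, 10)):
            features = set()
            for w in window_sizes:
                lo = max(0, target_index - w)
                hi = min(len(tokens), target_index + w + 1)
                for i in range(lo, hi):
                    if i != target_index:
                        # tag each co-occurring word with the window it came from
                        features.add((w, tokens[i].lower()))
            return features

        tokens = "The bank raised its interest rate again".split()
        print(sorted(cooccurrence_features(tokens, tokens.index("bank"))))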

  12. http://ilk.uvt.nl/timbl/.

  13. Actually, Stevenson’s (2003) system is more appropriately considered a hybrid system, since most of the individual modules have a very prominent knowledge-based element, while the supervised learning part serves to combine them conveniently. For example, the three partial taggers work, respectively, on word overlaps with dictionary definitions (optimised with simulated annealing), on broad context based on the subject areas of words as indicated by the pragmatic codes in LDOCE, and on selectional restriction information expressed by the semantic codes in LDOCE.
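     As an illustration of the definition-overlap idea behind the first partial tagger, a much simplified Lesk-style scorer is sketched below; it scores the senses of a single word by overlap and does not implement the simulated annealing search of Stevenson's actual module.

        # Simplified Lesk-style definition-overlap scorer; not Stevenson's actual module.
        def overlap_score(definition, context_words):
            # count word overlap between a sense definition and the context
            return len(set(definition.lower().split()) &
                       set(w.lower() for w in context_words))

        senses = {
            "bank/1": "a financial institution that accepts deposits and makes loans",
            "bank/2": "sloping land beside a body of water such as a river",
        }
        context = "the bank accepts deposits and offers loans to customers".split()
        best = max(senses, key=lambda s: overlap_score(senses[s], context))
        print(best)   # bank/1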

  14. SVM attempts to find a hyperplane with the largest margin separating the training examples into two classes. AdaBoost attempts to boost the performance of an ensemble of weak learners by giving higher weights to misclassified training examples in successive rounds, so that the classifier concentrates on these hard examples.
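     A minimal scikit-learn sketch (the features and data below are fabricated; it merely shows the two classifiers applied to a toy binary sense distinction):

        # Illustrative only: toy binary sense classification with SVM and AdaBoost.
        from sklearn.svm import SVC
        from sklearn.ensemble import AdaBoostClassifier

        # Each row marks the presence of a few hand-picked context words,
        # e.g. [money, river, loan, water]; labels: 0 = financial sense, 1 = river sense.
        X = [[1, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 1],
             [0, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1]]
        y = [0, 0, 1, 1, 0, 1]

        svm = SVC(kernel="linear").fit(X, y)                 # max-margin hyperplane
        ada = AdaBoostClassifier(n_estimators=50).fit(X, y)  # reweighted weak learners

        print(svm.predict([[1, 0, 0, 0], [0, 1, 0, 1]]))
        print(ada.predict([[1, 0, 0, 0], [0, 1, 0, 1]]))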

  15. The estimate was based on the 3,200 most frequent words in the Brown Corpus, which cover 90% of all word occurrences, with 1,000 instances to be tagged for each word, i.e. about 3.2 million tagged instances in total.

  16. Methods like this, which use only a few examples for bootstrapping, are sometimes counted as weakly (also lightly or minimally) supervised approaches, in contrast to genuinely unsupervised methods, which take no tagged training examples at all.

Author information

Correspondence to Oi Yee Kwong.

Copyright information

© 2013 The Author(s)

About this chapter

Cite this chapter

Kwong, O.Y. (2013). Methods for Automatic WSD. In: New Perspectives on Computational and Cognitive Strategies for Word Sense Disambiguation. SpringerBriefs in Electrical and Computer Engineering. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1320-2_2

  • DOI: https://doi.org/10.1007/978-1-4614-1320-2_2

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-1319-6

  • Online ISBN: 978-1-4614-1320-2

  • eBook Packages: Engineering, Engineering (R0)
