Automatic Discovery of Similar Words
The purpose of this chapter is to review some methods used for automatic extraction of similar words from different kinds of sources: large corpora of documents, the World Wide Web, and monolingual dictionaries. The underlying goal of these methods is, in general, the automatic discovery of synonyms. This goal, however, is usually too difficult to achieve, since it is often hard to distinguish automatically among synonyms, antonyms, and, more generally, words that are semantically close to each other. Most methods therefore provide words that are “similar” to each other, under some loosely defined notion of semantic similarity. We mainly describe two kinds of methods: techniques that, given a word as input, automatically compile a list of good synonyms or near-synonyms, and techniques that generate a thesaurus (from some source, they build a complete lexicon of related words). They differ in that the latter build the whole thesaurus at once, and the result may not contain an entry for every word in the source. Nevertheless, the purposes of both sorts of techniques are very similar and we shall therefore not distinguish much between them.
Keywords: Vector Space Model, Similar Word, Neighborhood Graph, Automatic Discovery, Principal Eigenvector