Automatic Discovery of Similar Words

Abstract

We address the automatic discovery of similar words (synonyms and near-synonyms) from three kinds of sources: large corpora of documents, the Web, and monolingual dictionaries. We present in detail three algorithms that extract similar words from a large corpus of documents and consider the specific case of the World Wide Web. We then describe a recent method for automatic synonym extraction from a monolingual dictionary. The method is based on an algorithm that computes similarity measures between vertices in graphs. We apply the method to four synonym queries on the 1913 Webster's Dictionary, then analyze the results and compare them with those obtained by two other methods.
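
As a rough illustration of the corpus-based idea mentioned above, the sketch below scores word similarity with a cosine measure over document co-occurrence counts. It is not one of the algorithms presented in the chapter: the toy corpus and the helper names `context_vectors`, `cosine`, and `most_similar` are assumptions made purely for this example, and raw counts stand in for any weighting scheme such as TF-IDF.

```python
import math
from collections import Counter


def context_vectors(documents):
    """Map each word to a Counter of {document index: occurrence count}."""
    vectors = {}
    for doc_id, text in enumerate(documents):
        for word in text.lower().split():
            vectors.setdefault(word, Counter())[doc_id] += 1
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, top_n=3):
    """Rank all other words by cosine similarity to `word` (ties broken alphabetically)."""
    target = vectors[word]
    scores = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Hypothetical toy corpus, for illustration only.
toy_corpus = [
    "the car drove down the road",
    "the automobile drove down the street",
    "a cat sat on the mat",
]
vectors = context_vectors(toy_corpus)
ranking = most_similar("car", vectors)
```

With documents as the only context, words that occur in exactly the same documents score highest, so a corpus this small cannot tell "car" apart from its neighbours in the same sentence. Realistic use would need a large corpus and finer-grained contexts.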

Keywords

  • Similar Word
  • Neighborhood Graph
  • Automatic Discovery
  • Cosine Similarity Measure
  • Term Frequency Inverse Document Frequency

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

  • DOI: 10.1007/978-1-4757-4305-0_2
  • Chapter length: 19 pages

Copyright information

© 2004 Springer Science+Business Media New York

About this chapter

Cite this chapter

Senellart, P.P., Blondel, V.D. (2004). Automatic Discovery of Similar Words. In: Berry, M.W. (eds) Survey of Text Mining. Springer, New York, NY. https://doi.org/10.1007/978-1-4757-4305-0_2

  • DOI: https://doi.org/10.1007/978-1-4757-4305-0_2

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-3057-6

  • Online ISBN: 978-1-4757-4305-0

  • eBook Packages: Springer Book Archive