Discovering Word Senses from Text Using Random Indexing

  • Niladri Chatterjee
  • Shiwali Mohan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4919)

Abstract

Random Indexing is a novel technique for dimensionality reduction when creating a Word Space model from a given text. This paper explores the possible application of Random Indexing to discovering word senses from text. The words appearing in the text are plotted onto a multi-dimensional Word Space using Random Indexing, and the geometric distance between words is used as an indicator of their semantic similarity. The soft clustering algorithm Clustering by Committee (CBC) is then used to group similar words. The present work shows that the Word Space model can be used effectively to determine the similarity index required for clustering. The approach does not require parsers, lexicons or any other resources traditionally used in word sense disambiguation. The proposed approach has been applied to the TASA corpus, and encouraging results have been obtained.
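
The abstract combines two steps: Random Indexing builds a vector for each word, and the closeness of those vectors serves as the similarity index passed to CBC. Below is a minimal Python sketch of those two steps under stated assumptions. The sparse ternary index vectors, the dimensionality `dim`, the number of non-zero entries `nonzeros` and the sliding-window size `window` follow the general Random Indexing recipe rather than parameters reported in this paper. Cosine similarity is assumed as the geometric measure. The function names are hypothetical, and the CBC clustering step itself is not shown.

```python
import math
import random
from collections import defaultdict


def make_index_vector(dim, nonzeros, rng):
    """Return a sparse ternary index vector as a list of (position, sign) pairs.

    `nonzeros` distinct positions out of `dim` are chosen at random; half get +1
    and half get -1 (exactly half when `nonzeros` is even). All other entries are 0.
    """
    positions = rng.sample(range(dim), nonzeros)
    return [(pos, 1 if i % 2 == 0 else -1) for i, pos in enumerate(positions)]


def random_indexing(tokens, dim=512, nonzeros=8, window=2, seed=0):
    """Build one context vector per word type (parameter values are assumptions).

    Each word type gets a fixed random index vector. A word's context vector is
    the sum of the index vectors of the tokens within +/- `window` positions of
    each of its occurrences, so its length stays `dim` whatever the vocabulary size.
    """
    rng = random.Random(seed)
    index = {}
    for word in tokens:
        if word not in index:
            index[word] = make_index_vector(dim, nonzeros, rng)

    context = defaultdict(lambda: [0] * dim)
    for i, word in enumerate(tokens):
        ctx = context[word]  # create the vector even if the word has no neighbours
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                for pos, sign in index[tokens[j]]:
                    ctx[pos] += sign
    return dict(context)


def cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(vectors, word, k=3):
    """Return the k words whose context vectors are closest to `word`'s."""
    target = vectors[word]
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    corpus = "the cat drinks milk . the dog drinks milk . the car needs fuel .".split()
    vectors = random_indexing(corpus, window=1)
    print(most_similar(vectors, "cat", k=3))
```

With a window of one token, `cat` and `dog` have the same neighbours (`the`, `drinks`) in the toy corpus. Their context vectors therefore coincide, and their cosine similarity is 1 up to rounding, while `car` shares only `the`. Summing fixed random index vectors keeps every word vector at `dim` dimensions instead of building a full word-by-word co-occurrence matrix. This is the dimensionality reduction the abstract refers to.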



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Niladri Chatterjee (1)
  • Shiwali Mohan (2)
  1. Department of Mathematics, Indian Institute of Technology Delhi, New Delhi, India
  2. Yahoo! Research and Development India, Bangalore, India
