Filaments of Meaning in Word Space

  • Jussi Karlgren
  • Anders Holst
  • Magnus Sahlgren
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4956)

Abstract

Word space models, in the sense of vector space models built on distributional data taken from texts, are used to model semantic relations between words. We argue that the high dimensionality of typical vector space models leads to unintuitive effects on modeling likeness of meaning, and that the local structure of word spaces is where interesting semantic relations reside. We show that the local structure of word spaces has substantially different dimensionality and character from the global space, and that this structure shows potential to be exploited for further semantic analysis using methods for local analysis of vector space structure, rather than the globally scoped methods typically in use today, such as singular value decomposition or principal component analysis.
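
The contrast the abstract draws between global and local structure can be made concrete with a small experiment. The sketch below is not the authors' method, only a minimal illustration under assumed parameters: it builds a word-by-word co-occurrence space from a toy corpus, then compares a crude estimate of effective dimensionality (the number of principal components needed to cover 90% of the variance) computed over the whole space against the same estimate computed over one word's nearest neighbours. The corpus, the window size, the 90% threshold, and the helper effective_dim are all illustrative assumptions, not anything taken from the paper.

    import numpy as np

    # Toy corpus; any tokenised text works. Purely illustrative.
    corpus = [
        "the cat sat on the mat",
        "the dog sat on the rug",
        "a cat and a dog played in the garden",
        "stocks fell sharply on the market today",
        "the market rallied as stocks rose",
    ]

    # Build a word-by-word co-occurrence matrix with a +/-2 token window.
    window = 2
    vocab = sorted({w for line in corpus for w in line.split()})
    index = {w: i for i, w in enumerate(vocab)}
    cooc = np.zeros((len(vocab), len(vocab)))
    for line in corpus:
        toks = line.split()
        for i, w in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if i != j:
                    cooc[index[w], index[toks[j]]] += 1.0

    def effective_dim(vectors, energy=0.9):
        # Number of principal components needed to explain `energy`
        # of the variance: a crude proxy for intrinsic dimensionality.
        centred = vectors - vectors.mean(axis=0)
        s = np.linalg.svd(centred, compute_uv=False)
        var = s ** 2
        ratios = np.cumsum(var) / var.sum()
        return int(np.searchsorted(ratios, energy)) + 1

    # Global dimensionality of the whole word space...
    print("global:", effective_dim(cooc))

    # ...versus the local dimensionality of one word's neighbourhood
    # (the word itself plus its 7 nearest neighbours by Euclidean distance).
    target = index["cat"]
    dists = np.linalg.norm(cooc - cooc[target], axis=1)
    neighbours = cooc[np.argsort(dists)[:8]]
    print("local around 'cat':", effective_dim(neighbours))

On the paper's argument one would expect the local estimate to differ markedly from the global one; on a corpus this small the numbers mean little, and the sketch only shows the shape of such a comparison.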

Keywords

Latent Semantic Analysis, Vector Space Model, Retrieval Practice, Left Tail, Random Indexing


Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Jussi Karlgren (1)
  • Anders Holst (1)
  • Magnus Sahlgren (1)

  1. Swedish Institute of Computer Science