Producing high-dimensional semantic spaces from lexical co-occurrence

  • Kevin Lund
  • Curt Burgess
Analysis of Semantic and Clinical Data


A procedure is presented that processes a corpus of text and produces, for each word, a numeric vector containing information about that word's meaning. The procedure is applied to a large corpus of natural-language text taken from Usenet, and the resulting vectors are examined to determine what information they contain. The vectors provide the coordinates of a high-dimensional space in which word relationships can be analyzed. Analyses of vector similarity and multidimensional scaling both demonstrate that the vectors carry significant semantic information. Vector similarity is also compared with human reaction times in a single-word priming experiment. Together, these vectors provide the basis for a representational model of semantic memory, the hyperspace analogue to language (HAL).
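The vector-construction and similarity steps summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the window size, the weighting scheme, or the distance measure, so the 10-word window with linearly decreasing weights, the concatenation of preceding-context and following-context counts into one vector, and the length-normalized Euclidean distance used here are all assumptions. The function names `hal_vectors`, `normalize`, and `distance` are hypothetical.

```python
import math


def hal_vectors(tokens, window=10):
    """Build HAL-style co-occurrence vectors from a list of tokens.

    Assumptions (not taken from the abstract): a word at distance d
    (1..window) before the target adds weight window - d + 1, so adjacent
    words count most; each word's vector concatenates its "preceding"
    counts with its "following" counts.
    """
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    # counts[i][j]: weighted co-occurrences of vocab[j] appearing BEFORE vocab[i]
    counts = [[0.0] * n for _ in range(n)]
    for pos, word in enumerate(tokens):
        for d in range(1, window + 1):
            if pos - d < 0:
                break
            prev = tokens[pos - d]
            counts[index[word]][index[prev]] += window - d + 1
    vectors = {}
    for w, i in index.items():
        preceding = counts[i]                         # row: words seen before w
        following = [counts[j][i] for j in range(n)]  # column: words seen after w
        vectors[w] = preceding + following            # new list of length 2 * n
    return vectors


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v] if length else list(v)


def distance(v1, v2):
    """Euclidean distance between length-normalized vectors; smaller means more similar."""
    a, b = normalize(v1), normalize(v2)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    corpus = "the cat ran the dog ran".split()
    vecs = hal_vectors(corpus, window=1)
    # "cat" and "dog" occur in identical contexts here, so this prints 0.0
    print(distance(vecs["cat"], vecs["dog"]))
    # "cat" and "the" share no contexts, so their distance is larger
    print(distance(vecs["cat"], vecs["the"]))
```

A pairwise distance matrix built with `distance` could then be passed to a standard multidimensional scaling routine. The dense lists used here are only practical for toy corpora; a Usenet-scale vocabulary would call for a sparse representation.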





Copyright information

© Psychonomic Society, Inc. 1996

Authors and Affiliations

  1. Psychology Department, University of California, Riverside
