Producing high-dimensional semantic spaces from lexical co-occurrence

Analysis Of Semantic And Clinical Data


A procedure is presented that processes a corpus of text and produces, for each word, a numeric vector containing information about its meaning. This procedure is applied to a large corpus of natural language text taken from Usenet, and the resulting vectors are examined to determine what information they contain. These vectors provide the coordinates in a high-dimensional space in which word relationships can be analyzed. Analyses using both vector similarity and multidimensional scaling demonstrate that the vectors carry significant semantic information. A comparison of vector similarity with human reaction times in a single-word priming experiment is presented. These vectors provide the basis for a representational model of semantic memory, hyperspace analogue to language (HAL).
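The general idea of deriving word vectors from lexical co-occurrence and then comparing them can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the window size, the weighting scheme, or the similarity measure, so the two-word window, the linear distance weighting, the separate treatment of preceding and following context, and the use of cosine similarity are all assumptions made for illustration. The function names `cooccurrence_vectors` and `cosine` are hypothetical.

```python
import math


def cooccurrence_vectors(tokens, window=2):
    """Build one vector per word from distance-weighted co-occurrence counts.

    Assumed scheme (not taken from the abstract): a word at distance d
    inside the window contributes weight (window - d + 1), and words that
    precede and follow the target are counted in separate halves of the vector.
    """
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    before = {word: [0.0] * len(vocab) for word in vocab}  # preceding context
    after = {word: [0.0] * len(vocab) for word in vocab}   # following context

    for pos, word in enumerate(tokens):
        for dist in range(1, window + 1):
            if pos - dist < 0:
                break  # no more tokens to the left
            neighbor = tokens[pos - dist]
            weight = window - dist + 1  # closer words count more
            before[word][index[neighbor]] += weight
            after[neighbor][index[word]] += weight

    # Each word's vector is its preceding-context counts followed by
    # its following-context counts.
    return {word: before[word] + after[word] for word in vocab}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus; a real application would use millions of tokens.
tokens = "the cat chased the mouse the dog chased the ball".split()
vectors = cooccurrence_vectors(tokens, window=2)
print(cosine(vectors["cat"], vectors["dog"]))
print(cosine(vectors["cat"], vectors["chased"]))
```

In this toy corpus, "cat" and "dog" occur in similar contexts, so their vectors end up more similar to each other than either is to "chased". With a realistic corpus the vectors would have thousands of dimensions, and a sparse matrix representation would be a more practical choice than Python lists.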



Copyright information

© Psychonomic Society, Inc. 1996

Authors and Affiliations

  1. Psychology Department, University of California, Riverside
