Producing high-dimensional semantic spaces from lexical co-occurrence

Abstract

A procedure is presented that processes a corpus of text and produces, for each word, a numeric vector containing information about its meaning. This procedure is applied to a large corpus of natural language text taken from Usenet, and the resulting vectors are examined to determine what information is contained within them. These vectors provide the coordinates in a high-dimensional space in which word relationships can be analyzed. Analyses of both vector similarity and multidimensional scaling demonstrate that there is significant semantic information carried in the vectors. A comparison of vector similarity with human reaction times in a single-word priming experiment is presented. These vectors provide the basis for a representational model of semantic memory, hyperspace analogue to language (HAL).
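The abstract does not spell out the procedure, so the following is only a minimal sketch of how word vectors could be built from lexical co-occurrence. The window size, the distance-based weighting, the concatenation of "preceding" and "following" counts, and the use of Euclidean distance are all illustrative assumptions here, and the function names `cooccurrence_vectors` and `euclidean` are hypothetical.

```python
import math


def cooccurrence_vectors(tokens, window=2):
    """Return {word: vector} built from weighted co-occurrence counts.

    Assumptions for illustration only: a context word at distance d
    (1 <= d <= window) before the target contributes weight window - d + 1,
    so nearer words count more.
    """
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    n = len(vocab)

    # before[t][c] accumulates weights for context word c appearing
    # BEFORE target word t inside the window.
    before = [[0.0] * n for _ in range(n)]
    for pos, target in enumerate(tokens):
        for dist in range(1, window + 1):
            if pos - dist < 0:
                break
            context = tokens[pos - dist]
            before[index[target]][index[context]] += window - dist + 1

    # Each word's vector is its row (words that preceded it) followed by
    # its column (words that followed it), giving 2 * n dimensions.
    vectors = {}
    for word, i in index.items():
        row = before[i]
        col = [before[j][i] for j in range(n)]
        vectors[word] = row + col
    return vectors


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy usage: words used in similar contexts should end up closer together.
toy = "the cat sat the dog sat".split()
vecs = cooccurrence_vectors(toy, window=2)
print(euclidean(vecs["cat"], vecs["dog"]), euclidean(vecs["cat"], vecs["the"]))
```

In this toy corpus, "cat" and "dog" occupy the same position relative to "the" and "sat", so their vectors lie closer to each other than either does to "the". A real corpus would need a much larger window and vocabulary than this sketch uses.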

Author information

Correspondence to Kevin Lund or Curt Burgess.

Additional information

This research was supported by an NSF Presidential Faculty Fellow award (SBR-9453406) to C.B.

About this article

Cite this article

Lund, K., Burgess, C. Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers 28, 203–208 (1996). https://doi.org/10.3758/BF03204766

  • DOI: https://doi.org/10.3758/BF03204766

Keywords

  • Target Word
  • Word Pair
  • Semantic Space
  • Semantic Distance
  • Vector Similarity