Behavior Research Methods, Volume 42, Issue 2, pp 393–413

Exploring lexical co-occurrence space using HiDEx

  • Cyrus Shaoul
  • Chris Westbury
Articles From the SCiP Conference

Abstract

Hyperspace analog to language (HAL) is a high-dimensional model of semantic space that uses the global co-occurrence frequency of words in a large corpus of text as the basis for a representation of semantic memory. In the original HAL model, many parameters were set without any a priori rationale. We have created and publicly released a computer application, the High Dimensional Explorer (HiDEx), that makes it possible to systematically alter the values of these parameters to examine their effect on the co-occurrence matrix that instantiates the model. We took an empirical approach to understanding the influence of the parameters on the measures produced by the models, looking at how well matrices derived with different parameters could predict human reaction times in lexical decision and semantic decision tasks. New parameter sets give us measures of semantic density that improve the model’s ability to predict behavioral measures. Implications for such models are discussed.
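
The kind of parameterized co-occurrence counting the abstract describes can be illustrated with a minimal sketch. This is not HiDEx's code: the function names, the toy sentence, and the two weighting functions are assumptions made for illustration. It exposes two of the parameters in question, the window size and the weighting scheme, and counts only the context words that precede each target.

```python
from collections import defaultdict


def ramp_weight(distance, window_size):
    """Linear ramp (illustrative): closer context words get larger weights."""
    return window_size - distance + 1


def flat_weight(distance, window_size):
    """Flat scheme (illustrative): every position in the window counts equally."""
    return 1


def cooccurrence(tokens, window_size=10, weight=ramp_weight):
    """Return {(target, context): weighted count} for the context words
    that occur up to `window_size` positions before each target."""
    counts = defaultdict(float)
    for i, target in enumerate(tokens):
        for distance in range(1, window_size + 1):
            j = i - distance
            if j < 0:
                break  # ran off the start of the text
            counts[(target, tokens[j])] += weight(distance, window_size)
    return dict(counts)


# Hypothetical toy corpus; a real model would use a very large corpus.
tokens = "the dog chased the cat".split()
ramp = cooccurrence(tokens, window_size=2)
flat = cooccurrence(tokens, window_size=2, weight=flat_weight)
```

Swapping `weight` or `window_size` changes the resulting counts without touching the counting loop, which is the sort of systematic parameter variation the abstract refers to.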

Keywords

Window size, target word, lexical decision, weighting scheme, word pair
These keywords were added by machine and not by the authors.

Copyright information

© Psychonomic Society, Inc. 2010

Authors and Affiliations

  1. Department of Psychology, University of Alberta, Edmonton, Canada