The effect of corpus size in predicting reaction time in a basic word recognition task: Moving on from Kučera and Francis

Abstract

Word frequency is one of the strongest determiners of reaction time (RT) in word recognition tasks; it is an important theoretical and methodological variable. The Kučera and Francis (1967) word frequency count (derived from the 1-million-word Brown corpus) is used by most investigators concerned with the issue of word frequency. Word frequency estimates from the Brown corpus were compared with those from a 131-million-word corpus (the HAL corpus; conversational text gathered from Usenet) in a standard word naming task with 32 subjects. RT was predicted equally well by both corpora for high-frequency words, but the larger corpus provided better predictors for low- and medium-frequency words. Furthermore, the larger corpus provides estimates for 97,261 lexical items; the smaller corpus, for 50,406 items.
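One way to picture the comparison is sketched below: raw counts from corpora of different sizes are put on a common per-million scale, log-transformed, and correlated with naming RT. This is only a minimal sketch. The abstract does not state which transformation or statistic was used, so the log10(per-million + 1) transform, the use of Pearson's r, and all function names here are assumptions made for illustration.

```python
import math


def per_million(count, corpus_size):
    """Scale a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


def log_frequency(count, corpus_size):
    """Log10 of (per-million frequency + 1); the +1 keeps zero counts defined.

    Assumption: this particular transform is illustrative, not taken from the article.
    """
    return math.log10(per_million(count, corpus_size) + 1)


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def frequency_rt_correlation(counts, corpus_size, rts):
    """Correlate log frequency (from one corpus) with mean RT per word."""
    log_freqs = [log_frequency(c, corpus_size) for c in counts]
    return pearson_r(log_freqs, rts)
```

Calling `frequency_rt_correlation` once with counts from each corpus, separately for low-, medium-, and high-frequency word sets, would give one correlation per corpus per frequency band. Under this sketch, a more strongly negative value would indicate the better predictor of RT.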

References

  1. Breland, H. M. (1996). Word frequency and word difficulty: A comparison of counts on four corpora. Psychological Science, 7, 96–99.

  2. Buchanan, L., Burgess, C., & Lund, K. (1996). Overcrowding in semantic neighborhoods: Modeling deep dyslexia. Brain & Cognition, 32, 111–114.

  3. Burgess, C. (1998). From simple associations to the building blocks of language: Modeling meaning in memory with the HAL model. Manuscript submitted for publication.

  4. Burgess, C., & Hollbach, S. C. (1988). A computational model of syntactic ambiguity as a lexical process. In Proceedings of the Tenth Annual Cognitive Science Society Meeting (pp. 263–269). Hillsdale, NJ: Erlbaum.

  5. Burgess, C., & Lund, K. (1994). Multiple constraints in syntactic ambiguity resolution: A connectionist account of psycholinguistic data. In Proceedings of the Cognitive Science Society (pp. 90–95). Hillsdale, NJ: Erlbaum.

  6. Burgess, C., & Lund, K. (1997a). Modeling cerebral asymmetries of semantic memory using high-dimensional semantic space. In M. Beeman & C. Chiarello (Eds.), Getting it right: The cognitive neuroscience of right hemisphere language comprehension (pp. 215–244). Hillsdale, NJ: Erlbaum.

  7. Burgess, C., & Lund, K. (1997b). Modeling parsing constraints with high-dimensional context space. Language & Cognitive Processes, 12, 177–210.

  8. Burgess, C., & Lund, K. (1997c). Representing abstract words and emotional connotation in high-dimensional memory space. In Proceedings of the Cognitive Science Society (pp. 61–66). Hillsdale, NJ: Erlbaum.

  9. Burgess, C., Lund, K., & Kromsky, A. (1997). Examining issues in developmental psycholinguistics with a high-dimensional memory model. Abstracts of the Psychonomic Society, 2, 66.

  10. Burgess, C., Tanenhaus, M. K., & Hoffman, M. (1994). Parafoveal and semantic effects on syntactic ambiguity resolution. In Proceedings of the Cognitive Science Society (pp. 96–99). Hillsdale, NJ: Erlbaum.

  11. Cattell, J. M. (1886). The time it takes to see and name objects. Mind, 11, 63–65.

  12. Chiarello, C. (1988). Lateralization of lexical processes in the brain: A review of visual half-field research. In H. A. Whitaker (Ed.), Contemporary reviews in neuropsychology (pp. 36–76). New York: Springer-Verlag.

  13. Clark, S. E., & Burchett, R. E. R. (1994). Word frequency and list composition effects in associative recognition and recall. Memory & Cognition, 22, 55–62.

  14. Dobbs, A. R., Friedman, A., & Lloyd, J. (1985). Frequency effects in lexical decisions: A test of the verification model. Journal of Experimental Psychology: Human Perception & Performance, 11, 81–92.

  15. Dupuy, H. J. (1974). The rationale, development, and standardization of a basic word vocabulary test. Vital & Health Statistics, 2, 71.

  16. Forster, K. I. (1976). Accessing the mental lexicon. In R. J. Wales & E. C. T. Walker (Eds.), New approaches to language mechanisms (pp. 257–287). Amsterdam: North-Holland.

  17. Francis, W. N., & Kučera, H. (1982). Frequency analysis of English usage: Lexicon and grammar. Boston: Houghton Mifflin.

  18. Gernsbacher, M. A. (1984). Resolving 20 years of inconsistent interactions between lexical familiarity and orthography, concreteness, and polysemy. Journal of Experimental Psychology: General, 113, 256–281.

  19. Graf, P., & Williams, D. (1987). Completion norms for 40 three-letter word stems. Behavior Research Methods, Instruments, & Computers, 19, 422–445.

  20. Grainger, J., O’Regan, J. K., Jacobs, A. M., & Segui, J. (1989). On the role of competing word units in visual word recognition: The neighborhood frequency effect. Perception & Psychophysics, 45, 189–195.

  21. Hyönä, J., & Olson, R. K. (1995). Eye fixation patterns among dyslexic and normal readers: Effects of word length and word frequency. Journal of Experimental Psychology: Learning, Memory, & Cognition, 21, 1430–1440.

  22. Jurado, M. A., Junque, C., Pujol, J., Oliver, B., & Vendrell, P. (1997). Impaired estimation of word occurrence frequency in frontal lobe patients. Neuropsychologia, 35, 635–641.

  23. Kučera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.

  24. Livesay, K., & Burgess, C. (1997). Mediated priming: A representational and empirical account using the HAL model. In Proceedings of the Cognitive Science Society (pp. 436–441). Hillsdale, NJ: Erlbaum.

  25. Lovelace, E. A. (1988). On using norms for low-frequency words. Bulletin of the Psychonomic Society, 26, 410–412.

  26. Lund, K., & Burgess, C. (1996). Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers, 28, 203–208.

  27. Lund, K., & Burgess, C. (1997, December). Recurrent neural networks and global co-occurrence models: Developing contextual representations of word meaning. Paper presented at the NIPS*97 (Neural Information Processing Systems) Neural Models of Concept Learning postconference workshop, Breckenridge, CO.

  28. Lund, K., Burgess, C., & Atchley, R. A. (1995). Semantic and associative priming in high-dimensional semantic space. In Proceedings of the Cognitive Science Society (pp. 660–665). Hillsdale, NJ: Erlbaum.

  29. Lund, K., Burgess, C., & Audet, C. (1996). Dissociating semantic and associative word relationships using high-dimensional semantic space. In Proceedings of the Cognitive Science Society (pp. 603–608). Hillsdale, NJ: Erlbaum.

  30. MacDonald, M. C. (1994). Probabilistic constraints and syntactic ambiguity resolution. Language & Cognitive Processes, 9, 157–201.

  31. MacDonald, M. C., Pearlmutter, N. J., & Seidenberg, M. S. (1994). The lexical nature of syntactic ambiguity resolution. Psychological Review, 101, 676–703.

  32. McClelland, J. L., & Rumelhart, D. E. (1985). Distributed memory and the representation of general and specific information. Journal of Experimental Psychology: General, 114, 159–188.

  33. Monsell, S. (1991). The nature and locus of word frequency effects in reading. In D. Besner & G. W. Humphreys (Eds.), Basic processes in reading: Visual word recognition (pp. 148–197). Hillsdale, NJ: Erlbaum.

  34. Morton, J. (1969). Interaction of information in word recognition. Psychological Review, 76, 165–178.

  35. Plaut, D. C. (1996). Relearning after damage in connectionist networks: Toward a theory of rehabilitation. Brain & Language, 52, 25–82.

  36. Rudell, A. P. (1993). Frequency of word usage and perceived word difficulty: Ratings of Kucera and Francis words. Behavior Research Methods, Instruments, & Computers, 25, 455–463.

  37. Schwanenflugel, P., & Shoben, E. (1985). The influence of sentence constraints on the scope of facilitation for upcoming words. Journal of Memory & Language, 24, 232–252.

  38. Sears, C. R., Hino, Y., & Lupker, S. J. (1995). Neighborhood size and neighborhood frequency effects in word recognition. Journal of Experimental Psychology: Human Perception & Performance, 21, 876–900.

  39. Smith, E. E., Shoben, E. J., & Rips, L. J. (1974). Structure and process in semantic memory: A featural model for semantic decisions. Psychological Review, 81, 214–241.

  40. Tanenhaus, M. K., & Carlson, G. N. (1989). Lexical structure and language comprehension. In W. Marslen-Wilson (Ed.), Lexical representation and process (pp. 529–561). Cambridge, MA: MIT Press.

  41. Thorndike, E. L., & Lorge, I. (1944). The teacher’s word book of 30,000 words. New York: Columbia University, Teachers College Press.

  42. Troia, G. A., Roth, F. P., & Yeni-Komshian, G. H. (1996). Word frequency and age effects in normally developing children’s phonological processing. Journal of Speech & Hearing Research, 39, 1099–1108.

  43. Trueswell, J. C., Tanenhaus, M. K., & Garnsey, S. M. (1994). Semantic influences on parsing: Use of thematic role information in syntactic ambiguity resolution. Journal of Memory & Language, 33, 285–318.

Author information

Corresponding author

Correspondence to Curt Burgess.

Additional information

This research was supported by NSF Presidential Faculty Fellow award SBR-9453406 to C.B. We thank Kevin Lund, Catherine Decker, Sarah Ransdell, and an anonymous reviewer for their helpful comments, and Maureen Keeney for her tabulation of the information in Table 1.

About this article

Cite this article

Burgess, C., Livesay, K. The effect of corpus size in predicting reaction time in a basic word recognition task: Moving on from Kučera and Francis. Behavior Research Methods, Instruments, & Computers 30, 272–277 (1998). https://doi.org/10.3758/BF03200655

Keywords

  • Word Frequency
  • Cognitive Science Society
  • Corpus Size
  • Word Fragment Completion
  • Brown Corpus