The effect of corpus size in predicting reaction time in a basic word recognition task: Moving on from Kučera and Francis

Abstract

Word frequency is one of the strongest determiners of reaction time (RT) in word recognition tasks; it is an important theoretical and methodological variable. The Kučera and Francis (1967) word frequency count (derived from the 1-million-word Brown corpus) is used by most investigators concerned with the issue of word frequency. Word frequency estimates from the Brown corpus were compared with those from a 131-million-word corpus (the HAL corpus; conversational text gathered from Usenet) in a standard word naming task with 32 subjects. RT was predicted equally well by both corpora for high-frequency words, but the larger corpus provided better predictors for low- and medium-frequency words. Furthermore, the larger corpus provides estimates for 97,261 lexical items; the smaller corpus, for 50,406 items.

This research was supported by NSF Presidential Faculty Fellow award SBR-9453406 to C.B. We thank Kevin Lund, Catherine Decker, Sarah Ransdell, and an anonymous reviewer for their helpful comments, and Maureen Keeney for her tabulation of the information in Table 1.