Behavior Research Methods, Volume 41, Issue 4, pp 977–990

Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English

Article

Abstract

Word frequency is the most important variable in research on word processing and memory. Yet, the main criterion for selecting word frequency norms has been the availability of the measure, rather than its quality. As a result, much research is still based on the old Kučera and Francis frequency norms. By using the lexical decision times of recently published megastudies, we show how poor this measure is and what must be done to improve it. In particular, we investigated the size of the corpus, the language register on which the corpus is based, and the definition of the frequency measure. We observed that corpus size is of practical importance for small sizes (depending on the frequency of the word), but not for sizes above 16–30 million words. As for the language register, we found that frequencies based on television and film subtitles are better than frequencies based on written sources, certainly for the monosyllabic and bisyllabic words used in psycholinguistic research. Finally, we found that lemma frequencies are not superior to word form frequencies in English and that a measure of contextual diversity is better than a measure based on raw frequency of occurrence. Part of the superiority of the contextual diversity measure is due to the words that are frequently used as names. A new frequency norm assembled on the basis of these considerations turned out to predict word processing times much better than did the existing norms (including Kučera & Francis and Celex). The new SUBTL frequency norms from the SUBTLEXUS corpus are freely available for research purposes from http://brm.psychonomic-journals.org/content/supplemental, as well as from the University of Ghent and Lexique Web sites.
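The distinction the abstract draws between raw frequency of occurrence and contextual diversity can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`tokenize`, `frequency_norms`), the simple regular-expression tokenizer, and the three-document toy corpus are all assumptions made for illustration. Contextual diversity is taken here to be the number of documents (for example, films) in which a word appears at least once.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_norms(documents):
    """Compute, for every word form, its raw count, its count per million
    tokens, and its contextual diversity (number of documents containing it)."""
    raw = Counter()        # total occurrences across the whole corpus
    doc_count = Counter()  # number of documents in which the word occurs
    for doc in documents:
        tokens = tokenize(doc)
        raw.update(tokens)
        doc_count.update(set(tokens))  # count each word once per document
    total = sum(raw.values())
    norms = {}
    for word, n in raw.items():
        norms[word] = {
            "count": n,
            "per_million": n * 1_000_000 / total,
            "cd": doc_count[word],
            "cd_pct": 100.0 * doc_count[word] / len(documents),
        }
    return norms


# Hypothetical toy corpus: each string stands in for one subtitle file.
toy = [
    "John said the dog barked at John",
    "the cat sat on the mat",
    "a dog and a cat",
]
norms = frequency_norms(toy)
for word in ("john", "dog", "the"):
    print(word, norms[word])
```

In this toy corpus "john" and "dog" have the same raw count, but "john" occurs in only one document while "dog" occurs in two. A name repeated many times within a single film inflates raw frequency without raising contextual diversity, which is one way to read the abstract's remark about words that are frequently used as names.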

Supplementary material

Brysbaert-BRM-2009.zip (7.9 MB)
Supplementary material, approximately 340 KB.

References

  1. Adelman, J. S., Brown, G. D. A., & Quesada, J. F. (2006). Contextual diversity, not word frequency, determines word-naming and lexical decision times. Psychological Science, 17, 814–823.
  2. Baayen, R. H., Dijkstra, T., & Schreuder, R. (1997). Singulars and plurals in Dutch: Evidence for a parallel dual-route model. Journal of Memory & Language, 37, 94–117.
  3. Baayen, R. H., Feldman, L. B., & Schreuder, R. (2006). Morphological influences on the recognition of monosyllabic monomorphemic words. Journal of Memory & Language, 55, 290–313.
  4. Baayen, R. H., Piepenbrock, R., & van Rijn, H. (1993). The CELEX Lexical Database [CD-ROM]. Philadelphia: Linguistic Data Consortium.
  5. Baayen, R. H., Wurm, L. H., & Aycock, J. (2007). Lexical dynamics for low-frequency complex words: A regression study across tasks and modalities. Mental Lexicon, 2, 419–436.
  6. Balota, D. A., & Chumbley, J. I. (1984). Are lexical decisions a good measure of lexical access? The role of word frequency in the neglected decision stage. Journal of Experimental Psychology: Human Perception & Performance, 10, 340–357.
  7. Balota, D. A., Cortese, M. J., Sergent-Marshall, S. D., Spieler, D. H., & Yap, M. J. (2004). Visual word recognition of single-syllable words. Journal of Experimental Psychology: General, 133, 283–316.
  8. Balota, D. A., Yap, M. J., Cortese, M. J., Hutchison, K. A., Kessler, B., Loftis, B., et al. (2007). The English Lexicon Project. Behavior Research Methods, 39, 445–459.
  9. Blair, I. V., Urland, G. R., & Ma, J. E. (2002). Using Internet search engines to estimate word frequency. Behavior Research Methods, Instruments, & Computers, 34, 286–290.
  10. Brysbaert, M., Drieghe, D., & Vitu, F. (2005). Word skipping: Implications for theories of eye movement control in reading. In G. Underwood (Ed.), Cognitive processes in eye guidance (pp. 53–77). Oxford: Oxford University Press.
  11. Burgess, C., & Livesay, K. (1998). The effect of corpus size in predicting reaction time in a basic word recognition task: Moving on from Kučera and Francis. Behavior Research Methods, Instruments, & Computers, 30, 272–277.
  12. Caramazza, A., Laudanna, A., & Romani, C. (1988). Lexical access and inflectional morphology. Cognition, 28, 297–332.
  13. Clahsen, H. (1999). Lexical entries and rules of language: A multidisciplinary study of German inflection. Behavioral & Brain Sciences, 22, 991–1060.
  14. Cortese, M. J., & Khanna, M. M. (2007). Age of acquisition predicts naming and lexical-decision performance above and beyond 22 other predictor variables: An analysis of 2,342 words. Quarterly Journal of Experimental Psychology, 60, 1072–1082.
  15. Drieghe, D., Pollatsek, A., Staub, A., & Rayner, K. (2008). The word grouping hypothesis and eye movements during reading. Journal of Experimental Psychology: Learning, Memory, & Cognition, 34, 1552–1560.
  16. Duyck, W., Desmet, T., Verbeke, L. P. C., & Brysbaert, M. (2004). WordGen: A tool for word selection and nonword generation in Dutch, English, German, and French. Behavior Research Methods, Instruments, & Computers, 36, 488–499.
  17. Francis, W., & Kučera, H. (1982). Frequency analysis of English usage: Lexicon and grammar. Boston: Houghton Mifflin.
  18. Gaskell, M. G., & Dumay, N. (2003). Lexical competition and the acquisition of novel words. Cognition, 89, 105–132.
  19. Ghyselinck, M., Lewis, M. B., & Brysbaert, M. (2004). Age of acquisition and the cumulative-frequency hypothesis: A review of the literature and a new multi-task investigation. Acta Psychologica, 115, 43–67.
  20. Glanzer, M., & Adams, J. K. (1985). The mirror effect in recognition memory. Memory & Cognition, 13, 8–20.
  21. Glanzer, M., & Bowles, N. (1976). Analysis of the word-frequency effect in recognition memory. Journal of Experimental Psychology: Human Learning & Memory, 2, 21–31.
  22. Hauk, O., & Pulvermüller, F. (2004). Effects of word length and frequency on the human event-related potential. Clinical Neurophysiology, 115, 1090–1103.
  23. Hockley, W. E. (2008). The effects of environmental context on recognition memory and claims of remembering. Journal of Experimental Psychology: Learning, Memory, & Cognition, 34, 1412–1429.
  24. Howes, D. H., & Solomon, R. L. (1951). Visual duration threshold as a function of word-probability. Journal of Experimental Psychology, 41, 401–410.
  25. Huber, D. E., Clark, T. F., Curran, T., & Winkielman, P. (2008). Effects of repetition priming on recognition memory: Testing a perceptual fluency-disfluency model. Journal of Experimental Psychology: Learning, Memory, & Cognition, 34, 1305–1324.
  26. Jescheniak, J. D., & Levelt, W. J. M. (1994). Word frequency effects in speech production: Retrieval of syntactic information and of phonological form. Journal of Experimental Psychology: Learning, Memory, & Cognition, 20, 824–843.
  27. Johnston, R. A., & Barry, C. (2006). Age of acquisition and lexical processing. Visual Cognition, 13, 789–845.
  28. Juhasz, B. J. (2005). Age-of-acquisition effects in word and picture identification. Psychological Bulletin, 131, 684–712.
  29. Klepousniotou, E., Titone, D., & Romero, C. (2008). Making sense of word senses: The comprehension of polysemy depends on sense overlap. Journal of Experimental Psychology: Learning, Memory, & Cognition, 34, 1534–1543.
  30. Kučera, H., & Francis, W. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.
  31. Leech, G., Rayson, P., & Wilson, A. (2001). Word frequencies in written and spoken English: Based on the British National Corpus. London: Longman.
  32. Lund, K., & Burgess, C. (1996). Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers, 28, 203–208.
  33. McDonough, I. M., & Gallo, D. A. (2008). Autobiographical elaboration reduces memory distortion: Cognitive operations and the distinctiveness heuristic. Journal of Experimental Psychology: Learning, Memory, & Cognition, 34, 1430–1445.
  34. McKay, A., Davis, C., Savage, G., & Castles, A. (2008). Semantic involvement in reading aloud: Evidence from a nonword training study. Journal of Experimental Psychology: Learning, Memory, & Cognition, 34, 1495–1517.
  35. Monsell, S., Doyle, M. C., & Haggard, P. N. (1989). Effects of frequency on visual word recognition tasks: Where are they? Journal of Experimental Psychology: General, 118, 43–71.
  36. New, B., Brysbaert, M., Segui, J., Ferrand, L., & Rastle, K. (2004). The processing of singular and plural nouns in French and English. Journal of Memory & Language, 51, 568–585.
  37. New, B., Brysbaert, M., Veronis, J., & Pallier, C. (2007). The use of film subtitles to estimate word frequencies. Applied Psycholinguistics, 28, 661–677.
  38. New, B., Ferrand, L., Pallier, C., & Brysbaert, M. (2006). Reexamining the word length effect in visual word recognition: New evidence from the English Lexicon Project. Psychonomic Bulletin & Review, 13, 45–52.
  39. New, B., Pallier, C., Brysbaert, M., & Ferrand, L. (2004). Lexique 2: A new French lexical database. Behavior Research Methods, Instruments, & Computers, 36, 516–524.
  40. O’Malley, S., & Besner, D. (2008). Reading aloud: Qualitative differences in the relation between stimulus quality and word frequency as a function of context. Journal of Experimental Psychology: Learning, Memory, & Cognition, 34, 1400–1411.
  41. Pastizzo, M. J., & Carbone, R. F., Jr. (2007). Spoken word frequency counts based on 1.6 million words in American English. Behavior Research Methods, 39, 1025–1028.
  42. Peirce, C. S. (1877). The fixation of belief. Popular Science Monthly, 12, 1–15.
  43. Rastle, K., Davis, M. H., & New, B. (2004). The broth in my brother’s brothel: Morpho-orthographic segmentation in visual word recognition. Psychonomic Bulletin & Review, 11, 1090–1098.
  44. Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124, 372–422.
  45. Rayner, K., & Duffy, S. A. (1986). Lexical complexity and fixation times in reading: Effects of word frequency, verb complexity, and lexical ambiguity. Memory & Cognition, 14, 191–201.
  46. Seidenberg, M. S., & Waters, G. S. (1989). Reading words aloud: A mega study [Abstract]. Bulletin of the Psychonomic Society, 27, 489.
  47. Shaoul, C., & Westbury, C. (2008). A USENET corpus (2005–2008). Edmonton: University of Alberta. Retrieved on 10/9/2008 from www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html.
  48. Szpunar, K. K., McDermott, K. B., & Roediger, H. L., III (2008). Testing during study insulates against the buildup of proactive interference. Journal of Experimental Psychology: Learning, Memory, & Cognition, 34, 1392–1399.
  49. Taft, M. (2004). Morphological decomposition and the reverse base frequency effect. Quarterly Journal of Experimental Psychology, 57A, 745–765.
  50. Thorndike, E. L., & Lorge, I. (1944). The teacher’s word book of 30,000 words. New York: Columbia University, Teachers College.
  51. Underwood, B. J. (1961). Ten years of massed practice on distributed practice. Psychological Review, 68, 229–247.
  52. van Hell, J. G., & de Groot, A. M. B. (1998). Disentangling context availability and concreteness in lexical decision and word translation. Quarterly Journal of Experimental Psychology, 51A, 41–63.
  53. Yarkoni, T., Speer, N. K., Balota, D. A., McAvoy, M. P., & Zacks, J. M. (2008). Pictures of a thousand words: Investigating the neural mechanisms of reading with extremely rapid event-related fMRI. NeuroImage, 42, 973–987.
  54. Yonelinas, A. P. (2002). The nature of recollection and familiarity: A review of 30 years of research. Journal of Memory & Language, 46, 441–517.
  55. Zeno, S. M., Ivens, S. H., Millard, R. T., & Duvvuri, R. (1995). The educator’s word frequency guide. Brewster, NY: Touchstone Applied Science.
  56. Zevin, J. D., & Seidenberg, M. S. (2002). Age of acquisition effects in word reading and other tasks. Journal of Memory & Language, 47, 1–29.

Copyright information

© Psychonomic Society, Inc. 2009

Authors and Affiliations

  1. Department of Experimental Psychology, Ghent University, Gent, Belgium
  2. Royal Holloway, University of London, London, England
  3. Université Paris Descartes, Paris, France