Behavior Research Methods, Volume 50, Issue 1, pp 1–25

On the predictive validity of various corpus-based frequency norms in L2 English lexical processing

  • Xiaocong Chen
  • Yanping Dong
  • Xiufen Yu


The predictive validity of various corpus-based frequency norms in first-language lexical processing has been intensively investigated in previous research, but less attention has been paid to this issue in second-language (L2) processing. To bridge the gap, in the present study we took English as a case in point and compared the predictive power of a large set of corpus-based frequency norms for the performance of an L2 English visual lexical decision task (LDT). Our results showed that, in general, the frequency norms from SUBTLEX-US and WorldLex–Blog tended to predict L2 performance better in reaction times, whereas the frequency norms from corpora with a mixture of written and spoken genres (CELEX, WorldLex–Blog, BNC, ANC, and COCA) tended to predict L2 accuracy better. Although replicated in both low- and high-proficiency L2 English learners, these patterns were not exactly the same as those found in LDT data from native English speakers. In addition, we only observed some limited advantages of the lemma frequency and contextual diversity measures over the wordform frequency measure in predicting L2 lexical processing. The results of the present study, especially the detailed comparisons among the different corpora, provide methodological implications for future L2 lexical research.
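As a rough illustration of what comparing the predictive power of frequency norms can look like, the sketch below regresses an outcome (such as mean lexical decision time per word) on log-transformed frequency from each norm and compares the variance explained. This is only one common approach and is not taken from the abstract, which does not state the authors' statistical procedure. The function names, the `log10(count + 1)` transform, and the toy values are all assumptions made for illustration; the numbers are synthetic and are not results from the study.

```python
import math


def r_squared(x, y):
    """Proportion of variance in y explained by a simple linear regression on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # For a single predictor, R^2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)


def log_frequency(counts):
    """Log-transform raw counts; the +1 offset (an assumption) avoids log(0)."""
    return [math.log10(c + 1.0) for c in counts]


def compare_norms(norms, outcome):
    """Map each (hypothetical) norm name to the R^2 of its log frequency for the outcome."""
    return {name: r_squared(log_frequency(freqs), outcome)
            for name, freqs in norms.items()}


# Synthetic toy data for five words: NOT values from the study.
toy_rt = [520.0, 560.0, 610.0, 655.0, 700.0]   # hypothetical mean RTs in ms
toy_norms = {
    "norm_A": [1000.0, 300.0, 90.0, 25.0, 8.0],  # hypothetical counts per million
    "norm_B": [400.0, 900.0, 30.0, 60.0, 5.0],
}

scores = compare_norms(toy_norms, toy_rt)
best = max(scores, key=scores.get)
print(scores, best)
```

With real data, the same comparison would typically be run over thousands of items and alongside control predictors such as word length, so the single-predictor version here should be read as a simplification.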


Corpus-based frequency norms; L2 lexical processing; Lemma frequency; Contextual diversity; Predictive validity



This research is supported by the National Social Science Foundation of China (15AYY002). The authors thank Emmanuel Keuleers and an anonymous reviewer for their comments on earlier drafts. The authors also thank Jiadan Lin, Fei Li, Fei Zhong, and Hongming Zhao for helping to recruit the participants and collect the data, and James Campion for proofreading the final draft.



Copyright information

© Psychonomic Society, Inc. 2018

Authors and Affiliations

  1. Bilingual Cognition and Development Lab, Center for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies, Guangzhou, China
