Word prevalence norms for 62,000 English lemmas

  • Marc Brysbaert
  • Paweł Mandera
  • Samantha F. McCormick
  • Emmanuel Keuleers
Abstract

We present word prevalence data for 61,858 English words. Word prevalence refers to the number of people who know the word. The measure was obtained on the basis of an online crowdsourcing study involving over 220,000 people. Word prevalence data are useful for gauging the difficulty of words and, as such, for matching stimulus materials in experimental conditions or selecting stimulus materials for vocabulary tests. Word prevalence also predicts word processing times, over and above the effects of word frequency, word length, similarity to other words, and age of acquisition, in line with previous findings in the Dutch language.

Keywords

Word prevalence, Word frequency, Word processing, Megastudy

Supplementary material

  • ESM 1: 13428_2018_1077_MOESM1_ESM.xlsx (XLSX, 3750 kb)
  • ESM 2: 13428_2018_1077_MOESM2_ESM.xlsx (XLSX, 8255 kb)

Copyright information

© Psychonomic Society, Inc. 2018

Authors and Affiliations

  • Marc Brysbaert (1)
  • Paweł Mandera (1)
  • Samantha F. McCormick (2)
  • Emmanuel Keuleers (3)

  1. Department of Experimental Psychology, Ghent University, Gent, Belgium
  2. Department of Psychology, University of Roehampton, Roehampton, UK
  3. Department of Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg, Netherlands