The influence of place and time on lexical behavior: A distributional analysis

  • Brendan T. Johns
  • Randall K. Jamieson


We measured and documented the influence of corpus effects on lexical behavior. Specifically, we used a corpus of over 26,000 fiction books to show that computational models of language trained on samples of language (i.e., subcorpora) representative of the language located in a particular place and time can track differences in people’s experimental language behavior. This conclusion was true across multiple tasks (lexical decision, category production, and word familiarity) and provided insight into the influence that language experience imposes on language processing and organization. We used the assembled corpus and methods to validate a new machine-learning approach for optimizing language models, entitled experiential optimization (Johns, Jones, & Mewhort in Psychonomic Bulletin & Review, 26, 103–126, 2019).


Keywords: Lexical organization, Lexical semantics, Distributional semantics, Big data, Machine learning



  1. Adelman, J. S., Brown, G. D. A., & Quesada, J. F. (2006). Contextual diversity, not word frequency, determines word-naming and lexical decision times. Psychological Science, 17, 814–823.
  2. Applebee, A. N. (1992). Stability and change in the high-school canon. English Journal, 81, 27–32.
  3. Baker, P. (2010). Sociolinguistics and corpus linguistics. Edinburgh, UK: Edinburgh University Press.
  4. Balota, D. A., Yap, M. J., Cortese, M. J., Hutchison, K. A., Kessler, B., Loftis, B., . . . Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39, 445–459.
  5. Bartlett, F. C. (1928). An experiment upon repeated reproduction. Journal of General Psychology, 1, 54–63.
  6. Bartlett, F. C. (1932). Remembering: An experimental and social study. Cambridge, UK: Cambridge University Press.
  7. Battig, W. F., & Montague, W. E. (1969). Category norms of verbal items in 56 categories: A replication and extension of the Connecticut category norms. Journal of Experimental Psychology, 80(3, Pt. 2), 1–46.
  8. Biber, D. (1993). Representativeness in corpus design. Literary and Linguistic Computing, 8, 243–257.
  9. Brysbaert, M., Keuleers, E., & New, B. (2011). Assessing the usefulness of Google Books’ word frequencies for psycholinguistic research on word processing. Frontiers in Psychology, 2, 27.
  10. Brysbaert, M., Mandera, P., & Keuleers, E. (2018). The word frequency effect in word processing: An updated review. Current Directions in Psychological Science, 27, 45–50.
  11. Brysbaert, M., Mandera, P., McCormick, S. F., & Keuleers, E. (2019). Word prevalence norms for 62,000 English lemmas. Behavior Research Methods, 51, 467–479.
  12. Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41, 977–990.
  13. Cai, Q., & Brysbaert, M. (2010). SUBTLEX-CH: Chinese word and character frequencies based on film subtitles. PLoS ONE, 5, e10729:1–8.
  14. Chubala, C. M., Johns, B. T., Jamieson, R. K., & Mewhort, D. J. K. (2016). Applying an exemplar model to an implicit rule-learning task: Implicit learning of semantic structure. Quarterly Journal of Experimental Psychology, 69, 1049–1055.
  15. Clark, J. M., & Paivio, A. (2004). Extensions of the Paivio, Yuille, and Madigan (1968) norms. Behavior Research Methods, Instruments, & Computers, 36, 371–383.
  16. Gilhooly, K. J., & Logie, R. H. (1980). Age-of-acquisition, imagery, concreteness, familiarity, and ambiguity measures for 1,944 words. Behavior Research Methods & Instrumentation, 12, 395–427.
  17. Green, C. D., Feinerer, I., & Burman, J. T. (2013). Beyond the schools of psychology 1: A digital analysis of Psychological Review, 1894–1903. Journal of the History of the Behavioral Sciences, 49, 167–189.
  18. Green, C. D., Feinerer, I., & Burman, J. T. (2015). Searching for the structure of early American psychology: Networking Psychological Review, 1894–1908. History of Psychology, 18, 15–31.
  19. Griffiths, T. L., Steyvers, M., & Tenenbaum, J. B. (2007). Topics in semantic representation. Psychological Review, 114, 211–244.
  20. Hampton, J. A., & Gardiner, M. M. (1983). Measures of internal category structure: A correlational analysis of normative data. British Journal of Psychology, 74, 491–516.
  21. Herdağdelen, A., & Marelli, M. (2017). Social media and language processing: How Facebook and Twitter provide the best frequency estimates for studying word recognition. Cognitive Science, 41, 976–995.
  22. Hills, T. T., Jones, M. N., & Todd, P. M. (2012). Optimal foraging in semantic memory. Psychological Review, 119, 431–440.
  23. Johns, B. T. (2019). Mining a crowdsourced dictionary to understand consistency and preference in word meanings. Frontiers in Psychology, 10, 268.
  24. Johns, B. T., Gruenenfelder, T. M., Pisoni, D. B., & Jones, M. N. (2012). Effects of word frequency, contextual diversity, and semantic distinctiveness on spoken word recognition. Journal of the Acoustical Society of America, 132, EL74–EL80.
  25. Johns, B. T., & Jamieson, R. K. (2018). A large-scale analysis of variance in written language. Cognitive Science, 42, 1360–1374.
  26. Johns, B. T., & Jones, M. N. (2015). Generating structure from experience: A retrieval-based model of language processing. Canadian Journal of Experimental Psychology, 69, 233–251.
  27. Johns, B. T., Jones, M. N., & Mewhort, D. J. K. (2019). Using experiential optimization to build lexical representations. Psychonomic Bulletin & Review, 26, 103–126.
  28. Johns, B. T., Mewhort, D. J. K., & Jones, M. N. (2019). The role of negative information in distributional semantic learning. Cognitive Science, 43, e12730.
  29. Johns, B. T., Sheppard, C. L., Jones, M. N., & Taler, V. (2016). The role of semantic diversity in word recognition across aging and bilingualism. Frontiers in Psychology, 7, 703:1–11.
  30. Jones, M. N. (2017). Developing cognitive theory by mining large-scale naturalistic data. In M. N. Jones (Ed.), Big data in cognitive science. New York, NY: Taylor & Francis.
  31. Jones, M. N., Dye, M., & Johns, B. T. (2017). Context as an organizational principle of the lexicon. In B. H. Ross (Ed.), The psychology of learning and motivation (Vol. 67, pp. 239–283). San Diego, CA: Elsevier Academic Press.
  32. Jones, M. N., Johns, B. T., & Recchia, G. (2012). The role of semantic diversity in lexical organization. Canadian Journal of Experimental Psychology, 66, 115–124.
  33. Jones, M. N., & Mewhort, D. J. K. (2007). Representing word meaning and order information in a composite holographic lexicon. Psychological Review, 114, 1–37.
  34. Keuleers, E., Brysbaert, M., & New, B. (2010). SUBTLEX-NL: A new measure for Dutch word frequency based on film subtitles. Behavior Research Methods, 42, 643–650.
  35. Keuleers, E., Lacey, P., Rastle, K., & Brysbaert, M. (2012). The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words. Behavior Research Methods, 44, 287–304.
  36. Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211–240.
  37. Mandera, P., Keuleers, E., Wodniecka, Z., & Brysbaert, M. (2015). SUBTLEX-PL: Subtitle-based word frequency estimates for Polish. Behavior Research Methods, 47, 471–483.
  38. Paivio, A. (1974). [Imagery and familiarity ratings for 2,448 words] (Unpublished norms). London, ON: University of Western Ontario, Department of Psychology.
  39. Paivio, A., Yuille, J. C., & Madigan, S. A. (1968). Concreteness, imagery, and meaningfulness values for 925 nouns. Journal of Experimental Psychology, 76(1, Pt. 2), 1–25.
  40. Plate, T. A. (1995). Holographic reduced representations. IEEE Transactions on Neural Networks, 6, 623–641.
  41. Recchia, G., Sahlgren, M., Kanerva, P., & Jones, M. N. (2015). Encoding sequential information in semantic space models: Comparing holographic reduced representation and random permutation. Computational Intelligence and Neuroscience, 2015, 986574.
  42. Rosch, E., & Mervis, C. B. (1975). Family resemblances: Studies in the internal structure of categories. Cognitive Psychology, 7, 573–605.
  43. Simon, H. A. (1956). Rational choice and the structure of the environment. Psychological Review, 63, 129–138.
  44. Simon, H. A. (1969). The sciences of the artificial. Cambridge, MA: MIT Press.
  45. Stadthagen-Gonzalez, H., & Davis, C. J. (2006). The Bristol norms for age of acquisition, imageability, and familiarity. Behavior Research Methods, 38, 598–605.
  46. Stratton, R. P., Jacobus, K. A., & Brinley, B. (1975). Age-of-acquisition, imagery, familiarity and meaningfulness norms for 543 words. Behavior Research Methods & Instrumentation, 7, 1–6.
  47. Taler, V., Johns, B. T., & Jones, M. N. (2019). A large scale semantic analysis of verbal fluency across the aging spectrum: Data from the Canadian Longitudinal Study on Aging. Journals of Gerontology B: Psychological Sciences. Advance online publication.
  48. Todd, P. M., & Gigerenzer, G. (2001). Shepard’s mirrors or Simon’s scissors? Commentary on R. Shepard, “Perceptual–cognitive universals as reflections of the world.” Behavioral and Brain Sciences, 24, 704–705.
  49. Todd, P. M., & Gigerenzer, G. (2007). Environments that make us smart: Ecological rationality. Current Directions in Psychological Science, 16, 167–171.
  50. Tremblay, M., & Vézina, H. (2000). New estimates of intergenerational time intervals for the calculation of age and origins of mutations. American Journal of Human Genetics, 66, 651–658.
  51. van Heuven, W. J. B., Mandera, P., Keuleers, E., & Brysbaert, M. (2014). SUBTLEX-UK: A new and improved word frequency database for British English. Quarterly Journal of Experimental Psychology, 67, 1176–1190.
  52. Van Overschelde, J. P., Rawson, K. A., & Dunlosky, J. (2004). Category norms: An updated and expanded version of the Battig and Montague (1969) norms. Journal of Memory and Language, 50, 289–335.

Copyright information

© The Psychonomic Society, Inc. 2019

Authors and Affiliations

  1. University at Buffalo, Buffalo, USA
  2. University of Manitoba, Winnipeg, Canada
