Advertisement

The “Small World of Words” English word association norms for over 12,000 cue words

  • Simon De Deyne
  • Danielle J. Navarro
  • Amy Perfors
  • Marc Brysbaert
  • Gert Storms
Article

Abstract

Word associations have been used widely in psychology, but the validity of their application strongly depends on the number of cues included in the study and the extent to which they probe all associations known by an individual. In this work, we address both issues by introducing a new English word association dataset. We describe the collection of word associations for over 12,000 cue words, currently the largest such English-language resource in the world. Our procedure allowed subjects to provide multiple responses for each cue, which permits us to measure weak associations. We evaluate the utility of the dataset in several different contexts, including lexical decision and semantic categorization. We also show that measures based on a mechanism of spreading activation derived from this new resource are highly predictive of direct judgments of similarity. Finally, a comparison with existing English word association sets further highlights systematic improvements provided through these new norms.

Keywords

Word associations Mental lexicon Networks Similarity Spreading activation 

Notes

References

  1. Abott, J. T., Austerweil, J. L., & Griffiths, T. L. (2015). Random walks on semantic networks can resemble optimal foraging. Psychological Review, 122, 558–559.CrossRefGoogle Scholar
  2. Agirre, E., Alfonseca, E., Hall, K., Kravalova, J., Paşca, M., & Soroa, A. (2009). A study on similarity and relatedness using distributional and WordNet-based approaches. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, (pp. 19–27).Google Scholar
  3. Atkinson, K. (2018). Variant conversion info (VarCon), accessed February 6, 2018. http://wordlist.aspell.net/varcon-readme/.
  4. Austerweil, J. L., Abbott, J. T., & Griffiths, T. L. (2012). Human memory search as a random walk in a semantic network. In F. Pereira, C. Burges, L. Bottou, & K. Weinberger (Eds.) Advances in Neural Information Processing Systems (pp. 3041–3049). Curran Associates, Inc.Google Scholar
  5. Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., & Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39, 445–459.CrossRefGoogle Scholar
  6. Battig, W. F., & Montague, W. E. (1969). Category norms for verbal items in 56 categories: A replication and extension of the Connecticut category norms. Journal of Experimental Psychology Monographs, 80, 1–45.CrossRefGoogle Scholar
  7. Borge-Holthoefer, J., & Arenas, A. (2010). Categorizing words through semantic memory navigation. The European Physical Journal B-Condensed Matter and Complex Systems, 74, 265–270.CrossRefGoogle Scholar
  8. Bruni, E., Boleda, G., Baroni, M., & Tran, N. K. (2012). Distributional semantics in Technicolor. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume, 1, 136–145.Google Scholar
  9. Bruni, E., Tran, N. K., & Baroni, M. (2014). Multimodal distributional semantics. Journal of Artificial Intelligence Research, 49, 1–47.CrossRefGoogle Scholar
  10. Brysbaert, M., & New, B. (2009). Moving beyond Kucerǎ and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41, 977–990.CrossRefGoogle Scholar
  11. Bullinaria, J. A., & Levy, J. P. (2007). Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior Research Methods, 39, 510–526.CrossRefGoogle Scholar
  12. Collins, A. M., & Loftus, E. F. (1975). A spreading-activation theory of semantic processing. Psychological Review, 82, 407–428.CrossRefGoogle Scholar
  13. De Deyne, S., & Storms, G. (2008a). Word associations: Network and semantic properties. Behavior Research Methods, 40, 213–231.CrossRefGoogle Scholar
  14. De Deyne, S., & Storms, G. (2008b). Word associations: Norms for 1,424 Dutch words in a continuous task. Behavior Research Methods, 40, 198–205.CrossRefGoogle Scholar
  15. De Deyne, S., Verheyen, S., Ameel, E., Vanpaemel, W., Dry, M., Voorspoels, W., & Storms, G. (2008). Exemplar by feature applicability matrices and other Dutch normative data for semantic concepts. Behavior Research Methods, 40, 1030–1048.CrossRefGoogle Scholar
  16. De Deyne, S., Navarro, D. J., Perfors, A., & Storms, G. (2012). Strong structure in weak semantic similarity: A graph-based account. In Proceedings of the 34th Annual Conference of the Cognitive Science Society (pp. 1464–1469). Austin: Cognitive Science Society.Google Scholar
  17. De Deyne, S., Navarro, D. J., & Storms, G. (2013). Associative strength and semantic activation in the mental lexicon: evidence from continued word associations. In Knauff, M., Pauen, M., Sebanz, N., & Wachsmuth, I. (Eds.) Proceedings of the 33rd Annual Conference of the Cognitive Science Society (pp. 2142–2147). Austin: Cognitive Science Society.Google Scholar
  18. De Deyne, S., Navarro, D. J., & Storms, G. (2013). Better explanations of lexical and semantic cognition using networks derived from continued rather than single word associations. Behavior Research Methods, 45, 480–498.CrossRefGoogle Scholar
  19. De Deyne, S., Voorspoels, W., Verheyen, S., Navarro, D. J., & Storms, G. (2014). Accounting for graded structure in adjective categories with valence-based opposition relationships. Language and Cognitive Processes, 29, 568–583.Google Scholar
  20. De Deyne, S., Verheyen, S., & Storms, G. (2015). The role of corpus-size and syntax in deriving lexico-semantic representations for a wide range of concepts. Quarterly Journal of Experimental Psychology, 26, 1–22. 10.1080/17470218.2014.994098CrossRefGoogle Scholar
  21. De Deyne, S., Navarro, D. J., Perfors, A., & Storms, G. (2016). Structure at every scale: a semantic network account of the similarities between unrelated concepts. Journal of Experimental Psychology: General, 145, 1228–1254.CrossRefGoogle Scholar
  22. De Deyne, S., Perfors, A., & Navarro, D. (2016). Predicting human similarity judgments with distributional models: The value of word associations. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, (pp. 1861–1870).Google Scholar
  23. De Deyne, S., Navarro, D. J., Collell, G., & Perfors, A. (2018). Visual and emotional grounding in language and mind. Unpublished manuscript.Google Scholar
  24. Deese, J. (1959). On the prediction of occurrence of particular verbal intrusions in immediate recall. Journal of Experimental Psychology, 58(1), 17–22.CrossRefGoogle Scholar
  25. Deese, J. (1965) The structure of associations in language and thought. Baltimore: Johns Hopkins Press.Google Scholar
  26. Devereux, B. J., Tyler, L. K., Geertzen, J., & Randall, B. (2014). The Centre for Speech, Language and the Brain (CSLB) concept property norms. Behavior Research Methods, 46(4), 1119–1127.CrossRefGoogle Scholar
  27. Dubossarsky, H., De Deyne, S., & Hills, T. T. (2017). Quantifying the structure of free association networks across the lifespan. Developmental Psychology, 53(8), 1560–1570.CrossRefGoogle Scholar
  28. Evert, S., & Baroni, M. (2007). Zipfr: Word frequency distributions in R. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Posters and Demonstrations Sessions (pp. 29–32). Prague, Czech Republic.Google Scholar
  29. Fouss, F., Saerens, M., & Shimbo, M. (2016) Algorithms and models for network data and link analysis. Cambridge: Cambridge University Press.CrossRefGoogle Scholar
  30. Griffiths, T. L., Steyvers, M., & Firl, A. (2007). Google and the mind. Psychological Science, 18, 1069–1076.CrossRefGoogle Scholar
  31. Gunel, E., & Dickey, J. (1974). Bayes factors for independence in contingency tables. Biometrika, 61, 545–557.CrossRefGoogle Scholar
  32. Halawi, G., Dror, G., Gabrilovich, E., & Koren, Y. (2012). Large-scale learning of word relatedness with constraints. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. (pp. 1406–1414).Google Scholar
  33. Herdan, G. (1964). Quantitative linguistics. Butterworth.Google Scholar
  34. Hill, F., Reichart, R., & Korhonen, A. (2016). Simlex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics, 41, 665–695.CrossRefGoogle Scholar
  35. Hutchison, K. A. (2003). Is semantic priming due to association strength or feature overlap?. Psychonomic Bulletin and Review, 10, 785–813.CrossRefGoogle Scholar
  36. Hutchison, K. A., Balota, D. A., Cortese, M. J., & Watson, J. M. (2008). Predicting semantic priming at the item level. The Quarterly Journal of Experimental Psychology, 61, 1036–1066.CrossRefGoogle Scholar
  37. Hutchison, K. A., Balota, D. A., Neely, J. H., Cortese, M. J., Cohen-Shikora, E. R., Tse, C. S., & Buchanan, E. (2013). The semantic priming project. Behavior Research Methods, 45(4), 1099–1114.CrossRefGoogle Scholar
  38. Jamil, T., Ly, A., Morey, R., Love, J., Marsman, M., & Wagenmakers, E. J. (2017). Default ”Gunel and Dickey” Bayes factors for contingency tables. Behavior Research Methods, 49, 638– 652.CrossRefGoogle Scholar
  39. Jones, M. N., Hills, T. T., & Todd, P. M. (2015). Hidden processes in structural representations: A reply to Abbott, Austerweil, and Griffiths (2015). Psychological Review, 122, 570–574.CrossRefGoogle Scholar
  40. Jones, M. N., Willits, J., Dennis, S., & Jones, M. (2015). Models of semantic memory. In J. Busemeyer, & J. Townsend (Eds.) Oxford Handbook of Mathematical and Computational Psychology (pp. 232–254). Oxford, England: Oxford University Press.Google Scholar
  41. Joyce, T. (2005). Constructing a large-scale database of Japanese word associations. Corpus Studies on Japanese Kanji Glottometrics, 10, 82–98.Google Scholar
  42. Jung, J., Na, L., & Akama, H. (2010). Network analysis of Korean word associations. In: Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, (pp. 27–35).Google Scholar
  43. Kenett, Y. N., Levi, E., Anaki, D., & Faust, M. (2017). The semantic distance task: Quantifying semantic distance with semantic network path length. Journal of Experimental Psychology. Learning, Memory, and Cognition, 43, 1470–1489.CrossRefGoogle Scholar
  44. Kiss, G., Armstrong, C., Milroy, R., & Piper, J. (1973). The computer and literacy studies. In A. J. Aitken, R. W. Bailey, & N. Hamilton-Smith (Eds.) (pp. 153–165): Edinburgh University Press.Google Scholar
  45. Louwerse, M. M. (2011). Symbol interdependency in symbolic and embodied cognition. Topics in Cognitive Science, 3, 273–302.CrossRefGoogle Scholar
  46. Lynott, D., & Connell, L. (2013). Modality exclusivity norms for 400 nouns: The relationship between perceptual experience and surface word form. Behavior Research Methods, 45, 516–526.CrossRefGoogle Scholar
  47. Maki, W. S. (2008). A database of associative strengths from the strength-sampling model: a theory-based supplement to the Nelson, McEvoy, and Schreiber word association norms. Behavior Research Methods, 40, 232–235.CrossRefGoogle Scholar
  48. McRae, K., Cree, G. S., Seidenberg, M. S., & McNorgan, C. (2005). Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods, 37, 547–559.CrossRefGoogle Scholar
  49. Mitchell, T. M., Shinkareva, S. V., Carlson, A., Chang, K. M., Malave, V. L., Mason, R. A., & Just, M. A. (2008). Predicting human brain activity associated with the meanings of nouns. Science, 320, 1191–1195.CrossRefGoogle Scholar
  50. Mollin, S. (2009). Combining corpus linguistics and psychological data on word co-occurrence: Corpus collocates versus word associations. Corpus Linguistics and Linguistic Theory, 5, 175– 200.CrossRefGoogle Scholar
  51. Morais, A. S., Olsson, H., & Schooler, L. J. (2013). Mapping the structure of semantic memory. Cognitive Science, 37, 125–145.CrossRefGoogle Scholar
  52. Moss, H., & Older, L. (1996). Birkbeck word association norms. Psychology Press.Google Scholar
  53. Nelson, D. L., McEvoy, C. L., & Schreiber, T. A. (2004). The University of South Florida free association, rhyme, and word fragment norms. Behavior Research Methods, Instruments, and Computers, 36, 402–407.CrossRefGoogle Scholar
  54. Nematzadeh, A., Meylan, S. C., & Griffiths, T. L. (2017). Evaluating vector-space models of word representation, or, the unreasonable effectiveness of counting words near other words. In Proceedings of the 39th annual meeting of the Cognitive Science Society, (pp. 859–864).Google Scholar
  55. Newman, M. E. J. (2010) Networks: An introduction. Oxford: Oxford University Press, Inc.CrossRefGoogle Scholar
  56. Pexman, P. M., Heard, A., Lloyd, E., & Yap, M. J. (2017). The Calgary semantic decision project: concrete/abstract decision data for 10,000 English words. Behavior Research Methods, 49, 407– 417.CrossRefGoogle Scholar
  57. Playfoot, D., Balint, T., Pandya, V., Parkes, A., Peters, M., & Richards, S. (2016). Are word association responses really the first words that come to mind? Applied Linguistics amw015.  https://doi.org/10.1093/applin/amw015.
  58. Prior, A., & Bentin, S. (2008). Word associations are formed incidentally during sentential semantic integration. Acta Psychologica, 127, 57–71.CrossRefGoogle Scholar
  59. Radinsky, K., Agichtein, E., Gabrilovich, E., & Markovitch, S. (2011). A word at a time: computing word relatedness using temporal semantic analysis. In Proceedings of the 20th International Conference on World Wide Web, (pp. 337–346).Google Scholar
  60. Recchia, G., & Jones, M. N. (2009). More data trumps smarter algorithms: Comparing pointwise mutual information with latent semantic analysis. Behavior Research Methods, 41, 647–656.CrossRefGoogle Scholar
  61. Roelke, A., Franke, N., Biemann, C., Radach, R., Jacobs, A. M., & Hofmann, M. J. (2018). A novel co-occurrence-based approach to predict pure associative and semantic priming. Psychonomic Bulletin & Review, 25, 1488–1493.CrossRefGoogle Scholar
  62. Rubenstein, H., & Goodenough, J. B. (1965). Contextual correlates of synonymy. Communications of the ACM, 8, 627–633.  https://doi.org/10.1145/365628.365657.CrossRefGoogle Scholar
  63. Schloss, B., & Li, P. (2016). Disentangling narrow and coarse semantic networks in the brain: The role of computational models of word meaning. Behavior Research Methods, 49, 1582–1596.CrossRefGoogle Scholar
  64. Silberer, C., & Lapata, M. (2014). Learning grounded meaning representations with autoencoders, ACL.Google Scholar
  65. Steyvers, M., & Tenenbaum, J. B. (2005). The large-scale structure of semantic networks: Statistical analyses and a model of semantic growth. Cognitive Science, 29, 41–78.CrossRefGoogle Scholar
  66. Szalay, L. B., & Deese, J. (1978) Subjective meaning and culture: An assessment through word associations. NJ: Lawrence Erlbaum Hillsdale.Google Scholar
  67. Vankrunkelsven, H., Verheyen, S., Storms, G., & De Deyne, S. (2018). Predicting lexical norms using a word association corpus. Manuscript submitted for publication.Google Scholar
  68. Van Rensbergen, B., De Deyne, S., & Storms, G. (2016). Estimating affective word covariates using word association data. Behavior Research Methods, 48, 1644–1652.CrossRefGoogle Scholar
  69. Vinson, D. P., & Vigliocco, G. (2008). Semantic feature production norms for a large set of objects and events. Behavior Research Methods, 40, 183–190.CrossRefGoogle Scholar

Copyright information

© Psychonomic Society, Inc. 2018

Authors and Affiliations

  • Simon De Deyne
    • 1
  • Danielle J. Navarro
    • 2
  • Amy Perfors
    • 1
  • Marc Brysbaert
    • 3
  • Gert Storms
    • 4
  1. 1.School of Psychological SciencesUniversity of MelbourneVICAustralia
  2. 2.School of PsychologyUniversity of New South WalesNSWAustralia
  3. 3.Department of PsychologyGhent UniversityGhentBelgium
  4. 4.Department of PsychologyKU LeuvenLeuvenBelgium

Personalised recommendations