SCOPE: The South Carolina psycholinguistic metabase

Abstract

The number of databases that provide measurements of lexical properties for psycholinguistic research has increased rapidly in recent years. The proliferation of lexical variables, and the multitude of associated databases, makes the choice, comparison, and standardization of these variables increasingly difficult. Here, we introduce the South Carolina Psycholinguistic Metabase (SCOPE), a metabase (or meta-database) containing an extensive, curated collection of psycholinguistic variable values from major databases. The metabase currently contains 245 lexical variables, organized into seven major categories: General (e.g., frequency), Orthographic (e.g., bigram frequency), Phonological (e.g., phonological uniqueness point), Orth-Phon (e.g., consistency), Semantic (e.g., concreteness), Morphological (e.g., number of morphemes), and Response variables (e.g., lexical decision latency). We hope that SCOPE will become a valuable resource for researchers in psycholinguistics and affiliated disciplines such as cognitive neuroscience of language, computational linguistics, and communication disorders. The availability and ease of use of a metabase with a comprehensive set of variables can facilitate understanding of the unique contribution of each variable to word processing and of the interactions between variables, and can support new insights and the development of improved models and theories of word processing. It can also help standardize practice in psycholinguistics. We demonstrate use of the metabase by measuring relationships between variables in multiple ways and testing their individual contributions to a number of dependent measures, in the most comprehensive analysis of this kind to date. The metabase is freely available at go.sc.edu/scope.
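As a rough illustration of what a metabase makes convenient, the sketch below joins two per-source word tables on the word itself and then correlates the two variables over the words that have both values. It is a minimal sketch under stated assumptions, not the SCOPE implementation: the variable names, the function names (`merge_by_word`, `pearson`, `correlate`), and all numeric values are hypothetical and made up for the example.

```python
from math import sqrt

# Toy, made-up values for illustration only; they are NOT taken from SCOPE.
frequency = {"cat": 4.9, "dog": 5.1, "idea": 4.6, "truth": 4.4}
concreteness = {"cat": 4.9, "dog": 4.8, "idea": 1.6, "justice": 1.5}


def merge_by_word(**sources):
    """Combine per-source {word: value} tables into {word: {variable: value}}."""
    merged = {}
    for variable, table in sources.items():
        for word, value in table.items():
            merged.setdefault(word, {})[variable] = value
    return merged


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def correlate(merged, var_a, var_b):
    """Correlate two variables over the words that have a value for both."""
    rows = [row for row in merged.values() if var_a in row and var_b in row]
    return pearson([row[var_a] for row in rows], [row[var_b] for row in rows])


# Words missing from one source simply lack that variable in the merged table.
metabase = merge_by_word(frequency=frequency, concreteness=concreteness)
r = correlate(metabase, "frequency", "concreteness")
print(sorted(metabase))
print(round(r, 3))
```

Keying every source on the word keeps words that appear in only one source, so coverage differences between sources stay visible instead of being silently dropped at merge time.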

Data availability

The data for the present study can be accessed at go.sc.edu/scope.

Code availability

The code for the present study can be accessed at https://osf.io/9qbjz/.

Acknowledgements

In addition to the authors of publicly available datasets, we thank Marc Brysbaert, Chee Qian Wen, and Michael Vitevitch for sharing data.

Funding

This work was supported by NIH/NIDCD grants R01DC017162, R01DC017162-02S1, and R56DC010783 (RHD), and a Radboud Excellence fellowship from Radboud University in Nijmegen, the Netherlands (CG).

Author information

Corresponding authors

Correspondence to Svetlana V. Shinkareva or Rutvik H. Desai.

Ethics declarations

Conflicts of interest

None.

Ethics approval

This study does not involve any data collection; therefore, no ethics approval is needed.

Consent to participate

This study does not involve any data collection; therefore, no consent to participate is needed.

Consent for publication

All authors approve this publication.

Additional information

Open practices statement

The data for the present study can be accessed at go.sc.edu/scope. The code for the present study can be accessed at https://osf.io/9qbjz/.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gao, C., Shinkareva, S.V. & Desai, R.H. SCOPE: The South Carolina psycholinguistic metabase. Behav Res (2022). https://doi.org/10.3758/s13428-022-01934-0

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.3758/s13428-022-01934-0

Keywords

  • Psycholinguistic
  • Database
  • Lexical characteristics
  • Word recognition