The CompLex database presents a large-scale collection of eye-movement studies on English compound-word processing. A combined total of 440 participants completed eye-tracking experiments in which they silently read unspaced English compound words (e.g., goalpost) embedded in sentence contexts (e.g., Dylan hit the goalpost when he was aiming for the net.). Three studies were conducted with participants representing the non-college-bound population (300 participants), and four studies included participants recruited from the student population (140 participants). The database comprises trial-level eye-movement data (47,763 trials), participant data (including a measure of reading experience estimated via the Author Recognition Test), and lexical characteristics for the set of 931 English compound words used as critical stimuli in the studies. One contribution of the present paper is a set of regression analyses conducted on the full database and on individual experiments. We report that the most reliable and consistent main effects were those elicited by compound word length, left constituent frequency, right constituent frequency, compound frequency, and semantic transparency. Separately, we also found that the effects of left constituent frequency and compound word length are weaker among more frequent compounds. Another contribution is a power analysis, in which we determined the sample sizes required to reliably detect effect sizes comparable to those observed in our regression models. These sample size estimates serve as a recommendation for researchers wishing either to collect eye-movement data for compound word reading or to use the current database as a resource for the study of English compound word processing.
The query was set to TOPIC = compound word* and further manually restricted to fields in language, linguistics and psychology, to filter out the literature on chemical compounds.
Constituent frequencies are estimated for those constituents as stand-alone words. The family size of a compound’s constituent is defined as the number of other compounds sharing that constituent in the same position (Schreuder & Baayen, 1997), e.g., the family of post includes postman, post office, postbag, etc.
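The family size definition above can be illustrated with a minimal Python sketch. The toy lexicon, the function name `family_sizes`, and the representation of compounds as (left, right) pairs are assumptions made for illustration only; they are not the materials or the procedure used to build the database.

```python
from collections import Counter

def family_sizes(compounds):
    """Return {(left, right): (left_family_size, right_family_size)}.

    Family size counts the OTHER compounds in the lexicon that share
    a constituent in the same position, so the compound itself is
    subtracted from each count.
    """
    unique = set(compounds)  # guard against duplicate entries
    left_counts = Counter(left for left, _ in unique)
    right_counts = Counter(right for _, right in unique)
    return {
        (left, right): (left_counts[left] - 1, right_counts[right] - 1)
        for left, right in unique
    }

# Hypothetical toy lexicon; a real estimate would use a full compound list.
toy_lexicon = [
    ("post", "man"), ("post", "office"), ("post", "bag"),
    ("goal", "post"), ("goal", "keeper"), ("lamp", "post"),
]

sizes = family_sizes(toy_lexicon)
for compound, (left_size, right_size) in sorted(sizes.items()):
    print(compound, "left family:", left_size, "right family:", right_size)
```

In this toy lexicon, post as a left constituent of postman has two other family members (post office, postbag), whereas post as a right constituent of goalpost has one (lamppost), which shows why position matters in the definition.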
There is evidence to suggest that the ART is a valid test for younger readers, i.e., readers in our sample. In an item response theory analysis, Moore and Gordon (2015) show that item difficulty is predicted by the frequency of occurrence of authors on Internet webpages. Author names that were more frequently attested in the Google Terabyte N-Gram Corpus of English Web sites (one trillion tokens) tended to be correctly identified more often than less frequently attested author names. This finding may suggest that an individual’s success on the ART may be partly explained by the amount of cultural knowledge that the individual has acquired from experience with Internet texts.
Following Schramm and Rouder’s (2019) recent investigation of the appropriacy of analyzing log-transformed response time data, we also conducted all analyses without log-transforming the dependent variables. We found that all of the effects observed with logged durational measures were also found when predicting raw, non-transformed measures. This was the case for effects in the full database and across individual experiments. Our motivation for examining log-transformed variables was that all durational measures were positively skewed. Such distributions violate the normality assumption of linear regression models, and we wanted to keep in check the overly influential values in our empirical distributions that may bias model fit. Although Schramm and Rouder present arguments against the transformation of response time measures when fitting statistical models, we chose to use logged response times because this procedure is in keeping with the current convention of the field. If best practices in the field change, future studies using the current database will be able to model compound word reading without log-transforming durational eye-movement measures (or performing any other corrective transformation).
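The motivation for the log transformation can be illustrated with a small simulation. This is a sketch under stated assumptions: the durations are synthetic draws from a log-normal distribution with arbitrary parameters (5.5 and 0.4 on the log scale), not data from the database, and `skewness` is a helper defined here for illustration.

```python
import math
import random

def skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Synthetic, positively skewed "durations" in ms (assumed parameters).
rng = random.Random(0)
durations_ms = [rng.lognormvariate(5.5, 0.4) for _ in range(5000)]

raw_skew = skewness(durations_ms)
log_skew = skewness([math.log(d) for d in durations_ms])

print(f"skewness of raw durations:    {raw_skew:.2f}")
print(f"skewness of logged durations: {log_skew:.2f}")
```

For data of this shape, the raw durations show a clear positive skew, whereas the logged durations are close to symmetric, so extreme long durations exert less leverage on a linear model fitted to the logged measure.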
Acheson, D.J., Wells, J.B., & MacDonald, M.C. (2008). New and updated tests of print exposure and reading abilities in college students. Behavior Research Methods, 40(1), 278–289.
Baayen, R.H. (2010). A real experiment is a factorial experiment. The Mental Lexicon, 5(1), 149–157.
Baayen, R.H., Piepenbrock, R., & Gulikers, L. (1995). The CELEX lexical database (release 2). Distributed by the Linguistic Data Consortium, University of Pennsylvania.
Baayen, R.H., Kuperman, V., & Bertram, R. (2010). Frequency effects in compound processing. In Compounding (pp. 257–270). Amsterdam/Philadelphia: Benjamins.
Balota, D., Cortese, M., Hutchison, K., Neely, J., Nelson, D., Simpson, G., & Treiman, R. (2002). The English Lexicon Project: A web-based repository of descriptive and behavioral measures for 40,481 English words and nonwords. Washington University. Online: http://elexicon.wustl.edu.
Balota, D.A., Yap, M.J., Hutchison, K.A., Cortese, M.J., Kessler, B., Loftis, B., & Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39(3), 445–459.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289–300.
Bertram, R. (2011). Eye movements and morphological processing in reading. The Mental Lexicon, 6(1), 83–109.
Brysbaert, M. (2019). How many participants do we have to include in properly powered experiments? A tutorial of power analysis with reference tables. Journal of Cognition, 2, 1.
Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990.
Brysbaert, M., & Stevens, M. (2018). Power analysis and effect size in mixed effects models: A tutorial. Journal of Cognition, 1, 1.
Brysbaert, M., Stevens, M., Mandera, P., & Keuleers, E. (2016). The impact of word prevalence on lexical decision times: Evidence from the Dutch Lexicon Project 2. Journal of Experimental Psychology: Human Perception and Performance, 42(3), 441.
Button, K.S., Ioannidis, J.P., Mokrysz, C., Nosek, B.A., Flint, J., Robinson, E.S., & Munafò, M.R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365.
Choi, W., Lowder, M.W., Ferreira, F., & Henderson, J.M. (2015). Individual differences in the perceptual span during reading: Evidence from the moving window technique. Attention, Perception, & Psychophysics, 77(7), 2463–2475.
Cohen, J. (2013). Statistical power analysis for the behavioral sciences. Routledge.
Cutter, M.G., Drieghe, D., & Liversedge, S.P. (2014). Preview benefit in English spaced compounds. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(6), 1778.
De Jong, N.H., Schreuder, R., & Baayen, R.H. (2000). The morphological family size effect and morphology. Language and Cognitive Processes, 15(4–5), 329–365.
Ernestus, M., & Cutler, A. (2015). BALDEY: A database of auditory lexical decisions. The Quarterly Journal of Experimental Psychology, 68(8), 1469–1488.
Falkauskas, K., & Kuperman, V. (2015). When experience meets language statistics: Individual variability in processing English compound words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(6), 1607–1627.
Fox, J., Weisberg, S., Adler, D., Bates, D., Baud-Bovy, G., Ellison, S., ... et al. (2012). Package ‘car’. Vienna: R Foundation for Statistical Computing.
Fox, J., Weisberg, S., Friendly, M., Hong, J., Andersen, R., Firth, D., & Fox, M.J. (2019). Package ‘effects’.
Frisson, S., Niswander-Klement, E., & Pollatsek, A. (2008). The role of semantic transparency in the processing of English compound words. British Journal of Psychology, 99(1), 87–107.
Gagné, C.L., Spalding, T.L., & Schmidtke, D. (2019). LADEC: The large database of English compounds. Behavior Research Methods, 1–28.
Green, P., & MacLeod, C.J. (2016). SIMR: An R package for power analysis of generalized linear mixed models by simulation. Methods in Ecology and Evolution, 7(4), 493–498.
Hyönä, J., & Olson, R.K. (1995). Eye fixation patterns among dyslexic and normal readers: Effects of word length and word frequency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21(6), 1430.
Ioannidis, J.P. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.
Juhasz, B.J. (2018). Experience with compound words influences their processing: An eye movement investigation with English compound words. Quarterly Journal of Experimental Psychology, 71(1), 103–112.
Juhasz, B.J., & Berkowitz, R.N. (2011). Effects of morphological families on English compound word recognition: A multitask investigation. Language and Cognitive Processes, 26(4–6), 653–682.
Juhasz, B.J., Inhoff, A.W., & Rayner, K. (2005). The role of interword spaces in the processing of English compound words. Language and Cognitive Processes, 20(1–2), 291–316.
Juhasz, B.J., Pollatsek, A., Hyönä, J., Drieghe, D., & Rayner, K. (2009). Parafoveal processing within and between words. The Quarterly Journal of Experimental Psychology, 62(7), 1356–1376.
Keuleers, E., & Balota, D.A. (2015). Megastudies, crowdsourcing, and large datasets in psycholinguistics: An overview of recent developments (Vol. 68) (8). Taylor & Francis.
Keuleers, E., Lacey, P., Rastle, K., & Brysbaert, M. (2012). The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words. Behavior Research Methods, 44(1), 287–304.
Kim, S.Y., Yap, M.J., & Goh, W.D. (2018). The role of semantic transparency in visual word recognition of compound words: A megastudy approach. Behavior Research Methods, 1–11.
Kuperman, V. (2013). Accentuate the positive: Semantic access in English compounds. Frontiers in Psychology, 4, 203.
Kuperman, V. (2015). Virtual experiments in megastudies: A case study of language and emotion. The Quarterly Journal of Experimental Psychology, 68(8), 1693–1710.
Kuperman, V., & Bertram, R. (2013). Moving spaces: Spelling alternation in English noun-noun compounds. Language and Cognitive Processes, 28(7), 939–966.
Kuperman, V., Bertram, R., & Baayen, R.H. (2008). Morphological dynamics in compound processing. Language and Cognitive Processes, 23(7–8), 1089–1132.
Kuperman, V., Schreuder, R., Bertram, R., & Baayen, R.H. (2009). Reading polymorphemic Dutch compounds: Toward a multiple route model of lexical processing. Journal of Experimental Psychology: Human Perception and Performance, 35(3), 876.
Kuperman, V., & Van Dyke, J.A. (2011). Effects of individual differences in verbal skills on eye-movement patterns during sentence reading. Journal of Memory and Language, 65(1), 42–73.
Kuperman, V., Estes, Z., Brysbaert, M., & Warriner, A.B. (2014). Emotion and language: Valence and arousal affect word recognition. Journal of Experimental Psychology: General, 143(3), 1065.
Kuperman, V., Matsuki, K., & Van Dyke, J.A. (2018). Contributions of reader- and text-level characteristics to eye-movement patterns during passage reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 44(11), 1687.
Kutner, M., Nachtsheim, C., & Neter, J. (2004). Simultaneous inferences and other topics in regression analysis. In Applied linear regression models (4th edn., pp. 168–170). New York: McGraw-Hill Irwin.
Landauer, T.K., & Dumais, S.T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211–240.
Libben, G. (2006). Why study compound processing? An overview of the issues. 1–23.
Libben, G. (2014). The nature of compounds: A psychocentric perspective. Cognitive Neuropsychology, 31(1–2), 8–25.
Liversedge, S.P., Blythe, H.I., & Drieghe, D. (2012). Beyond isolated word recognition. Behavioral and Brain Sciences, 35(5), 293–294.
Lowder, M.W., & Gordon, P.C. (2017). Print exposure modulates the effects of repetition priming during sentence reading. Psychonomic Bulletin & Review, 24(6), 1935–1942.
Mandera, P., Keuleers, E., & Brysbaert, M. (2017). Explaining human performance in psycholinguistic tasks with models of semantic similarity based on prediction and counting: A review and empirical validation. Journal of Memory and Language, 92, 57–78.
Moore, M., & Gordon, P.C. (2015). Reading ability and print exposure: Item response theory analysis of the author recognition test. Behavior Research Methods, 47(4), 1095–1109.
Morrison, R.E. (1984). Manipulation of stimulus onset delay in reading: Evidence for parallel programming of saccades. Journal of Experimental Psychology: Human Perception and Performance, 10(5), 667.
Perfetti, C.A., & Hart, L. (2002). The lexical quality hypothesis. Precursors of Functional Literacy, 11, 67–86.
Perugini, M., Gallucci, M., & Costantini, G. (2018). A practical primer to power analysis for simple experimental designs. International Review of Social Psychology, 31, 1.
Rau, A.K., Moeller, K., & Landerl, K. (2014). The transition from sublexical to lexical processing in a consistent orthography: An eye-tracking study. Scientific Studies of Reading, 18(3), 224–233.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372.
Rayner, K. (2009). Eye movements and attention in reading, scene perception, and visual search. The Quarterly Journal of Experimental Psychology, 62(8), 1457–1506.
Schmid, H.J. (2010). Does frequency in text instantiate entrenchment in the cognitive system? Quantitative Methods in Cognitive Semantics: Corpus-Driven Approaches, 101–133.
Schmidtke, D., & Kuperman, V. (2019). A paradox of apparent brainless behavior: The time-course of compound word recognition. Cortex, 116, 250–267.
Schmidtke, D., Kuperman, V., Gagné, C. L., & Spalding, T.L. (2016). Competition between conceptual relations affects compound recognition: The role of entropy. Psychonomic Bulletin & Review, 23(2), 556–570.
Schmidtke, D., Van Dyke, J.A., & Kuperman, V. (2018a). Individual variability in the semantic processing of English compound words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 44(3), 421.
Schmidtke, D., Gagné, C.L., Kuperman, V., Spalding, T.L., & Tucker, B.V. (2018b). Conceptual relations compete during auditory and visual compound word recognition. Language, Cognition and Neuroscience, 33(7), 923–942.
Schmidtke, D., Gagné, C.L., Kuperman, V., & Spalding, T.L. (2018c). Language experience shapes relational knowledge of compound words. Psychonomic Bulletin & Review, 25(4), 1468–1487.
Schotter, E.R., Lee, M., Reiderman, M., & Rayner, K. (2015). The effect of contextual constraint on parafoveal processing in reading. Journal of Memory and Language, 83, 118–139.
Schramm, P., & Rouder, J. (2019). Are reaction time transformations really beneficial? PsyArXiv March 5.
Schreuder, R., & Baayen, R.H. (1997). How complex simplex words can be? Journal of Memory and Language, 37(1), 118–139.
Stanovich, K.E., & West, R.F. (1989). Exposure to print and orthographic processing. Reading Research Quarterly, 402–433.
Staub, A. (2015). The effect of lexical predictability on eye movements in reading: Critical review and theoretical interpretation. Language and Linguistics Compass, 9(8), 311–327.
Staub, A., Rayner, K., Pollatsek, A., Hyönä, J., & Majewski, H. (2007). The time course of plausibility effects on eye movements in reading: Evidence from noun-noun compounds. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33(6), 1162.
Tiffin-Richards, S.P., & Schroeder, S. (2015). Children’s and adults’ parafoveal processes in German: Phonological and orthographic effects. Journal of Cognitive Psychology, 27(5), 531–548.
Tomaschek, F., Hendrix, P., & Baayen, R.H. (2018). Strategies for addressing collinearity in multivariate linguistic data. Journal of Phonetics, 71, 249–267.
Tucker, B.V., Brenner, D., Danielson, D.K., Kelley, M.C., Nenadić, F., & Sims, M. (2018). The massive auditory lexical decision (MALD) database. Behavior Research Methods, 1–18.
Underwood, G., Petley, K., & Clews, S. (1990). Searching for information during sentence comprehension. In R. Gruner, G. d’Ydewalle, & R. Parham (Eds.) From eye to mind: Information acquisition in perception (pp. 191–203). Amsterdam.
van Heuven, W.J., Mandera, P., Keuleers, E., & Brysbaert, M. (2014). SUBTLEX-UK: A new and improved word frequency database for British English. The Quarterly Journal of Experimental Psychology, 67(6), 1176–1190.
von der Malsburg, T., & Angele, B. (2017). False positives and other statistical errors in standard analyses of eye movements in reading. Journal of Memory and Language, 94, 119–133.
Warriner, A.B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45(4), 1191–1207.
Wickham, H. (2009). ggplot2: Elegant graphics for data analysis. New York: Springer. http://ggplot2.org.
Daniel Schmidtke, McMaster English Language Development Diploma (MELD) program, Department of Linguistics and Language, McMaster University; Julie A. Van Dyke, Haskins Laboratories, New Haven, Connecticut; Victor Kuperman, Department of Linguistics and Language, McMaster University. Daniel Schmidtke’s contribution was partially completed during his PhD studies, which were supported by the Ontario Trillium Award and a Graduate fellowship awarded by the Lewis & Ruth Sherman Centre for Digital Scholarship (McMaster University). The remainder of Daniel Schmidtke’s contribution was supported by a Post-doctoral fellowship appointment at the McMaster English Language Development Diploma (MELD) program, Faculty of Humanities, McMaster University, Canada. Victor Kuperman’s contribution was partially supported by the Canadian NSERC Discovery grant RGPIN/402395-2012 415 (Kuperman, PI), the Ontario Early Researcher award (Kuperman, PI), the Canada Research Chair (Tier 2; Kuperman, PI), the SSHRC Partnership Training Grant 895-2016-1008 (Libben, PI), and the CFI Leaders Opportunity Fund (Kuperman, PI). Julie A. Van Dyke’s contribution was supported by the following NIH grants to Haskins Laboratories: R01 HD-073288 (Julie A. Van Dyke, PI) and P01 HD-01994 (Jay G. Rueckl, PI).
We are thankful to Noor Al-Zanoon, Morgan Bontrager, Emma Bridgwater, Kaitlin Falkauskas, Irena Grusecki, Brooke Osborne, Sadaf Rahmanian, Katrina Reyes, Aaron So, Heidi Sarles-Whittlesey and Chloe Sukkau for data collection.
Cite this article
Schmidtke, D., Van Dyke, J. & Kuperman, V. CompLex: an eye-movement database of compound word reading in English. Behav Res (2020). https://doi.org/10.3758/s13428-020-01397-1
- Eye movements
- Compound words
- Semantic transparency