CompLex: an eye-movement database of compound word reading in English

Abstract

The CompLex database presents a large-scale collection of eye-movement studies on English compound-word processing. A combined total of 440 participants completed eye-tracking experiments in which they silently read unspaced English compound words (e.g., goalpost) embedded in sentence contexts (e.g., Dylan hit the goalpost when he was aiming for the net.). Three studies were conducted with participants representing the non-college-bound population (300 participants), and four studies included participants recruited from the student population (140 participants). The database comprises trial-level eye-movement data (47,763 trials), participant data (including a measure of reading experience estimated via the Author Recognition Test), and lexical characteristics for the set of 931 English compound words used as critical stimuli in the studies. One contribution of the present paper is a set of regression analyses conducted on the full database and on individual experiments. We report that the most reliable and consistent main effects were those elicited by compound word length, left constituent frequency, right constituent frequency, compound frequency, and semantic transparency. We also found that the effects of left constituent frequency and compound word length are weaker for more frequent compounds. Another contribution is a power analysis, in which we determined the sample sizes required to reliably detect effects comparable in size to those observed in our regression models. These sample size estimates serve as a recommendation for researchers wishing either to collect eye-movement data on compound word reading or to use the current database as a resource for the study of English compound word processing.
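The power analysis summarized above asks how many observations are needed to detect an effect of a given size. The Python sketch below illustrates the general simulation approach under simplifying assumptions that are not taken from the paper: a single standardized predictor, ordinary least squares on simulated data rather than the regression models fitted to CompLex, and a normal critical value (1.96) in place of the t distribution. The function names and parameter values are hypothetical.

```python
import math
import random


def slope_t_statistic(x, y):
    """t statistic for the ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance on n - 2 degrees of freedom gives the slope's standard error.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se


def simulated_power(n, effect, noise_sd, n_sims=500, crit=1.96, seed=1):
    """Share of simulated datasets of size n in which the slope is 'significant'.

    Data are generated as y = effect * x + noise with x ~ N(0, 1), so `effect`
    is the true slope; noise_sd must be positive. |t| > crit approximates a
    two-sided test at alpha = .05 and is only reasonable for large n.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [effect * xi + rng.gauss(0.0, noise_sd) for xi in x]
        if abs(slope_t_statistic(x, y)) > crit:
            hits += 1
    return hits / n_sims


def required_n(effect, noise_sd, target=0.8, candidates=range(50, 2001, 50)):
    """Smallest candidate sample size whose simulated power reaches `target`."""
    for n in candidates:
        if simulated_power(n, effect, noise_sd) >= target:
            return n
    return None  # no candidate sample size was large enough
```

For instance, `required_n(effect=0.1, noise_sd=1.0)` searches sample sizes in steps of 50; for a slope of 0.1 noise standard deviations, the usual normal-approximation formula ((1.96 + 0.84) / 0.1)² ≈ 784 suggests the search should stop near 800, up to Monte Carlo noise. Mirroring a design with participant and item variability would require simulating a mixed-effects model instead (e.g., with the simr R package listed in the references).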

Fig. 1
Fig. 2

Notes

  1. The query was set to TOPIC = compound word* and further manually restricted to fields in language, linguistics and psychology, to filter out the literature on chemical compounds.

  2. Eye-tracking studies that examine the processing of spaced English compounds include Cutter et al. (2014), Frisson et al. (2008), and Juhasz et al. (2009).

  3. Constituent frequencies are the frequencies of the constituents as stand-alone words. The family size of a compound’s constituent is defined as the number of other compounds sharing that constituent in the same position (Schreuder & Baayen, 1997); e.g., the left-constituent family of post includes postman, post office, postbag, etc.

  4. There is evidence to suggest that the ART is a valid test for younger readers, such as those in our sample. In an item response theory analysis, Moore and Gordon (2015) showed that item difficulty is predicted by how often author names occur on Internet web pages: author names that were more frequently attested in the Google Terabyte N-Gram Corpus of English websites (one trillion tokens) tended to be correctly identified more often than less frequently attested names. This finding suggests that an individual’s success on the ART may be partly explained by the cultural knowledge that the individual has acquired from experience with Internet texts.

  5. Following Schramm and Rouder’s (2019) recent investigation of the appropriateness of analyzing log-transformed response time data, we also conducted all analyses without log-transforming the dependent variables. All of the effects observed with log-transformed durational measures were also found when predicting raw, untransformed measures, both in the full database and in the individual experiments. We examined log-transformed variables because all durational measures were positively skewed; such distributions violate the normality assumption of linear regression models, and we wanted to limit the influence of extreme values in our empirical distributions that may bias model fit. Although Schramm and Rouder argue against transforming response time measures when fitting statistical models, we chose log-transformed durations because this procedure is in keeping with the current convention in the field. If best practices in the field change, future studies using the current database will be able to model compound word reading without log-transforming durational eye-movement measures (or performing any other corrective transformation).

References

  1. Acheson, D.J., Wells, J.B., & MacDonald, M.C. (2008). New and updated tests of print exposure and reading abilities in college students. Behavior Research Methods, 40(1), 278–289.

  2. Baayen, R.H. (2010). A real experiment is a factorial experiment. The Mental Lexicon, 5(1), 149–157.

  3. Baayen, R.H., Piepenbrock, R., & Gulikers, L. (1995). The CELEX lexical database (release 2). Distributed by the Linguistic Data Consortium, University of Pennsylvania.

  4. Baayen, R.H., Kuperman, V., & Bertram, R. (2010). Frequency effects in compound processing. In Compounding (pp. 257–270). Amsterdam/Philadelphia: Benjamins.

  5. Balota, D., Cortese, M., Hutchison, K., Neely, J., Nelson, D., Simpson, G., & Treiman, R. (2002). The English Lexicon Project: A web-based repository of descriptive and behavioral measures for 40,481 English words and nonwords. Washington University. Online: http://elexicon.wustl.edu.

  6. Balota, D.A., Yap, M.J., Hutchison, K.A., Cortese, M.J., Kessler, B., Loftis, B., & Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39(3), 445–459.

  7. Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289–300.

  8. Bertram, R. (2011). Eye movements and morphological processing in reading. The Mental Lexicon, 6(1), 83–109.

  9. Brysbaert, M. (2019). How many participants do we have to include in properly powered experiments? A tutorial of power analysis with reference tables. Journal of Cognition, 2, 1.

  10. Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990.

  11. Brysbaert, M., & Stevens, M. (2018). Power analysis and effect size in mixed effects models: A tutorial. Journal of Cognition, 1, 1.

  12. Brysbaert, M., Stevens, M., Mandera, P., & Keuleers, E. (2016). The impact of word prevalence on lexical decision times: Evidence from the Dutch Lexicon Project 2. Journal of Experimental Psychology: Human Perception and Performance, 42(3), 441.

  13. Button, K.S., Ioannidis, J.P., Mokrysz, C., Nosek, B.A., Flint, J., Robinson, E.S., & Munafò, M.R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365.

  14. Choi, W., Lowder, M.W., Ferreira, F., & Henderson, J.M. (2015). Individual differences in the perceptual span during reading: Evidence from the moving window technique. Attention, Perception, & Psychophysics, 77(7), 2463–2475.

  15. Cohen, J. (2013). Statistical power analysis for the behavioral sciences. Routledge.

  16. Cutter, M.G., Drieghe, D., & Liversedge, S.P. (2014). Preview benefit in English spaced compounds. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(6), 1778.

  17. De Jong, N.H., Schreuder, R., & Baayen, R.H. (2000). The morphological family size effect and morphology. Language and Cognitive Processes, 15(4–5), 329–365.

  18. Ernestus, M., & Cutler, A. (2015). BALDEY: A database of auditory lexical decisions. The Quarterly Journal of Experimental Psychology, 68(8), 1469–1488.

  19. Falkauskas, K., & Kuperman, V. (2015). When experience meets language statistics: Individual variability in processing English compound words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(6), 1607–1627.

  20. Fox, J., Weisberg, S., Adler, D., Bates, D., Baud-Bovy, G., Ellison, S., et al. (2012). Package ‘car’. Vienna: R Foundation for Statistical Computing.

  21. Fox, J., Weisberg, S., Friendly, M., Hong, J., Andersen, R., Firth, D., & Fox, M.J. (2019). Package ‘effects’.

  22. Frisson, S., Niswander-Klement, E., & Pollatsek, A. (2008). The role of semantic transparency in the processing of English compound words. British Journal of Psychology, 99(1), 87–107.

  23. Gagné, C.L., Spalding, T.L., & Schmidtke, D. (2019). LADEC: The large database of English compounds. Behavior Research Methods, 1–28.

  24. Green, P., & MacLeod, C.J. (2016). SIMR: An R package for power analysis of generalized linear mixed models by simulation. Methods in Ecology and Evolution, 7(4), 493–498.

  25. Hyönä, J., & Olson, R.K. (1995). Eye fixation patterns among dyslexic and normal readers: Effects of word length and word frequency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21(6), 1430.

  26. Ioannidis, J.P. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.

  27. Juhasz, B.J. (2018). Experience with compound words influences their processing: An eye movement investigation with English compound words. Quarterly Journal of Experimental Psychology, 71(1), 103–112.

  28. Juhasz, B.J., & Berkowitz, R.N. (2011). Effects of morphological families on English compound word recognition: A multitask investigation. Language and Cognitive Processes, 26(4–6), 653–682.

  29. Juhasz, B.J., Inhoff, A.W., & Rayner, K. (2005). The role of interword spaces in the processing of English compound words. Language and Cognitive Processes, 20(1–2), 291–316.

  30. Juhasz, B.J., Pollatsek, A., Hyönä, J., Drieghe, D., & Rayner, K. (2009). Parafoveal processing within and between words. The Quarterly Journal of Experimental Psychology, 62(7), 1356–1376.

  31. Keuleers, E., & Balota, D.A. (2015). Megastudies, crowdsourcing, and large datasets in psycholinguistics: An overview of recent developments (Vol. 68, No. 8). Taylor & Francis.

  32. Keuleers, E., Lacey, P., Rastle, K., & Brysbaert, M. (2012). The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words. Behavior Research Methods, 44(1), 287–304.

  33. Kim, S.Y., Yap, M.J., & Goh, W.D. (2018). The role of semantic transparency in visual word recognition of compound words: A megastudy approach. Behavior Research Methods, 1–11.

  34. Kuperman, V. (2013). Accentuate the positive: Semantic access in English compounds. Frontiers in Psychology, 4, 203.

  35. Kuperman, V. (2015). Virtual experiments in megastudies: a case study of language and emotion. The Quarterly Journal of Experimental Psychology, 68(8), 1693–1710.

  36. Kuperman, V., & Bertram, R. (2013). Moving spaces: Spelling alternation in English noun-noun compounds. Language and Cognitive Processes, 28(7), 939–966.

  37. Kuperman, V., Bertram, R., & Baayen, R.H. (2008). Morphological dynamics in compound processing. Language and Cognitive Processes, 23(7–8), 1089–1132.

  38. Kuperman, V., Schreuder, R., Bertram, R., & Baayen, R.H. (2009). Reading polymorphemic Dutch compounds: Toward a multiple route model of lexical processing. Journal of Experimental Psychology: Human Perception and Performance, 35(3), 876.

  39. Kuperman, V., & Van Dyke, J.A. (2011). Effects of individual differences in verbal skills on eye-movement patterns during sentence reading. Journal of Memory and Language, 65(1), 42–73.

  40. Kuperman, V., Estes, Z., Brysbaert, M., & Warriner, A.B. (2014). Emotion and language: Valence and arousal affect word recognition. Journal of Experimental Psychology: General, 143(3), 1065.

  41. Kuperman, V., Matsuki, K., & Van Dyke, J.A. (2018). Contributions of reader- and text-level characteristics to eye-movement patterns during passage reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 44(11), 1687.

  42. Kutner, M., Nachtsheim, C., & Neter, J. (2004). Simultaneous inferences and other topics in regression analysis. In Applied linear regression models (4th ed., pp. 168–170). New York: McGraw-Hill Irwin.

  43. Landauer, T.K., & Dumais, S.T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211–240.

  44. Libben, G. (2006). Why study compound processing? An overview of the issues. 1–23.

  45. Libben, G. (2014). The nature of compounds: A psychocentric perspective. Cognitive Neuropsychology, 31(1–2), 8–25.

  46. Liversedge, S.P., Blythe, H.I., & Drieghe, D. (2012). Beyond isolated word recognition. Behavioral and Brain Sciences, 35(5), 293–294.

  47. Lowder, M.W., & Gordon, P.C. (2017). Print exposure modulates the effects of repetition priming during sentence reading. Psychonomic Bulletin & Review, 24(6), 1935–1942.

  48. Mandera, P., Keuleers, E., & Brysbaert, M. (2017). Explaining human performance in psycholinguistic tasks with models of semantic similarity based on prediction and counting: A review and empirical validation. Journal of Memory and Language, 92, 57–78.

  49. Moore, M., & Gordon, P.C. (2015). Reading ability and print exposure: Item response theory analysis of the author recognition test. Behavior Research Methods, 47(4), 1095–1109.

  50. Morrison, R.E. (1984). Manipulation of stimulus onset delay in reading: Evidence for parallel programming of saccades. Journal of Experimental Psychology: Human Perception and Performance, 10(5), 667.

  51. Perfetti, C.A., & Hart, L. (2002). The lexical quality hypothesis. Precursors of Functional Literacy, 11, 67–86.

  52. Perugini, M., Gallucci, M., & Costantini, G. (2018). A practical primer to power analysis for simple experimental designs. International Review of Social Psychology, 31, 1.

  53. Rau, A.K., Moeller, K., & Landerl, K. (2014). The transition from sublexical to lexical processing in a consistent orthography: An eye-tracking study. Scientific Studies of Reading, 18(3), 224–233.

  54. Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124(3), 372.

  55. Rayner, K. (2009). Eye movements and attention in reading, scene perception, and visual search. The Quarterly Journal of Experimental Psychology, 62(8), 1457–1506.

  56. Schmid, H.J. (2010). Does frequency in text instantiate entrenchment in the cognitive system? Quantitative Methods in Cognitive Semantics: Corpus-Driven Approaches, 101–133.

  57. Schmidtke, D., & Kuperman, V. (2019). A paradox of apparent brainless behavior: The time-course of compound word recognition. Cortex, 116, 250–267.

  58. Schmidtke, D., Kuperman, V., Gagné, C.L., & Spalding, T.L. (2016). Competition between conceptual relations affects compound recognition: The role of entropy. Psychonomic Bulletin & Review, 23(2), 556–570.

  59. Schmidtke, D., Van Dyke, J.A., & Kuperman, V. (2018a). Individual variability in the semantic processing of English compound words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 44(3), 421.

  60. Schmidtke, D., Gagné, C.L., Kuperman, V., Spalding, T.L., & Tucker, B.V. (2018b). Conceptual relations compete during auditory and visual compound word recognition. Language, Cognition and Neuroscience, 33(7), 923–942.

  61. Schmidtke, D., Gagné, C.L., Kuperman, V., & Spalding, T.L. (2018c). Language experience shapes relational knowledge of compound words. Psychonomic Bulletin & Review, 25(4), 1468–1487.

  62. Schotter, E.R., Lee, M., Reiderman, M., & Rayner, K. (2015). The effect of contextual constraint on parafoveal processing in reading. Journal of Memory and Language, 83, 118–139.

  63. Schramm, P., & Rouder, J. (2019). Are reaction time transformations really beneficial? PsyArXiv, March 5.

  64. Schreuder, R., & Baayen, R.H. (1997). How complex simplex words can be? Journal of Memory and Language, 37(1), 118–139.

  65. Stanovich, K.E., & West, R.F. (1989). Exposure to print and orthographic processing. Reading Research Quarterly, 402–433.

  66. Staub, A. (2015). The effect of lexical predictability on eye movements in reading: Critical review and theoretical interpretation. Language and Linguistics Compass, 9(8), 311–327.

  67. Staub, A., Rayner, K., Pollatsek, A., Hyönä, J., & Majewski, H. (2007). The time course of plausibility effects on eye movements in reading: Evidence from noun-noun compounds. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33(6), 1162.

  68. Tiffin-Richards, S.P., & Schroeder, S. (2015). Children’s and adults’ parafoveal processes in German: Phonological and orthographic effects. Journal of Cognitive Psychology, 27(5), 531–548.

  69. Tomaschek, F., Hendrix, P., & Baayen, R.H. (2018). Strategies for addressing collinearity in multivariate linguistic data. Journal of Phonetics, 71, 249–267.

  70. Tucker, B.V., Brenner, D., Danielson, D.K., Kelley, M.C., Nenadić, F., & Sims, M. (2018). The massive auditory lexical decision (MALD) database. Behavior Research Methods, 1–18.

  71. Underwood, G., Petley, K., & Clews, S. (1990). Searching for information during sentence comprehension. In R. Groner, G. d’Ydewalle, & R. Parham (Eds.), From eye to mind: Information acquisition in perception (pp. 191–203). Amsterdam.

  72. van Heuven, W.J., Mandera, P., Keuleers, E., & Brysbaert, M. (2014). SUBTLEX-UK: A new and improved word frequency database for British English. The Quarterly Journal of Experimental Psychology, 67(6), 1176–1190.

  73. von der Malsburg, T., & Angele, B. (2017). False positives and other statistical errors in standard analyses of eye movements in reading. Journal of Memory and Language, 94, 119–133.

  74. Warriner, A.B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45(4), 1191–1207.

  75. Wickham, H. (2009). ggplot2: Elegant graphics for data analysis. New York: Springer. http://ggplot2.org.

Acknowledgements

Daniel Schmidtke, McMaster English Language Development Diploma (MELD) program, Department of Linguistics and Language, McMaster University; Julie A. Van Dyke, Haskins Laboratories, New Haven, Connecticut; Victor Kuperman, Department of Linguistics and Language, McMaster University. Daniel Schmidtke’s contribution was partially completed during his PhD studies, which were supported by the Ontario Trillium Award and a graduate fellowship awarded by the Lewis & Ruth Sherman Centre for Digital Scholarship (McMaster University). The remainder of Daniel Schmidtke’s contribution was supported by a postdoctoral fellowship appointment in the McMaster English Language Development Diploma (MELD) program, Faculty of Humanities, McMaster University, Canada. Victor Kuperman’s contribution was partially supported by the Canadian NSERC Discovery grant RGPIN/402395-2012 415 (Kuperman, PI), the Ontario Early Researcher award (Kuperman, PI), the Canada Research Chair (Tier 2; Kuperman, PI), the SSHRC Partnership Training Grant 895-2016-1008 (Libben, PI), and the CFI Leaders Opportunity Fund (Kuperman, PI). Julie A. Van Dyke’s contribution was supported by the following NIH grants to Haskins Laboratories: R01 HD-073288 (Julie A. Van Dyke, PI) and P01 HD-01994 (Jay G. Rueckl, PI).

We are thankful to Noor Al-Zanoon, Morgan Bontrager, Emma Bridgwater, Kaitlin Falkauskas, Irena Grusecki, Brooke Osborne, Sadaf Rahmanian, Katrina Reyes, Aaron So, Heidi Sarles-Whittlesey and Chloe Sukkau for data collection.

Author information

Corresponding author

Correspondence to Daniel Schmidtke.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

(PDF 50.0 KB)

About this article

Cite this article

Schmidtke, D., Van Dyke, J. & Kuperman, V. CompLex: an eye-movement database of compound word reading in English. Behav Res (2020). https://doi.org/10.3758/s13428-020-01397-1

Keywords

  • Megastudy
  • Eye movements
  • Compound words
  • Semantic transparency
  • Psycholinguistics
  • Morphology