Peekbank: An open, large-scale repository for developmental eye-tracking data of children’s word recognition

Abstract

The ability to rapidly recognize words and link them to referents is central to children’s early language development. This ability, often called word recognition in the developmental literature, is typically studied in the looking-while-listening paradigm, which measures infants’ fixation on a target object (vs. a distractor) after hearing a target label. We present a large-scale, open database of infant and toddler eye-tracking data from looking-while-listening tasks. The goal of this effort is to address theoretical and methodological challenges in measuring vocabulary development. We first describe how we created the database, its features and structure, and the associated tools for processing and accessing infant eye-tracking datasets. Using these tools, we then work through two illustrative examples showing how researchers can use Peekbank to investigate theoretical and methodological questions about children’s developing word recognition ability.
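As a rough illustration of the dependent measure in the looking-while-listening paradigm, the Python sketch below scores one trial as the proportion of target looking: target looks divided by target-plus-distractor looks within a window after label onset. It assumes gaze samples are already time-locked so that 0 ms is the onset of the target label, and that each sample is coded as "target", "distractor", or something else (looks away, track loss). The GazeSample class, the AOI codes, and the default 300–2000 ms window are hypothetical choices for illustration, not Peekbank's schema or the analysis parameters used in the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class GazeSample:
    # Hypothetical sample format; not Peekbank's actual schema.
    t_ms: int  # time relative to target-label onset (0 = onset)
    aoi: str   # assumed coding: "target", "distractor", or anything else


def proportion_target_looking(
    samples: Iterable[GazeSample],
    window_start_ms: int = 300,   # illustrative window start (assumption)
    window_end_ms: int = 2000,    # illustrative window end (assumption)
) -> Optional[float]:
    """Share of target looks among target + distractor looks in the window.

    Samples coded as anything other than "target" or "distractor"
    (looks away, track loss) are left out of the denominator.
    Returns None when the window contains no target or distractor looks.
    """
    target = distractor = 0
    for s in samples:
        if window_start_ms <= s.t_ms <= window_end_ms:
            if s.aoi == "target":
                target += 1
            elif s.aoi == "distractor":
                distractor += 1
    total = target + distractor
    return target / total if total else None


# Simulated trial sampled every 25 ms: the child fixates the distractor
# until 600 ms after label onset, then shifts to the target.
trial = [
    GazeSample(t, "distractor" if t < 600 else "target")
    for t in range(-500, 2500, 25)
]
print(round(proportion_target_looking(trial), 3))  # 0.826
```

Leaving off-screen and missing samples out of the denominator is one common convention; analyses differ on this point, so the denominator and window are worth checking against each study's methods.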

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Notes

  1. The term trial is ambiguous: it can refer either to a particular combination of stimuli seen by many participants or to one participant seeing that combination at a particular point in the experiment. We track the former in the trial_types table and the latter in the trials table.

  2. In some datasets, information preceding the onset of the target label, such as coarticulation cues (Mahr, McMillan, Saffran, Ellis Weismer, & Edwards, 2015) or adjectives (Fernald, Marchman, & Weisleder, 2013), can in principle disambiguate the target referent. We nevertheless use a standardized point of disambiguation defined as the onset of the label for the target referent. Onset times for other potentially disambiguating information (such as adjectives) can typically be recovered from the raw data provided on OSF.

  3. We also used the R packages dplyr [Version 1.0.7; Wickham, François, Henry, and Müller (2021)], forcats [Version 0.5.1; Wickham (2021a)], ggplot2 [Version 3.3.5; Wickham (2016)], ggthemes [Version 4.2.4; Arnold (2021)], here [Version 1.0.1; Müller (2020)], papaja [Version 0.1.0.9997; Aust and Barth (2020)], peekbankr [Version 0.1.1.9002; Braginsky, MacDonald, and Frank (2021)], purrr [Version 0.3.4; Henry and Wickham (2020)], readr [Version 2.0.1; Wickham and Hester (2021)], stringr [Version 1.4.0; Wickham (2019)], tibble [Version 3.1.4; Müller and Wickham (2021)], tidyr [Version 1.1.3; Wickham (2021b)], tidyverse [Version 1.3.1; Wickham et al. (2019)], tinylabels [Barth (2021)], viridis [Version 0.6.1; Garnier et al. (2021a)], viridisLite [Version 0.4.0; Garnier et al. (2021a)], and xtable [Version 1.8.4; Dahl, Scott, Roosen, Magnusson, and Swinton (2019)].

  4. The original paper investigated both close (e.g., opple, /apl/) and distant (e.g., opal, /opl/) mispronunciations. For simplicity, here we combine both mispronunciation conditions since the close vs. distant mispronunciation manipulation showed no effect in the original paper.
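Notes 1 and 2 describe two structural choices: separating stimulus combinations (trial_types) from individual presentations of those combinations (trials), and measuring time from a standardized point of disambiguation at target-label onset. The Python sketch below illustrates both ideas under stated assumptions. Apart from the table names trial_types and trials, every class and field name here (for example point_of_disambiguation_ms and trial_order) is hypothetical and need not match Peekbank's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TrialType:
    """One stimulus combination, potentially seen by many participants.

    Field names are hypothetical, chosen for illustration only.
    """
    trial_type_id: int
    target_label: str
    distractor_label: str
    point_of_disambiguation_ms: int  # target-label onset in the raw recording


@dataclass(frozen=True)
class Trial:
    """One participant seeing a trial type at a given point in the session."""
    trial_id: int
    trial_type_id: int
    subject_id: str
    trial_order: int


def normalize_times(
    raw_times_ms: List[int],
    trial: Trial,
    trial_types: Dict[int, TrialType],
) -> List[int]:
    """Re-express raw sample times so that 0 ms is the target-label onset."""
    pod = trial_types[trial.trial_type_id].point_of_disambiguation_ms
    return [t - pod for t in raw_times_ms]


trial_types = {1: TrialType(1, "dog", "ball", 2300)}
trials = [
    Trial(101, 1, "s01", trial_order=3),
    # Same trial type, different participant and position in the session.
    Trial(102, 1, "s02", trial_order=7),
]
for tr in trials:
    print(tr.subject_id, normalize_times([2000, 2300, 2600], tr, trial_types))
# s01 [-300, 0, 300]
# s02 [-300, 0, 300]
```

In this sketch, storing the onset once per trial type means every presentation of that stimulus combination is aligned to the same reference point, so trials from different participants share a common time axis.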

References

  • Adams, K.A., Marchman, V.A., Loi, E.C., Ashland, M.D., Fernald, A., & Feldman, H.M. (2018). Caregiver talk and medical risk as predictors of language outcomes in full term and preterm toddlers. Child Development, 89(5), 1674–1690.

  • Arnold, J.B. (2021). ggthemes: Extra themes, scales and geoms for 'ggplot2'. Retrieved from https://CRAN.R-project.org/package=ggthemes.

  • Aslin, R.N. (2007). What’s in a look? Developmental Science, 10(1), 48–53.

  • Aust, F., & Barth, M. (2020). papaja: Create APA manuscripts with R Markdown. Retrieved from https://github.com/crsh/papaja.

  • Baillargeon, R., Spelke, E.S., & Wasserman, S. (1985). Object permanence in five-month-old infants. Cognition, 20(3), 191–208. https://doi.org/10.1016/0010-0277(85)90008-3

  • Balota, D.A., Yap, M.J., Cortese, M.J., Hutchison, K.A., Kessler, B., Loftis, B., & Treiman, R. (2007). The English Lexicon project. Behavior Research Methods, 39(3), 445–459. https://doi.org/10.3758/BF03193014

  • Barth, M. (2021). tinylabels: Lightweight variable labels. Retrieved from https://github.com/mariusbarth/tinylabels.

  • Bergelson, E. (2020). The comprehension boost in early word learning: Older infants are better learners. Child Development Perspectives, 14(3), 142–149.

  • Bergelson, E., & Swingley, D. (2012). At 6–9 months, human infants know the meanings of many common nouns. Proceedings of the National Academy of Sciences, 109(9), 3253–3258.

  • Bergelson, E., & Swingley, D. (2013). The acquisition of abstract words by young infants. Cognition, 127(3), 391–397.

  • Bergmann, C., Tsuji, S., Piccinini, P.E., Lewis, M.L., Braginsky, M., Frank, M.C., & Cristia, A. (2018). Promoting replicability in developmental research through meta-analyses: Insights from language acquisition research. Child Development, 89(6), 1996–2009.

  • Bleses, D., Makransky, G., Dale, P.S., Højen, A., & Ari, B.A. (2016). Early productive vocabulary predicts academic achievement 10 years later. Applied Psycholinguistics, 37(6), 1461–1476.

  • Braginsky, M., MacDonald, K., & Frank, M.C. (2021). peekbankr: Accessing the Peekbank database. Retrieved from https://github.com/langcog/peekbankr.

  • Byers-Heinlein, K., Bergmann, C., & Savalei, V. (2021). Six solutions for more reliable infant research. Infant and Child Development, e2296. https://doi.org/10.1002/icd.2296.

  • Byers-Heinlein, K., Morin-Lessard, E., & Lew-Williams, C. (2017). Bilingual infants control their languages as they listen. Proceedings of the National Academy of Sciences, 114(34), 9032–9037. https://doi.org/10.1073/pnas.1703220114

  • Casillas, M., Brown, P., & Levinson, S. C. (2017). Casillas HomeBank Corpus. https://doi.org/10.21415/T51X12

  • Dahl, D.B., Scott, D., Roosen, C., Magnusson, A., & Swinton, J. (2019). xtable: Export tables to LaTeX or HTML. Retrieved from https://CRAN.R-project.org/package=xtable.

  • DeBolt, M.C., Rhemtulla, M., & Oakes, L.M. (2020). Robust data and power in infant research: A case study of the effect of number of infants and number of trials in visual preference procedures. Infancy, 25(4), 393–419. https://doi.org/10.1111/infa.12337

  • Fantz, R.L. (1963). Pattern vision in newborn infants. Science, 140(3564), 296–297.

  • Fernald, A., Marchman, V.A., & Weisleder, A. (2013). SES differences in language processing skill and vocabulary are evident at 18 months. Developmental Science, 16(2), 234–248. https://doi.org/10.1111/desc.12019

  • Fernald, A., Pinto, J. P., Swingley, D., Weinberg, A., & McRoberts, G.W. (1998). Rapid gains in speed of verbal processing by infants in the 2nd year. Psychological Science, 9(3), 228–231.

  • Fernald, A., Zangl, R., Portillo, A.L., & Marchman, V.A. (2008). Looking while listening: Using eye movements to monitor spoken language comprehension by infants and young children. In I.A. Sekerina, E.M. Fernandez, & H. Clahsen (Eds.) Developmental psycholinguistics: On-line methods in children’s language processing (pp. 97–135). Amsterdam: John Benjamins.

  • Frank, M.C., Bergelson, E., Bergmann, C., Cristia, A., Floccia, C., Gervain, J., & Yurovsky, D. (2017a). A collaborative approach to infant research: Promoting reproducibility, best practices, and theory-building. Infancy, 22(4), 421–435. https://doi.org/10.1111/infa.12182

  • Frank, M.C., Braginsky, M., Yurovsky, D., & Marchman, V.A. (2017b). Wordbank: An open repository for developmental vocabulary data. Journal of Child Language, 44(3), 677–694.

  • Frank, M.C., Braginsky, M., Yurovsky, D., & Marchman, V.A. (2021). Variability and consistency in early language learning: The Wordbank project. Cambridge, MA: MIT Press.

  • Frank, M. C., Sugarman, E., Horowitz, A. C., Lewis, M. L., & Yurovsky, D. (2016). Using tablets to collect data from young children. Journal of Cognition and Development, 17(1), 1–17. https://doi.org/10.1080/15248372.2015.1061528

  • Garnier, S., Ross, N., Rudis, R., & Cédric (2021a). viridis: Colorblind-friendly color maps for R. https://doi.org/10.5281/zenodo.4679424.

  • Garrison, H., Baudet, G., Breitfeld, E., Aberman, A., & Bergelson, E. (2020). Familiarity plays a small role in noun comprehension at 12–18 months. Infancy, 25(4), 458–477.

  • Gautheron, L., Rochat, N., & Cristia, A. (2021). Managing, storing, and sharing long-form recordings and their annotations. PsyArXiv. https://doi.org/10.31234/osf.io/w8trm.

  • Golinkoff, R.M., Ma, W., Song, L., & Hirsh-Pasek, K. (2013). Twenty-five years using the intermodal preferential looking paradigm to study language acquisition: What have we learned? Perspectives on Psychological Science, 8(3), 316–339.

  • Gorgolewski, K.J., Auer, T., Calhoun, V.D., Craddock, R.C., Das, S., Duff, E.P., & Poldrack, R.A. (2016). The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data, 3(1), 160044. https://doi.org/10.1038/sdata.2016.44

  • Hardwicke, T. E., Mathur, M. B., MacDonald, K., Nilsonne, G., Banks, G. C., Kidwell, M. C., & Frank, M. C. (2018). Data availability, reusability, and analytic reproducibility: Evaluating the impact of a mandatory open data policy at the journal Cognition. Royal Society Open Science, 5(8). https://doi.org/10.1098/rsos.180448.

  • Henrich, J., Heine, S.J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2–3), 61–83. https://doi.org/10.1017/S0140525X0999152X

  • Henry, L., & Wickham, H. (2020). purrr: Functional programming tools. Retrieved from https://CRAN.R-project.org/package=purrr.

  • Hirsh-Pasek, K., Cauley, K.M., Golinkoff, R.M., & Gordon, L. (1987). The eyes have it: Lexical and syntactic comprehension in a new paradigm. Journal of Child Language, 14(1), 23–45.

  • Hurtado, N., Marchman, V.A., & Fernald, A. (2007). Spoken word recognition by Latino children learning Spanish as their first language. Journal of Child Language, 34(2), 227–249. https://doi.org/10.1017/S0305000906007896

  • Hurtado, N., Marchman, V.A., & Fernald, A. (2008). Does input influence uptake? Links between maternal talk, processing speed and vocabulary size in Spanish-learning children. Developmental Science, 11(6), 31–39. https://doi.org/10.1111/j.1467-7687.2008.00768.x

  • Lewis, M., Braginsky, M., Tsuji, S., Bergmann, C., Piccinini, P. E., Cristia, A., & Frank, M. C. (2016). A quantitative synthesis of early language acquisition using meta-analysis. PsyArXiv. https://doi.org/10.31234/osf.io/htsjm.

  • Lew-Williams, C., & Fernald, A. (2007). Young children learning Spanish make rapid use of grammatical gender in spoken word recognition. Psychological Science, 18(3), 193–198.

  • Liu, S., Ullman, T.D., Tenenbaum, J.B., & Spelke, E.S. (2017). Ten-month-old infants infer the value of goals from the costs of actions. Science, 358(6366), 1038–1041. https://doi.org/10.1126/science.aag2132

  • MacWhinney, B. (2000). The CHILDES Project: Tools for analyzing talk. Mahwah, NJ: Lawrence Erlbaum Associates.

  • Mahr, T., McMillan, B.T.M., Saffran, J.R., Ellis Weismer, S., & Edwards, J. (2015). Anticipatory coarticulation facilitates word recognition in toddlers. Cognition, 142, 345–350. https://doi.org/10.1016/j.cognition.2015.05.009

  • Marchman, V.A., Loi, E.C., Adams, K.A., Ashland, M., Fernald, A., & Feldman, H.M. (2018). Speed of language comprehension at 18 months old predicts school-relevant outcomes at 54 months old in children born preterm. Journal of Developmental & Behavioral Pediatrics, 39(3), 246–253.

  • Muthukrishna, M., Bell, A.V., Henrich, J., Curtin, C.M., Gedranovich, A., McInerney, J., & Thue, B. (2020). Beyond western, educated, industrial, rich, and democratic (WEIRD) psychology: Measuring and mapping scales of cultural and psychological distance. Psychological Science, 31(6), 678–701.

  • Müller, K. (2020). here: A simpler way to find your files. Retrieved from https://CRAN.R-project.org/package=here.

  • Müller, K., & Wickham, H. (2021). tibble: Simple data frames. Retrieved from https://CRAN.R-project.org/package=tibble.

  • Nosek, B.A., Hardwicke, T.E., Moshontz, H., Allard, A., Corker, K. S., Dreber, A., & Vazire, S. (2022). Replicability, robustness, and reproducibility in psychological science. Annual Review of Psychology, 73, 719–748. https://doi.org/10.31234/osf.io/ksfvq

  • Perry, L. K., & Saffran, J. R. (2017). Is a pink cow still a cow? Individual differences in toddlers' vocabulary knowledge and lexical representations. Cognitive Science, 41(4), 1090–1105. https://doi.org/10.1111/cogs.12370

  • Peter, M.S., Durrant, S., Jessop, A., Bidgood, A., Pine, J.M., & Rowland, C.F. (2019). Does speed of processing or vocabulary size predict later language growth in toddlers? Cognitive Psychology, 115, 101238.

  • Pomper, R., & Saffran, J. R. (2016). Roses are red, socks are blue: Switching dimensions disrupts young children's language comprehension. PLoS ONE, 11(6), e0158459. https://doi.org/10.1371/journal.pone.0158459

  • Pomper, R., & Saffran, J. R. (2019). Familiar object salience affects novel word learning. Child Development, 90(2), e246–e262. https://doi.org/10.1111/cdev.13053

  • Potter, C. E., Fourakis, E., Morin-Lessard, E., Byers-Heinlein, K., & Lew-Williams, C. (2019). Bilingual toddlers' comprehension of mixed sentences is asymmetrical across their two languages. Developmental Science, 22(4), e12794. https://doi.org/10.1111/desc.12794

  • Potter, C., & Lew-Williams, C. (2022). Frequent vs. infrequent words shape toddlers’ real-time sentence processing. PsyArXiv. https://doi.org/10.31234/osf.io/mertp

  • Quinn, P.C., Eimas, P.D., & Rosenkrantz, S.L. (1993). Evidence for representations of perceptually similar natural categories by 3-month-old and 4-month-old infants. Perception, 22(4), 463–475. https://doi.org/10.1068/p220463

  • R Core Team (2021). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/.

  • Ronfard, S., Wei, R., & Rowe, M. L. (2021). Exploring the linguistic, cognitive, and social skills underlying lexical processing efficiency as measured by the looking-while-listening paradigm. Journal of Child Language, 1–24. https://doi.org/10.1017/S0305000921000106.

  • Sanchez, A., Meylan, S.C., Braginsky, M., MacDonald, K.E., Yurovsky, D., & Frank, M.C. (2019). childes-db: A flexible and reproducible interface to the child language data exchange system. Behavior Research Methods, 51(4), 1928–1941. https://doi.org/10.3758/s13428-018-1176-7

  • Swingley, D., & Aslin, R.N. (2002). Lexical neighborhoods and the word-form representations of 14-month-olds. Psychological Science, 13(5), 480–484. https://doi.org/10.1111/1467-9280.00485

  • The ManyBabies Consortium (2020). Quantifying sources of variability in infancy research using the infant-directed speech preference. Advances in Methods and Practices in Psychological Science, 3(1), 24–52.

  • Tincoff, R., & Jusczyk, P.W. (1999). Some beginnings of word comprehension in 6-month-olds. Psychological Science, 10(2), 172–175. https://doi.org/10.1111/1467-9280.00127

  • Wass, S.V., Smith, T.J., & Johnson, M.H. (2013). Parsing eye-tracking data of variable quality to provide accurate fixation duration estimates in infants and adults. Behavior Research Methods, 45(1), 229–250. https://doi.org/10.3758/s13428-012-0245-6

  • Weisleder, A., & Fernald, A. (2013). Talking to children matters: Early language experience strengthens processing and builds vocabulary. Psychological Science, 24(11), 2143–2152. https://doi.org/10.1177/0956797613488145.

  • Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. Retrieved from https://ggplot2.tidyverse.org.

  • Wickham, H. (2019). stringr: Simple, consistent wrappers for common string operations. Retrieved from https://CRAN.R-project.org/package=stringr.

  • Wickham, H. (2021a). forcats: Tools for working with categorical variables (factors). Retrieved from https://CRAN.R-project.org/package=forcats.

  • Wickham, H. (2021b). tidyr: Tidy messy data. Retrieved from https://CRAN.R-project.org/package=tidyr.

  • Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L.D., François, R., & Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

  • Wickham, H., François, R., Henry, L., & Müller, K. (2021). dplyr: A grammar of data manipulation. Retrieved from https://CRAN.R-project.org/package=dplyr.

  • Wickham, H., & Hester, J. (2021). readr: Read rectangular text data. Retrieved from https://CRAN.R-project.org/package=readr.

  • Yurovsky, D., & Frank, M. C. (2017). Beyond naïve cue combination: salience and social cues in early word learning. Developmental Science, 20(2), e12349. https://doi.org/10.1111/desc.12349

  • Yurovsky, D., Wade, A., & Frank, M. C. (2013). Online processing of speech and social information in early word learning. In Proceedings of the 35th Annual Meeting of the Cognitive Science Society.

  • Yurovsky, D., Wade, A., Kraus, A.M., Gengoux, G.W., Hardan, A.Y., & Frank, M.C. (unpublished). Developmental changes in the speed of social attention in early word learning.

  • Zettersten, M., Bergey, C., Bhatt, N., Boyce, V., Braginsky, M., Carstensen, A., & Frank, M.C. (2021). Peekbank: Exploring children’s word recognition through an open, large-scale repository for developmental eye-tracking data. In Proceedings of the 43rd Annual Conference of the Cognitive Science Society.

Acknowledgements

We would like to thank the labs and researchers who have made their data publicly available in the database. For further information about contributions, see https://langcog.github.io/peekbank-website/docs/contributors/. VAM's work on this project was supported in part by grants from the National Institutes of Health (Fernald: R01 HD092343; Feldman: 2R01 HD069150).

Author information

Corresponding author

Correspondence to Martin Zettersten.

Additional information

CRediT author statement

Apart from the first and last authors, author order was determined by sorting authors’ last names in reverse alphabetical order. An overview of authorship contributions following the CRediT taxonomy is available at https://docs.google.com/spreadsheets/d/e/2PACX-1vRD-LJD_dTAQaAynyBlwXvGpfAVzP-3Pi6JTDoG15m3PYZe0c44Y12U2a_hwdmhIstpjyigG2o3na4y/pubhtml.

Open Practices Statement

All code for reproducing the paper is available at https://github.com/langcog/peekbank-paper. Raw and standardized datasets are available on the Peekbank OSF repository (https://osf.io/pr6wu/) and can be accessed using the peekbankr R package (https://github.com/langcog/peekbankr).

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zettersten, M., Yurovsky, D., Xu, T.L. et al. Peekbank: An open, large-scale repository for developmental eye-tracking data of children’s word recognition. Behav Res (2022). https://doi.org/10.3758/s13428-022-01906-4

  • DOI: https://doi.org/10.3758/s13428-022-01906-4

Keywords

  • Word recognition
  • Eye-tracking
  • Vocabulary development
  • Looking-while-listening
  • Visual world paradigm
  • Lexical processing