Peekbank: An open, large-scale repository for developmental eye-tracking data of children’s word recognition


The ability to rapidly recognize words and link them to referents is central to children’s early language development. This ability, often called word recognition in the developmental literature, is typically studied in the looking-while-listening paradigm, which measures infants’ fixation on a target object (vs. a distractor) after hearing a target label. We present a large-scale, open database of infant and toddler eye-tracking data from looking-while-listening tasks. The goal of this effort is to address theoretical and methodological challenges in measuring vocabulary development. We first describe how we created the database, its features and structure, and the associated tools for processing and accessing infant eye-tracking datasets. Using these tools, we then work through two illustrative examples to show how researchers can use Peekbank to interrogate theoretical and methodological questions about children’s developing word recognition ability.
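To make the paradigm's outcome measure concrete, here is a minimal Python sketch of one common summary: the proportion of target looking on a single trial. This is the share of target fixations among target-plus-distractor fixations within an analysis window after label onset. The `Sample` type, the AOI labels, and the default 300 to 1800 ms window are illustrative assumptions, not part of Peekbank or the peekbankr API, and the window is not a recommendation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical area-of-interest (AOI) codes for one eye-tracking sample.
# These labels are illustrative assumptions, not a fixed Peekbank vocabulary.
TARGET, DISTRACTOR, OTHER, MISSING = "target", "distractor", "other", "missing"


@dataclass
class Sample:
    t_norm: int  # ms relative to target-label onset (assumed already normalized)
    aoi: str     # one of TARGET, DISTRACTOR, OTHER, MISSING


def proportion_target_looking(
    samples: Iterable[Sample],
    window_start: int = 300,  # assumed window start (ms after label onset)
    window_end: int = 1800,   # assumed window end (inclusive)
) -> Optional[float]:
    """Share of target samples among target + distractor samples in the window.

    Samples coded as OTHER or MISSING count toward neither numerator nor
    denominator. Returns None when no target/distractor samples fall in the window.
    """
    n_target = n_distractor = 0
    for s in samples:
        if window_start <= s.t_norm <= window_end:
            if s.aoi == TARGET:
                n_target += 1
            elif s.aoi == DISTRACTOR:
                n_distractor += 1
    total = n_target + n_distractor
    return n_target / total if total else None


# Toy trial sampled every 100 ms: the child looks at the distractor until 500 ms,
# then shifts to the target.
trial = [Sample(t, TARGET if t >= 500 else DISTRACTOR) for t in range(0, 2000, 100)]
print(proportion_target_looking(trial))  # 0.875 (14 target / 16 in-window samples)
```

Excluding off-screen and missing samples from the denominator keeps the measure interpretable as a target-vs-distractor preference. Other choices, such as counting them as non-target, would yield a different quantity.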

  1. The term trial is ambiguous: it can refer either to a particular combination of stimuli seen by many participants or to one participant seeing that combination at a particular point in the experiment. We track the former in the trial_types table and the latter in the trials table.

  2. In some datasets, information preceding the onset of the target label, such as co-articulation cues (Mahr, McMillan, Saffran, Ellis Weismer, & Edwards, 2015) or adjectives (Fernald, Marchman, & Weisleder, 2013), can in principle disambiguate the target referent. Nevertheless, we use a standardized point of disambiguation based on the onset of the label for the target referent. Onset times for other potentially disambiguating information (such as adjectives) can typically be recovered from the raw data provided on OSF.

  3. We also used the R packages dplyr [Version 1.0.7; Wickham, François, Henry, and Müller (2021)], forcats [Version 0.5.1; Wickham (2021a)], ggplot2 [Version 3.3.5; Wickham (2016)], ggthemes [Version 4.2.4; Arnold (2021)], here [Version 1.0.1; Müller (2020)], papaja [Version; Aust and Barth (2020)], peekbankr [Version; Braginsky, MacDonald, and Frank (2021)], purrr [Version 0.3.4; Henry and Wickham (2020)], readr [Version 2.0.1; Wickham and Hester (2021)], stringr [Version 1.4.0; Wickham (2019)], tibble [Version 3.1.4; Müller and Wickham (2021)], tidyr [Version 1.1.3; Wickham (2021b)], tidyverse [Version 1.3.1; Wickham et al. (2019)], tinylabels [Barth (2021)], viridis [Version 0.6.1; Garnier et al. (2021a)], viridisLite [Version 0.4.0; Garnier et al. (2021a)], and xtable [Version 1.8.4; Dahl, Scott, Roosen, Magnusson, and Swinton (2019)].

  4. The original paper investigated both close (e.g., opple, /apl/) and distant (e.g., opal, /opl/) mispronunciations. For simplicity, here we combine both mispronunciation conditions since the close vs. distant mispronunciation manipulation showed no effect in the original paper.
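Footnotes 1 and 2 describe a data-structure distinction (trial types vs. trials) and a time convention (timestamps measured from the onset of the target label). The following is a minimal Python sketch of both ideas under stated assumptions. The class names, field names, and `normalize_times` helper are hypothetical simplifications loosely modeled on the trial_types and trials tables. They are not the actual Peekbank schema or any peekbankr function.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrialType:
    """Hypothetical: one stimulus combination that many participants may see."""
    trial_type_id: int
    target_label: str
    distractor_label: str
    point_of_disambiguation: int  # assumed: ms from trial start to target-label onset


@dataclass
class Trial:
    """Hypothetical: one participant's presentation of a TrialType at a given point."""
    trial_id: int
    subject_id: str
    trial_order: int  # position of this presentation within the session
    trial_type: TrialType


def normalize_times(raw_times_ms: List[int], trial: Trial) -> List[int]:
    """Re-express raw timestamps so that 0 is the target-label onset."""
    pod = trial.trial_type.point_of_disambiguation
    return [t - pod for t in raw_times_ms]


# One trial type shared by two participants, seen at different points in the session.
dog_vs_book = TrialType(1, "dog", "book", point_of_disambiguation=3000)
first = Trial(trial_id=10, subject_id="s01", trial_order=4, trial_type=dog_vs_book)
second = Trial(trial_id=11, subject_id="s02", trial_order=7, trial_type=dog_vs_book)
print(normalize_times([2975, 3000, 3025], first))  # [-25, 0, 25]
```

Storing the point of disambiguation on the shared trial type rather than on each trial reflects the assumption that label onset is a property of the stimulus, not of a particular presentation.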


We would like to thank the labs and researchers who have made their data publicly available in the database. For further information about contributions, see …

Work on this project (VAM) was supported in part by grants from the National Institutes of Health (Fernald: R01 HD092343, Feldman: 2R01 HD069150).

Correspondence to Martin Zettersten.

CRediT author statement

Apart from the first and last authors, authorship order was determined by sorting authors’ last names in reverse alphabetical order. An overview of authorship contributions following the CRediT taxonomy can be viewed here: …

Open Practices Statement

All code for reproducing the paper is available online (…). Raw and standardized datasets are available on the Peekbank OSF repository (…) and can be accessed using the peekbankr R package (…).


Zettersten, M., Yurovsky, D., Xu, T.L. et al. Peekbank: An open, large-scale repository for developmental eye-tracking data of children’s word recognition. Behav Res (2022).

  • Word recognition
  • Eye-tracking
  • Vocabulary development
  • Looking-while-listening
  • Visual world paradigm
  • Lexical processing