Behavior Research Methods, Volume 50, Issue 2, pp. 826–833

The Provo Corpus: A large eye-tracking corpus with predictability norms



This article presents the Provo Corpus, a corpus of eye-tracking data with accompanying predictability norms. The predictability norms for the Provo Corpus differ from those of other corpora. In addition to traditional cloze scores that estimate the predictability of the full orthographic form of each word, the Provo Corpus also includes measures of the predictability of the morpho-syntactic and semantic information for each word. This makes the Provo Corpus ideal for studying predictive processes in reading. Some analyses using these data have previously been reported elsewhere (Luke & Christianson, 2016). The Provo Corpus is available for download on the Open Science Framework, at
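The abstract distinguishes three kinds of predictability norm. As a rough illustration only, the Python sketch below computes each one from a list of cloze responses collected at a single word position:

- the classic cloze probability, which is the proportion of responses that exactly match the target word (Taylor, 1953);
- the proportion of responses that share the target's word class;
- a mean cosine similarity between response and target vectors, in the spirit of latent semantic analysis (Landauer & Dumais, 1997).

Several details are assumptions made for illustration: the case and whitespace normalization, the hypothetical `tags` and `vectors` lookups, and the handling of unknown words. None of this is the documented scoring procedure for the Provo Corpus.

```python
import math


def _normalize(word: str) -> str:
    # Assumed normalization: trim surrounding whitespace and ignore case.
    return word.strip().lower()


def cloze_probability(responses: list[str], target: str) -> float:
    """Proportion of responses that exactly match the target's orthographic form."""
    if not responses:
        raise ValueError("need at least one response")
    goal = _normalize(target)
    hits = sum(1 for r in responses if _normalize(r) == goal)
    return hits / len(responses)


def class_match_probability(
    responses: list[str], target: str, tags: dict[str, str]
) -> float:
    """Proportion of responses sharing the target's word-class tag.

    `tags` is a hypothetical word -> part-of-speech lookup (for example, one
    built with a tagger such as CLAWS). Responses missing from `tags` count
    as non-matches.
    """
    if not responses:
        raise ValueError("need at least one response")
    goal = tags[_normalize(target)]
    hits = sum(1 for r in responses if tags.get(_normalize(r)) == goal)
    return hits / len(responses)


def _cosine(a: list[float], b: list[float]) -> float:
    # Cosine of the angle between two vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_similarity(
    responses: list[str], target: str, vectors: dict[str, list[float]]
) -> float:
    """Mean cosine similarity between the target's vector and each response's.

    `vectors` is a hypothetical word -> vector lookup (e.g. LSA-style).
    Responses with no vector are skipped rather than scored as zero.
    """
    goal = vectors[_normalize(target)]
    sims = [
        _cosine(goal, vectors[_normalize(r)])
        for r in responses
        if _normalize(r) in vectors
    ]
    if not sims:
        raise ValueError("no response has a vector")
    return sum(sims) / len(sims)
```

As a worked example, take the responses `["dog", "cat", "Dog", "ran", "dog"]` for the target `"dog"`. Three of the five responses match exactly, so `cloze_probability` returns 0.6. If `"dog"` and `"cat"` share a noun tag, four of the five responses match the target's word class, so `class_match_probability` returns 0.8. Keeping the three measures separate means a word can have low orthographic predictability while its word class or meaning is still highly predictable.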


Keywords: Corpus study, Eyetracking, Reading, Predictability


References

  1. Altmann, G. T. M., & Kamide, Y. (1999). Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition, 73, 247–264. doi: 10.1016/S0010-0277(99)00059-1
  2. Altmann, G. T. M., & Kamide, Y. (2007). The real-time mediation of visual attention by language and world knowledge: Linking anticipatory (and other) eye movements to linguistic processing. Journal of Memory and Language, 57, 502–518.
  3. Ashby, J., Rayner, K., & Clifton, C. (2005). Eye movements of highly skilled and average readers: Differential effects of frequency and predictability. Quarterly Journal of Experimental Psychology, 58A, 1065–1086. doi: 10.1080/02724980443000476
  4. Balota, D. A., Pollatsek, A., & Rayner, K. (1985). The interaction of contextual constraints and parafoveal visual information in reading. Cognitive Psychology, 17, 364–390. doi: 10.1016/0010-0285(85)90013-1
  5. Bloom, P. A., & Fischler, I. (1980). Completion norms for 329 sentence contexts. Memory & Cognition, 8, 631–642.
  6. Christiansen, M. H., & Chater, N. (2016). The now-or-never bottleneck: A fundamental constraint on language. Behavioral and Brain Sciences. doi: 10.1017/S0140525X1500031X
  7. Cop, U., Dirix, N., Drieghe, D., & Duyck, W. (2017). Presenting GECO: An eyetracking corpus of monolingual and bilingual sentence reading. Behavior Research Methods, 49, 602–615. doi: 10.3758/s13428-016-0734-0
  8. Dell, G. S., & Chang, F. (2014). The P-chain: Relating sentence production and its disorders to comprehension and acquisition. Philosophical Transactions of the Royal Society B, 369, 20120394. doi: 10.1098/rstb.2012.0394
  9. DeLong, K. A., Troyer, M., & Kutas, M. (2014). Pre‐processing in sentence comprehension: Sensitivity to likely upcoming meaning and structure. Language and Linguistics Compass, 8, 631–645.
  10. Ehrlich, S. F., & Rayner, K. (1981). Contextual effects on word perception and eye movements during reading. Journal of Verbal Learning and Verbal Behavior, 20, 641–655. doi: 10.1016/S0022-5371(81)90220-6
  11. Engbert, R., Nuthmann, A., Richter, E. M., & Kliegl, R. (2005). SWIFT: A dynamical model of saccade generation during reading. Psychological Review, 112, 777–813. doi: 10.1037/0033-295X.112.4.777
  12. Garside, R., & Smith, N. (1997). A hybrid grammatical tagger: CLAWS4. In R. Garside, G. N. Leech, & T. McEnery (Eds.), Corpus annotation: Linguistic information from computer text corpora (pp. 102–121). London, UK: Longman.
  13. Hamberger, M. J., Friedman, D., & Rosen, J. (1996). Completion norms collected from younger and older adults for 198 sentence contexts. Behavior Research Methods, Instruments, & Computers, 28, 102–108.
  14. Huettig, F. (2015). Four central questions about prediction in language processing. Brain Research, 1626, 118–135.
  15. Huettig, F., & Mani, N. (2016). Is prediction necessary to understand language? Probably not. Language, Cognition and Neuroscience, 31, 19–31.
  16. Kamide, Y., Altmann, G. T. M., & Haywood, S. L. (2003). The time-course of prediction in incremental sentence processing: Evidence from anticipatory eye movements. Journal of Memory and Language, 49, 133–156. doi: 10.1016/S0749-596X(03)00023-8
  17. Kennedy, A., Hill, R., & Pynte, J. (2003). The Dundee Corpus. Paper presented at the 12th European Conference on Eye Movement, Dundee, Scotland.
  18. Kennedy, A., Pynte, J., Murray, W. S., & Paul, S.-A. (2013). Frequency and predictability effects in the Dundee Corpus: An eye movement analysis. Quarterly Journal of Experimental Psychology, 66, 601–618.
  19. Kliegl, R., & Engbert, R. (2005). Fixation durations before word skipping in reading. Psychonomic Bulletin & Review, 12, 132–138.
  20. Kliegl, R., Grabner, E., Rolfs, M., & Engbert, R. (2004). Length, frequency, and predictability effects of words on eye movements in reading. European Journal of Cognitive Psychology, 16, 262–284. doi: 10.1080/09541440340000213
  21. Kliegl, R., Nuthmann, A., & Engbert, R. (2006). Tracking the mind during reading: The influence of past, present, and future words on fixation durations. Journal of Experimental Psychology: General, 135, 12–35. doi: 10.1037/0096-3445.135.1.12
  22. Kuperberg, G. R., & Jaeger, T. F. (2016). What do we mean by prediction in language comprehension? Language, Cognition and Neuroscience, 31, 32–59.
  23. Kutas, M., DeLong, K. A., & Smith, N. J. (2011). A look around at what lies ahead: Prediction and predictability in language processing. In M. Bar (Ed.), Predictions in the brain: Using our past to generate a future (pp. 190–207). New York: Oxford University Press.
  24. Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104, 211–240. doi: 10.1037/0033-295X.104.2.211
  25. Luke, S. G., & Christianson, K. (2016). Limits on lexical prediction during reading. Cognitive Psychology, 88, 22–60.
  26. McDonald, S. A., & Tamariz, M. (2002). Completion norms for 112 Spanish sentences. Behavior Research Methods, Instruments, & Computers, 34, 128–137.
  27. Nuthmann, A., Engbert, R., & Kliegl, R. (2007). The IOVP effect in mindless reading: Experiment and modeling. Vision Research, 47, 990–1002. doi: 10.1016/j.visres.2006.11.005
  28. Payne, B. R., Lee, C. L., & Federmeier, K. D. (2015). Revisiting the incremental effects of context on word processing: Evidence from single‐word event‐related brain potentials. Psychophysiology, 52, 1456–1469.
  29. Pickering, M. J., & Garrod, S. (2007). Do people use language production to make predictions during comprehension? Trends in Cognitive Sciences, 11, 105–110. doi: 10.1016/j.tics.2006.12.002
  30. Pickering, M. J., & Garrod, S. (2013). An integrated theory of language production and comprehension. Behavioral and Brain Sciences, 36, 329–347. doi: 10.1017/S0140525X12001495
  31. Pynte, J., New, B., & Kennedy, A. (2009). On-line contextual influences during reading normal text: The role of nouns, verbs and adjectives. Vision Research, 49, 544–552.
  32. Rayner, K., Slattery, T. J., Drieghe, D., & Liversedge, S. P. (2011). Eye movements and word skipping during reading: Effects of word length and predictability. Journal of Experimental Psychology: Human Perception and Performance, 37, 514–528. doi: 10.1037/a0020990
  33. Rayner, K., & Well, A. D. (1996). Effects of contextual constraint on eye movements in reading: A further examination. Psychonomic Bulletin & Review, 3, 504–509. doi: 10.3758/BF03214555
  34. Schwanenflugel, P. J. (1986). Completion norms for final words of sentences using a multiple production measure. Behavior Research Methods, Instruments, & Computers, 18, 363–371. doi: 10.3758/BF03204419
  35. Smith, N. J., & Levy, R. (2013). The effect of word predictability on reading time is logarithmic. Cognition, 128, 302–319. doi: 10.1016/j.cognition.2013.02.013
  36. Staub, A. (2015). The effect of lexical predictability on eye movements in reading: Critical review and theoretical interpretation. Language and Linguistics Compass, 9, 311–327.
  37. Staub, A., Abbott, M., & Bogartz, R. S. (2012). Linguistically guided anticipatory eye movements in scene viewing. Visual Cognition, 20, 922–946.
  38. Staub, A., Grant, M., Astheimer, L., & Cohen, A. (2015). The influence of cloze probability and item constraint on cloze task response time. Journal of Memory and Language, 82, 1–17.
  39. Taylor, W. L. (1953). “Cloze procedure”: A new tool for measuring readability. Journalism Quarterly, 30, 415–433.
  40. Van Petten, C., & Luka, B. J. (2012). Prediction during language comprehension: Benefits, costs, and ERP components. International Journal of Psychophysiology, 83, 176–190.

Copyright information

© Psychonomic Society, Inc. 2017

Authors and Affiliations

  1. Department of Psychology and Neuroscience Center, Brigham Young University, Provo, USA
  2. University of Illinois at Urbana-Champaign, Urbana, USA
  3. The Beckman Institute for Advanced Science and Technology, Urbana, USA
