The Provo Corpus: A large eye-tracking corpus with predictability norms
This article presents the Provo Corpus, a corpus of eye-tracking data with accompanying predictability norms. The predictability norms for the Provo Corpus differ from those of other corpora. In addition to traditional cloze scores that estimate the predictability of the full orthographic form of each word, the Provo Corpus also includes measures of the predictability of the morpho-syntactic and semantic information for each word. This makes the Provo Corpus ideal for studying predictive processes in reading. Some analyses using these data have previously been reported elsewhere (Luke & Christianson, 2016). The Provo Corpus is available for download on the Open Science Framework, at https://osf.io/sjefs.
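As a rough illustration of what a traditional cloze score measures, the sketch below computes the proportion of participant responses that match the target word's full orthographic form. The function name `cloze_scores`, the normalization (trimming and lowercasing), and the sample responses are all assumptions made for illustration. They are not the scoring procedure or data of the Provo Corpus.

```python
from collections import Counter


def cloze_scores(responses, target):
    """Return (cloze probability of `target`, counts of normalized responses).

    Hypothetical helper: responses are trimmed and lowercased before
    matching, and blank responses are dropped. This normalization is an
    assumption, not a documented step of the corpus norming procedure.
    """
    normalized = [r.strip().lower() for r in responses if r.strip()]
    counts = Counter(normalized)
    total = len(normalized)
    # Cloze probability: share of responses equal to the target word.
    score = counts[target.strip().lower()] / total if total else 0.0
    return score, counts


# Invented example: five participants guess the next word in a sentence.
responses = ["dog", "cat", "dog", "Dog", "bird"]
score, counts = cloze_scores(responses, "dog")
print(score)  # 0.6
```

Under the same assumptions, a morpho-syntactic or semantic predictability measure could be sketched by matching on a property of each response (for example, its part-of-speech tag) rather than on its exact spelling.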
Keywords: Corpus study, Eyetracking, Reading, Predictability
- Garside, R., & Smith, N. (1997). A hybrid grammatical tagger: CLAWS4. In R. Garside, G. N. Leech, & T. McEnery (Eds.), Corpus annotation: Linguistic information from computer text corpora (pp. 102–121). London, UK: Longman.
- Kennedy, A., Hill, R., & Pynte, J. (2003). The Dundee Corpus. Paper presented at the 12th European Conference on Eye Movement, Dundee, Scotland.
- Taylor, W. L. (1953). “Cloze procedure”: A new tool for measuring readability. Journalism Quarterly, 30, 415–433.