The Beijing Sentence Corpus: A Chinese sentence corpus with eye movement data and predictability norms

Abstract

This report introduces the Beijing Sentence Corpus (BSC), a Chinese sentence corpus of eye-tracking data with relatively clear word boundaries. In addition, we report predictability norms for each word in the corpus. Eye movement corpora are available for languages with alphabetic scripts, such as English, German, and French, but no such corpus is publicly available for Chinese. Establishing one is therefore necessary for studying predictive processes during Chinese reading. Moreover, given the clear word boundaries in its sentences, the BSC is especially useful for providing evidence relevant to the theoretical debate on saccade target selection in Chinese. Using the large-scale predictability norms, we conducted new analyses based on 60 BSC readers, testing the influences of launch word and target word properties while controlling for visual and oculomotor constraints as well as sentence- and subject-level individual differences. We discuss implications for the guidance of eye movements in Chinese reading.
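
Predictability norms of this kind are commonly derived with a cloze procedure (Taylor, 1953), in which predictability is the proportion of participants who correctly guess the upcoming word. The following is a minimal sketch under that assumption; the function name and the example guesses are hypothetical and are not taken from the BSC materials.

```python
from collections import Counter


def cloze_predictability(responses, target):
    """Return the proportion of cloze responses that match the target word."""
    if not responses:
        raise ValueError("need at least one response")
    counts = Counter(responses)
    # Counter returns 0 for words nobody produced, so unseen targets score 0.0
    return counts[target] / len(responses)


# Hypothetical guesses from ten participants for a single word position
guesses = ["北京", "北京", "上海", "北京", "学校",
           "北京", "北京", "上海", "北京", "北京"]
p = cloze_predictability(guesses, "北京")
print(p)
```

Counting exact string matches is the simplest scoring rule; a real norming study may apply additional criteria for what counts as a correct guess.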

Data availability

The BSC is publicly available from the Open Science Framework at https://osf.io/vr3k8/, where two files can be downloaded. The file BSC.Word.Info.xlsx is a spreadsheet that provides the linguistic information described in Table 3, most critically the predictability norms. The other file, BSC.EMD.zip, contains the eye-tracking data from 60 native readers of Chinese.
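
Because the internal layout of BSC.EMD.zip is not described here, a reasonable first step after downloading it manually is to list its contents. The sketch below uses only the Python standard library; the helper name and the local file path are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def list_archive_members(zip_path):
    """Return the sorted names of all files (not directories) in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            info.filename for info in archive.infolist() if not info.is_dir()
        )


if __name__ == "__main__":
    # Assumed location of the manually downloaded archive
    local_copy = Path("BSC.EMD.zip")
    if local_copy.exists():
        for name in list_archive_members(local_copy):
            print(name)
```

Inspecting the archive before extracting it avoids guessing at file names or formats that the data availability statement does not specify.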

Notes

  1. This information was acquired via personal communication with the corresponding author.

References

  • Bates, D., Kliegl, R., Vasishth, S., & Baayen, R. H. (2015a). Parsimonious mixed models. arXiv:1506.04967 [stat.ME].

  • Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015b). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67, 1–48. https://doi.org/10.18637/jss.v067.i01

  • Cai, Q., & Brysbaert, M. (2010). SUBTLEX-CH: Chinese word and character frequencies based on film subtitles. PLoS One, 5, e10729. https://doi.org/10.1371/journal.pone.0010729

  • Chen, L.K., & Carr, H.A. (1926). The ability of Chinese students to read in vertical and horizontal directions. Journal of Experimental Psychology, 9, 110–117.

  • Cohen, J., & Cohen, P. (1975). Applied multiple regression/correlation analysis for the behavioral sciences. Erlbaum.

  • Cop, U., Dirix, N., Drieghe, D., & Duyck, W. (2017). Presenting GECO: An eye tracking corpus of monolingual and bilingual sentence reading. Behavior Research Methods, 49, 602–615. https://doi.org/10.3758/s13428-016-0734-0

  • Dann, K. M., Veldre, A., & Andrews, S. (2021). Morphological preview effects in English are restricted to suffixed words. Journal of Experimental Psychology: Learning, Memory, and Cognition. https://doi.org/10.1037/xlm0001029

  • Ehrlich, S.F., & Rayner, K. (1981). Contextual effects on word perception and eye movements during reading. Journal of Verbal Learning and Verbal Behavior, 20, 641–655. https://doi.org/10.1016/S0022-5371(81)90220-6

  • Engbert, R. & Kliegl, R. (2001). Mathematical models of eye movements in reading: A possible role for autonomous saccades. Biological Cybernetics, 85, 77-87.

  • Engbert, R., & Kliegl, R. (2003). Microsaccades uncover the orientation of covert attention. Vision Research, 43, 1035-1045.

  • Engbert, R., Longtin, A., & Kliegl, R. (2002). A dynamical model of saccade generation in reading based on spatially distributed lexical processing. Vision Research, 42, 621–636.

  • Engbert, R., Nuthmann, A., Richter, E., & Kliegl, R. (2005). SWIFT: A dynamical model of saccade generation during reading. Psychological Review, 112, 777-813.

  • Eriksen, C. W., & St. James, J. D. (1986). Visual attention within and around the field of focal attention: A zoom lens model. Perception & Psychophysics, 40, 225–240. https://doi.org/10.3758/BF03211502

  • Findlay, J.M., & Gilchrist, I.D. (2003). Active vision: the psychology of looking and seeing. Oxford University Press.

  • Henderson, J. M., & Ferreira, F. (1990). Effects of foveal processing difficulty on the perceptual span in reading: Implications for attention and eye movement control. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 417–429.

  • Hohenstein, S., Matuschek, H., & Kliegl, R. (2017). Linked linear mixed models: A joint analysis of fixation locations and fixation durations in natural reading. Psychonomic Bulletin & Review, 24, 637–651.

  • Hoosain, R. (1991). Psycholinguistic implications for linguistic relativity: A case study of Chinese. LEA.

  • Hoosain, R. (1992). The psychological reality of the word in Chinese. Advances in Psychology, 90, 111–130. https://doi.org/10.1016/S0166-4115(08)61889-0

  • Husain, S., Vasishth, S., & Srinivasan, N. (2014). Integration and prediction difficulty in Hindi sentence comprehension: Evidence from an eye-tracking corpus. Journal of Eye Movement Research, 8. https://doi.org/10.16910/jemr.8.2.3

  • Hyönä, J., Heikkilä, T. T., Vainio, S., & Kliegl, R. (2021). Parafoveal access to word stem during reading: An eye movement study. Cognition, 208, 104547. https://doi.org/10.1016/j.cognition.2020.104547

  • Inhoff, A.W., & Liu, W. (1998). The perceptual span and oculomotor activity during the reading of Chinese sentences. Journal of Experimental Psychology: Human Perception and Performance, 24, 20-34. https://doi.org/10.1037/0096-1523.24.1.20

  • Inhoff, A. W., & Rayner, K. (1986). Parafoveal word processing during eye fixations in reading: Effects of word frequency. Perception & Psychophysics, 40, 431–439.

  • Inhoff, A.W., & Wu, C. (2005). Eye movements and the identification of spatially ambiguous words during Chinese sentence reading. Memory & Cognition, 33, 1345-1356.

  • Inhoff, A. W., Radach, R., Starr, M., & Greenberg, S. (2000). Allocation of visuo-spatial attention and saccade programming during reading. In A. Kennedy, R. Radach, D. Heller, & J. Pynte (Eds.), Reading as a perceptual process (pp. 221-246). Elsevier.

  • Institute of Linguistic Studies. (1986). Modern Chinese word frequency dictionary. Beijing Language Institute Publisher. (in Chinese).

  • Just, M.A., & Carpenter, P.A. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87, 329-354.

  • Kennedy, A., & Pynte, J. (2005). Parafoveal-on-foveal effects in normal reading. Vision Research, 45, 153–168. https://doi.org/10.1016/j.visres.2004.07.037

  • Kennedy, A., Hill, R., & Pynte, J. (2003). The Dundee Corpus. Paper presented at the 12th European Conference on Eye Movement, Dundee, Scotland.

  • Kennedy, A., Pynte, J., Murray, W.S., & Paul, S.A. (2013). Frequency and predictability effects in the Dundee Corpus: An eye movement analysis. Quarterly Journal of Experimental Psychology, 66, 601–618.

  • Kliegl, R. (2007). Towards a perceptual-span theory of distributed processing in reading: A reply to Rayner, Pollatsek, Drieghe, Slattery, & Reichle (2007). Journal of Experimental Psychology: General, 136, 530–537.

  • Kliegl, R. (2014, May). Towards a joint analysis of fixation location and duration during reading of Chinese sentences. Paper presented at the 6th China International Conference on Eye Movements, Beijing, China.

  • Kliegl, R., Grabner, E., Rolfs, M., & Engbert, R. (2004). Length, frequency, and predictability effects of words on eye movements in reading. European Journal of Cognitive Psychology, 16, 262-284.

  • Kliegl, R., Nuthmann, A., & Engbert, R. (2006). Tracking the mind during reading: The influence of past, present, and future words on fixation durations. Journal of Experimental Psychology: General, 135, 13–35.

  • Kumle, L., Võ, M.LH., & Draschkow, D. (2021). Estimating power in (generalized) linear mixed models: An open introduction and tutorial in R. Behavior Research Methods. https://doi.org/10.3758/s13428-021-01546-0

  • Laurinavichyute, A.K., Sekerina, I.A., Alexeeva, S., Bagdasaryan, K., & Kliegl, R. (2019). Russian Sentence Corpus: Benchmark measures of eye movements in reading in Russian. Behavior Research Methods, 51, 1161–1178. https://doi.org/10.3758/s13428-018-1051-6

  • Lavigne, F., Vitu, F., & d’Ydewalle, G. (2000). The influence of semantic context on initial eye landing sites in words. Acta Psychologica, 104, 191–214.

  • Li, X., Liu, P., & Rayner, K. (2011). Eye movement guidance in Chinese reading: Is there a preferred viewing location? Vision Research, 51, 1146–1156.

  • Li, X., Bicknell, K., Liu, P., Wei, W., & Rayner, K. (2014). Reading is fundamentally similar across disparate writing systems: a systematic characterization of how words and characters influence eye movements in Chinese reading. Journal of Experimental Psychology: General, 143, 895-913. https://doi.org/10.1037/a0033580

  • Liu, Y., Reichle, E.D., & Li, X. (2015). Parafoveal processing affects outgoing saccade length during the reading of Chinese. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41, 1229–1236.

  • Luke, S.G., & Christianson, K. (2016). Limits on lexical prediction during reading. Cognitive Psychology, 88, 22–60.

  • Luke, S.G., & Christianson, K. (2018). The Provo Corpus: A large eye-tracking corpus with predictability norms. Behavior Research Methods, 50, 826–833. https://doi.org/10.3758/s13428-017-0908-4

  • Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315. https://doi.org/10.1016/j.jml.2017.01.001

  • McConkie, G.W., & Rayner, K. (1975). The span of the effective stimulus during a fixation in reading. Perception & Psychophysics, 17, 578-586. https://doi.org/10.3758/BF03203972

  • McConkie, G. W., Kerr, P. W., Reddix, M. D., & Zola, D. (1988). Eye movement control during reading: I. The location of initial eye fixations on words. Vision Research, 28, 1107–1118.

  • McConkie, G.W., Kerr, P.W., Reddix, M.D., Zola, D., & Jacobs, A.M. (1989). Eye movement control during reading: II. Frequency of refixating a word. Perception & Psychophysics, 46, 245–253.

  • McDonald, S.A., Carpenter, R.H.S., & Shillcock, R.C. (2005). An anatomically-constrained, stochastic model of eye movement control in reading. Psychological Review, 112, 814-840.

  • O’Regan, J.K., & Lévy-Schoen, A. (1987). Eye-movement strategy and tactics in word recognition and reading. In: M. Coltheart (Ed.), Attention and performance. The psychology of reading (Vol. 12, pp. 363–383). Erlbaum.

  • O’Regan, J.K. (1979). Saccade size control during reading: Evidence for the linguistic control hypothesis. Perception & Psychophysics, 25, 501–509.

  • O'Regan, J.K. (1980). The control of saccade size and fixation duration during reading: The limits of linguistic control. Perception & Psychophysics, 28, 112-117.

  • Özkan, A., Fikri, F., Kırkıcı, B., Kliegl, R., & Acartürk, C. (2021). Eye movement control in Turkish sentence reading. Quarterly Journal of Experimental Psychology, 74, 377-397. https://doi.org/10.1177/1747021820963310

  • Pan, J., Yan, M., & Laubrock, J. (2017). Perceptual span in oral reading: The case of Chinese. Scientific Studies of Reading, 21, 254-263. https://doi.org/10.1080/10888438.2017.1283694

  • Peng, D.L., Orchard, L.N., & Stern, J.A. (1983). Evaluation of eye movement variables of Chinese and American readers. The Pavlovian Journal of Biological Science, 18, 94–102. https://doi.org/10.1007/BF03001861

  • Rayner, K. (1979). Eye guidance in reading: Fixation locations within words. Perception, 8, 21–30.

  • Rayner, K. (2009). Eye movements and attention in reading, scene perception, and visual search. Quarterly Journal of Experimental Psychology, 62, 1457-1506. https://doi.org/10.1080/17470210902816461

  • Rayner, K., & Well, A.D. (1996). Effects of contextual constraint on eye movements in reading: A further examination. Psychonomic Bulletin & Review, 3, 504–509. https://doi.org/10.3758/BF03214555

  • Rayner, K., Binder, K.S., Ashby, J., & Pollatsek, A. (2001). Eye movement control in reading: Word predictability has little influence on initial landing positions in words. Vision Research, 41, 943–954.

  • Rayner, K., Reichle, E.D., Stroud, M.J., Williams, C.C., & Pollatsek, A. (2006). The effect of word frequency, word predictability, and font difficulty on the eye movements of young and older readers. Psychology and Aging, 21, 448–465. https://doi.org/10.1037/0882-7974.21.3.448

  • Rayner, K., Li, X., & Pollatsek, A. (2007a). Extending the E-Z Reader model of eye movement control to Chinese readers. Cognitive Science, 31, 1021–1033. https://doi.org/10.1080/03640210701703824

  • Rayner, K., Pollatsek, A., Drieghe, D., Slattery, T.J., & Reichle, E.D. (2007b). Tracking the mind during reading via eye movements: Comments on Kliegl, Nuthmann, and Engbert (2006). Journal of Experimental Psychology: General, 136, 520–529.

  • Reichle, E.D., Pollatsek, A., Fisher, D.L., & Rayner, K. (1998). Toward a model of eye movement control in reading. Psychological Review, 105, 125–157.

  • Reichle, E.D., Rayner, K., & Pollatsek, A. (1999). Eye movement control in reading: Accounting for initial fixation locations and refixations within the E-Z Reader model. Vision Research, 39, 4403–4411. https://doi.org/10.1016/S0042-6989(99)00152-2

  • Richter, E., Yan, M., Engbert, R., & Kliegl, R. (2010, May). Modeling Chinese reading with SWIFT: How does word segmentation affect targeting? Paper presented at the 4th China International Conference on Eye Movements, Tianjin, China. Summary available at https://doi.org/10.17605/OSF.IO/VXFYM

  • Risse, S., Hohenstein, S., Kliegl, R., & Engbert, R. (2014). A theoretical analysis of the perceptual span based on SWIFT simulations of the n+2 boundary paradigm. Visual Cognition, 22, 283-308.

  • Schad, D.J. & Engbert, R. (2012). The zoom lens of attention: Simulating shuffled versus normal text reading using the SWIFT model. Visual Cognition, 20, 391-421. https://doi.org/10.1080/13506285.2012.670143

  • Schilling, H.E.H., Rayner, K., & Chumbley, J.I. (1998). Comparing naming, lexical decision, and eye fixation times: Word frequency effects and individual differences. Memory & Cognition, 26, 1270–1281.

  • Shen, E. (1927). An analysis of eye movements in the reading of Chinese. Journal of Experimental Psychology, 10, 158–183. https://doi.org/10.1037/h0075609

  • Sun, F., Morita, M., & Stark, L.W. (1985). Comparative patterns of reading eye movement in Chinese and English. Perception & Psychophysics, 37, 502–506. https://doi.org/10.3758/BF03204913

  • Taylor, W.L. (1953). Cloze procedure: A new tool for measuring readability. Journalism Quarterly, 30, 415–433.

  • Tsai, J.L., & McConkie, G.W. (2003). Where do Chinese readers send their eyes? In J. Hyönä, R. Radach & H. Deubel (Eds.), The mind’s eye: Cognitive and applied aspects of eye movement research (pp. 159-176). Elsevier.

  • Tsai, J., Lee, C., Tzeng, O.J.L., Hung, D.L., & Yen, N. (2004). Use of phonological codes for Chinese characters: Evidence from processing of parafoveal preview when reading sentences. Brain and Language, 91, 235–244.

  • Tsang, Y.-K., & Chen, H.-C. (2012). Eye movement control in reading: Logographic Chinese versus alphabetic scripts. PsyCh Journal, 1, 128-142.

  • Vainio, S., Hyönä, J., & Pajunen, A. (2009). Lexical predictability exerts robust effects on fixation duration, but not on initial landing position during reading. Experimental Psychology, 56, 66–74. https://doi.org/10.1027/1618-3169.56.1.66

  • Yan, M., & Kliegl, R. (2016). CarPrice versus CarpRice: Word boundary ambiguity influences saccade target selection during the reading of Chinese sentences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42, 1832–1838. https://doi.org/10.1037/xlm0000276

  • Yan, M., Kliegl, R., Richter, E.M., Nuthmann, A., & Shu, H. (2010). Flexible saccade-target selection in Chinese reading. Quarterly Journal of Experimental Psychology, 63, 705–725. https://doi.org/10.1080/17470210903114858

  • Yan, M., Zhou, W., Shu, H., Yusupu, R., Miao, D., Krugel, A., & Kliegl, R. (2014). Eye movements guided by morphological structure: Evidence from the Uighur language. Cognition, 132, 181-215. https://doi.org/10.1016/j.cognition.2014.03.008

  • Yan, M., Zhou, W., Shu, H., & Kliegl, R. (2015). Perceptual span depends on font size during the reading of Chinese sentences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41, 209–219. https://doi.org/10.1037/a0038097

  • Yan, M., Pan, J., Chang, W., & Kliegl, R. (2019a). Read sideways or not: Vertical saccade advantage in sentence reading. Reading and Writing, 32, 1911-1926. https://doi.org/10.1007/s11145-018-9930-x

  • Yan, M., Pan, J., & Kliegl, R. (2019b). Eye movements control in Chinese reading: A cross-sectional study. Developmental Psychology, 55, 2275–2285. https://doi.org/10.1037/dev0000819

  • Yang, H.-M., & McConkie, G.W. (1999). Reading Chinese: Some basic eye-movement characteristics. In J. Wang, A.W. Inhoff, & H.-C. Chen (Eds.), Reading Chinese script: A cognitive analysis (pp. 207–222). Lawrence Erlbaum.

  • Yang, S.-N., & McConkie, G.W. (2004). Saccade generation during reading: Are words necessary? European Journal of Cognitive Psychology, 16, 226-261.

  • Yang, J., Wang, S., Xu, Y., & Rayner, K. (2009). Do Chinese readers obtain preview benefit from word n + 2? Evidence from eye movements. Journal of Experimental Psychology: Human Perception and Performance, 35, 1192–1204.

  • Yen, M.-H., Radach, R., Tzeng, O., Hung, D., & Tsai, J.-L. (2009). Early parafoveal processing in reading Chinese sentences. Acta Psychologica, 131, 24–33. https://doi.org/10.1016/j.actpsy.2009.02.005

  • Yen, M.-H., Radach, R., Tzeng, O., & Tsai, J.-L. (2012). Usage of statistical cues for word boundary in reading Chinese sentences. Reading and Writing, 25, 1007–1025.

  • Zhou, W., Wang, A., Shu, H., Kliegl, R., & Yan, M. (2018). Word segmentation by alternating colors facilitates eye guidance in Chinese reading. Memory & Cognition, 46, 729–740. https://doi.org/10.3758/s13421-018-0797-5

Author Note

E21 Humanities and Social Sciences Building, Avenida da Universidade, Taipa, Macau. The authors thank Yingyi Luo for her assistance in checking materials. This research was supported by a Multi-Year Research Grant from the University of Macau (MYRG2020-00120-FSS). Early empirical results were initially presented at the 6th China International Conference on Eye Movements (Kliegl, 2014).

Author information

Corresponding author

Correspondence to Ming Yan.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Pan, J., Yan, M., Richter, E.M. et al. The Beijing Sentence Corpus: A Chinese sentence corpus with eye movement data and predictability norms. Behav Res (2021). https://doi.org/10.3758/s13428-021-01730-2

  • DOI: https://doi.org/10.3758/s13428-021-01730-2

Keywords

  • Corpus analysis
  • Eye tracking
  • Chinese reading
  • Predictability