Introduction

The study of reading requires the investigation of perceptual and cognitive processes at multiple levels, including oculomotor control, word identification, sentential level processes, and discourse comprehension. Over the past two decades, eye movement control models and analysis methods have been developed to investigate the relationship between word recognition dynamics and eye movements during text reading, aiming to explain directly unobservable reading processes such as word recognition in terms of observable phenomena, mainly eye movements. Numerous features of words are used as variables, the most popular being word frequency, word length, and the predictability of words in sentential contexts. Depending on the stimulus design, these features (e.g., word frequency and length) are calculated using general purpose corpora and massive data collection sessions (e.g., sentential predictability). Dependent variables include numerous oculomotor parameters, such as single fixation duration on target words, pre-target words, post-target words, fixation counts, and regressions within or across word boundaries, e.g., Inhoff et al. (2011); Kliegl et al. (2006); Kliegl (2007); Laubrock & Kliegl (2015); Pollatsek et al. (2000); Hyönä & Pollatsek (1998); Järvilehto et al. (2009); Yan et al. (2014); Deutsch & Rayner (1999); Paterson et al. (2015); Özkan et al. (2021)); see Rayner (1998, 2009a, 2009b); Rayner et al. (2012) for reviews.

As another crucial aspect in reading research, a diverse set of experimental paradigms have been used to study the role of sound coding in skilled reading, including masked phonological priming, articulatory suppression, auditory input distraction, electromyography recordings of articulatory muscles, and the gaze-contingent boundary paradigm, which often accompany tasks such as naming, lexical decision, semantic categorization, and sentence or text reading. The findings obtained provide supporting evidence for the presence of sound coding in the reading process. There is an ongoing debate on the possible impact of sound encoding on eye movements in reading. Such an impact may be realized as early involvement of phonological processes in lexical access, or the retention of words can be realized as phonological representations during post-lexical integration (Slowiaczek & Clifton, 1980; Van Orden, 1987; Guerrero, 2005; Inhoff et al., 2004; Eiter & Inhoff, 2010; Ashby et al., 2006; Frost, 1998, 2005; Coltheart et al., 2001; Rastle and Brysbaert, 2006; Acartürk et al., 2017; Leinenger, 2014). Previous research, which is based mainly on empirical investigations, shows that direct auditory input registration (Inhoff et al., 2004) and its combination with articulatory suppression as a secondary task (Slowiaczek & Clifton, 1980; Eiter & Inhoff, 2010) influence the duration of subsequent fixation and comprehension of the text. However, studies focusing on Eye Voice Span (EVS) in reading aloud have indicated a dynamic modulation of EVS, keeping a uniform distance between the eyes and the voice, thus implying the retention of a manageable number of items in working memory; see Inhoff et al. (2011); Laubrock & Kliegl (2015); Leinenger (2014) for reviews of the role of sound coding in post-lexical processing in reading.

Eye movement datasets usually present the characteristics of words and a set of eye movement variables concerning the words in a text and, thereby, provide eye movement data for analyses. Recently, eye movement corpora for numerous languages have emerged (Schilling et al., 1998; Kennedy et al., 2003, 2013; Kliegl et al., 2004; Laurinavichyute et al., 2019; Luke & Christianson, 2018; Pan et al., 2021), some of which have been established on multiple dimensions, such as monolingual and bilingual reading (Cop et al., 2017), cross-linguistic multilingual reading (Siegelman et al., 2022), and reading development in children in both silent and oral modalities (Vorstius et al., 2014). The existing eye movement datasets vastly differ in their material selection methodologies. Some employ an experimental approach, in which a set of selected target words is manipulated (Kliegl et al., 2004; Laurinavichyute et al., 2019; Vorstius et al., 2014), while others employ a corpus-analytical approach, in which participants read a set of sentences without any manipulation of target words (Kennedy et al., 2003, 2013). Another aspect in which the existing eye movement datasets differ is the selection of the variables investigated. For example, some available datasets include predictability norms besides word frequency and length (Kennedy et al., 2003, 2013; Kliegl et al., 2004; Laurinavichyute et al., 2019; Luke & Christianson, 2018; Pan et al., 2021), while others only provide the eye movement data (Siegelman et al., 2022). Another aspect is that while most eye movement corpora cover silent reading, some oral reading data is also available with a specific focus on Eye Voice Span (Inhoff et al., 2011; Laubrock & Kliegl, 2015).

The present study presents TURead, an eye movement dataset of reading in Turkish, a largely understudied language in reading research. A small dataset of eye movements in Turkish was recently made available for silent reading, established using a corpus analytical approach (Özkan et al., 2021). TURead differs from this available dataset in several dimensions. TURead assumes an experimental approach in which target words were manipulated based on word length, frequency, and the number of suffixes. The main body of TURead includes lexical and prelexical characteristics of the target words, their predictability scores, and specific measures for oral reading. TURead aims to provide empirical data to facilitate further investigations of the reading characteristics of Turkish, an agglutinating language with rich morphology and shallow orthography. These characteristics of Turkish make it particularly suitable for studying early phonological processing, word frequency and length effects, and morphological complexity, which may be conceived as the primary components of the cognitive processes involved in reading.

Regarding the investigation of word identification processes in Turkish reading, in a study of sentential pseudoword reading in Turkish, several measures of letter frequency showed significant effects on measures of eye movement (Acartürk et al., 2017). Statistically significant effects on fixation duration were obtained for word center consonant collocation frequency and word boundary frequency, in addition to a significant interaction of vowel harmony collocation frequency (reflecting the vowel harmony rules that restrict vowel sequences in Turkish words) and word boundary collocation frequency. The observed effects were interpreted as instances of the impact of phonotactics on Turkish reading, since the stimuli consisted of pseudowords. Therefore, TURead was designed to include the prelexical characteristics of the target words to address the possible impact of phonotactics in Turkish.

In TURead, we included EVS measures, in addition to silent reading measures, to improve their potential to contribute to the study of the influence of phonological representations of words in reading. We also included the results from two memory tests (a Corsi Block test and a digit span test). A possible use of the results of the memory test is to investigate the relationship between the retention of items in the working memory and the reading process (for example, an analysis of the influence of the working memory scores on EVS as the number of words, Özkan 2020).

TURead includes an additional set of variables, mainly novel for reading research. One is suffix-level predictability values, and the other is familiarity ratings for target words. The former has the potential to be used in the analysis of morphological complexity, whereas the latter can be used to investigate early phonological processing within specific theoretical frameworks, such as the dual-route hypothesis, which assumes a direct lexical access route for words and an indirect access route through prelexical grapheme-to-phoneme rules for novel words (Coltheart et al., 2001). Finally, TURead can also be used as a new benchmark for future research on Turkish reading. In the following section, we present the methodology behind TURead.

Methodology

Participants

A total of 215 participants (M = 22.72, SD = 2.61 years old; 102 females) participated in the experiment for monetary compensation of approximately 5 US Dollars. Each participant signed an informed consent form and completed a demographic data form prior to the eye movement recording session. We excluded data from 15 participants, since (i) the native language of one participant was not Turkish, (ii) eight participants identified themselves as bilingual, (iii) two participants reported having dyslexia, (iv) two participants used contact lenses during the experiment (no participant had corrected vision with glasses) and (v) two participants read 50% of the stimuli text twice due to technical problems (total data loss 6.9%; M = 23.13 years old, SD = 2.36; seven females). An inspection of the eye movement data recorded revealed that the data of the other four participants were not eligible due to technical problems, such as electricity supply problems during the recording session (data loss 1.9%; M = 21.00 years old, SD = 2.08; two females). As a consequence, the data collected from 196 participants were included in TURead (91.2% of 215 participants; M = 22.72 years old, SD = 2.64; 93 females).

Materials

TURead consists of 192 short texts, each composed of 1-3 sentences (s). Each text includes a target word designed for the purpose of the study. The target words were selected from the BOUN web corpus according to their stem frequencies and lengths. The BOUN web corpus includes 1,337,898 distinct words (types) and 383,224,629 word tokens (Sak et al., 2008). The surface frequency of a word was calculated in terms of word tokens such that the surface count was the sum of the occurrences of the exact form of the word in the corpus.

Two groups of target words were selected from the BOUN corpus based on their stem surface frequencies: low frequency words and high frequency words. The cut-off point for stem surface frequencies was 0.75 (frequency per million), which was the mean of the BOUN corpus (SD = 35.50).Footnote 1 As for word length, TURead included short target words and long target words. The stems of the short target words consisted of four letters (for example, masa, ‘table’), while the stems of the long target words consisted of ten letters (e.g., bilgisayar, ‘computer’). Consequently, the target word set had four conditions based on the combination of stem length and surface stem frequency (henceforth, conditions): Short-Infrequent (SI) words, Long-Infrequent (LI) words, Short- Frequent (SF) words, and Long-Frequent (LF) target words.

There were 16 words per condition. The stimuli (that is, the texts) also included suffixed forms of the target words, which bore the allomorphs of the Turkish locative marker -DA (-de / -da / -te / -ta), and of -DAki,Footnote 2 the combination of the locative marker and the suffix -ki (-deki / -daki / -teki / -taki). The selected suffixes are among the most frequently used in Turkish, as revealed by an analysis of suffix frequencies in the corpus. In total, the target word set consisted of 192 words.

The four word conditions were constructed such that the characteristics of the target words (i.e., stem length and frequency per million) were homogeneous within each condition, as desired for the validity of the design. In other words, the mean frequency per million values were not different between short and long words within each frequency condition for stems, one-suffix words, and two-suffix words (e.g., the mean stem frequency per million values of short words were not significantly different from that of long words among frequent words). The mean surface frequency values (per million) of the stem and suffixed versions of the target words, together with the ANOVA results, are presented in Table 1.

Table 1 Mean surface frequency values of target words by condition and ANOVA results of the difference between mean frequency values between length conditions

The texts consisted of 1-3 sentences. The sentences within the texts were selected from a set of sources, including the BOUN Corpus (Sak et al., 2008), the METU Turkish Corpus (Say et al., 2002), and the Turkish National Corpus (Aksan et al., 2012). Due to the agglutinating structure of Turkish, it was difficult to find suffixed forms of infrequent words within the aforementioned sources. In such cases, the sentences were retrieved from publicly available sources (e.g., search engine results) or a synonym was used in place of a target word in a sentence. This methodology allowed us to use publicly available texts instead of generating sentences on purpose, thus improving the ecological validity of the experiment. In addition to the stimuli texts, four paragraphs were used as filler material. The paragraphs were excerpted from a novel (Bıçakçı, 2006). Stimuli texts are publicly available in the online repository.Footnote 3

In the resulting stimuli, neither the number of words (M = 15.33, SD = 2.88 words) in each text nor the number of characters (M = 125.13, SD = 20.78 characters) were significantly different between the experimental conditions (F(3,188) = 1.00, p > 0.05, F (3, 188) = 2.20, p >.05, respectively). As for the number of characters in each line that included a target word, there were no significant differences between the four conditions (M =60.92, SD = 4.50, F(3,188) = 0.65, p > 0.05), see Table 2.

Another design principle applied during the development of TURead was that the target words were located approximately in the middle of a line. In other words, the number of characters to the left of a target word (M = 26.66, SD = 8.52) was close to the number of characters to the right (M = 25.77, SD = 7.93). The target words were also located approximately in the middle of the text. The character count from the onset of the text to the onset of the target word was M = 56.70 (SD = 27.85), and the character count from the end of the target word to the end of the text was M = 59.43 (SD = 22.93).

As briefly stated above, some of the stimuli texts consisted of more than one sentence (155 texts include a single sentence, 34 texts include two sentences, and three texts include three sentences). Orthographically, each text was presented on at least two lines (107 of 192 texts) and at most three lines (85 of 192 texts). There were at least two words between a target word and the onset or end of a sentence. Another principle of stimuli design was that sentences were selected from available corpora or public resources such that there was no punctuation mark around the target words. There were at least two words between the target word and the conjunction in case of the presence of a conjunction in a sentence. Finally, each text included only one target word. Each target word appeared in one single text and only once in a text. Hence, each target word and its suffixed forms appeared only once in the stimuli text set.

Table 2 Word and character counts of the stimuli texts

Apparatus

The eye movements of the participants were recorded using a monocular camera (right eye) embedded in an SR Research EyeLink 1000 eye tracker system with a tower mount, which has a recording frequency of 1000 Hz. The stimuli were presented on a 17-inch CRT monitor with 1024 x 768 resolution, with a VGA connection to a computer running at 3.0 GHz under the Windows XP operating system. The audio files were recorded for each text stimulus and the filler paragraphs using a compatible sound card (Creative Labs Sound Blaster Audigy 2 ZS). The participants were seated approximately 65 cm away from the display screen with their heads positioned on a forehead rest. The stimuli were presented using 18 pt monospace font (Courier New), each letter corresponding to 14.03 pixels, and approximately 0.46 degrees of visual angle. Since the experiment included oral reading blocks, only the forehead, but not the chin, was fixed with a chinrest to minimize head movements.

Each text was followed by a Yes/No comprehension question. The participants answered the comprehension questions using a Microsoft USB Sidewinder gamepad, and proceeded with the experiment after breaks, calibrations, and reading instructions. They were instructed to answer each question as False by using the back-left button or as True by using the back- right button. These instructions appeared below each question in parentheses.

Design and procedure

The experimental stimuli were designed using the eye tracker manufacturer software, Experiment Builder version 1.10.1630. The recording session consisted of two parts each include two blocks, one silent reading block, and one oral reading block (that is, the reading modality), each lasting approximately 45 min. The experiment was conducted using a within-subject design. The reading modality of the texts and the order of the experiment blocks were counterbalanced by distributing 48 combinations (of the texts, conditions, the reading modality, and the order of the blocks) among the participants. The order of the texts within each block was also randomized. Consequently, each text was read both silently and aloud by different participants, and each participant read half of the stimuli texts silently and the remaining half of the stimuli aloud. Most of the participants completed both parts of the recording sessions on the same day (N = 213 of 215).

Each block consisted of a practice session (including four sample texts, two practice questions, and a filler paragraph) and a main reading session. The main reading sessions consisted of 48 stimuli texts in each block (cf. four target words x three suffix versions x four conditions). The entire recording session consisted of 192 stimuli texts and 48 true-or-false comprehension questions in total. The comprehension questions were prepared such that the correct answer to 94 of them was False and the remaining 98 required True as an answer. The participants correctly answered most of the questions (M = 88.84%, SD = 5.89%). There was a break after every 16 texts and between the blocks, summing up to ten breaks throughout the whole experimental session.

Instructions were presented to participants at the beginning and also between blocks for specific reading modalities. A nine-point standard calibration and validation was performed for the eye movement recordings. The calibration and validation procedures were renewed after each break. Participants were instructed to read the texts at their normal reading pace for comprehension, either silently or aloud, depending on the reading modality of the block. On the left of the screen, a gaze-contingent fixation marker (a circle with a diameter of 32 pixels) was displayed, on a blank screen, before the presentation of a stimulus on the screen. The coordinates of the fixation marker were px. 42 - px. 250 for the stimuli texts and px. 28 - px. 150 for the filler paragraphs (coordinate px. 0 - px. 0 defines the upper left corner of the screen). The non-visible IA (Interest Area) had a diameter of 150 pixels around the fixation marker. Following a fixation duration of the fixation marker of 1000 ms within the IA, the stimulus appeared on the screen with the first letter on the same coordinates as the coordinates of the fixation marker. If no fixation fell within the IA of the fixation marker for a duration of 1000 ms or longer for 10 s, an auto-calibration process was triggered for recalibration. Together with the texts, there was another fixation marker and an IA near the bottom right corner of the screen, which was the same size as the previous fixation marker. The coordinates were px. 982 - px. 700 for text stimuli and px. 981 - px. 715 for the filler paragraphs. The second fixation marker was also gaze-contingent. However, it was used to trigger the display of the next screen. Automatic recalibration was not triggered for the second fixation marker to avoid limiting the duration of the reading of the participants. If no fixation was detected for 1000 ms or longer in the IA of the fixation marker, the experimenter manually displayed the next screen using the keyboard of the host PC (that is, the computer that controls the eye tracker). This action started the automatic recalibration process. Figure 1 illustrates the procedure in one block. The procedure for the silent block and that of the oral block were identical.

Fig. 1
figure 1

The procedure employed in each block. Each text was followed by a comprehension question, a fixation marker for the next screen, or a break

For the analyses, each word was marked as an IA using the Use Runtime Word Segment InterestArea function of the Experiment Builder software. IAs for the fixation markers were constructed manually. The IA sets were then reconstructed to include the space before a word within the IA of that word. They were then used in Data Viewer version 3.2.1, the analysis software provided by the eye tracker manufacturer. Since the IAs for the fixation markers shown on the blank screen before the texts were used only during experiments for their gaze-contingent functions, the IAs for those fixation markers were removed from the IA sets for the analyses. However, the IAs for the fixation markers shown together with the texts were preserved in a new IA set to detect and eliminate rereading fixations.

Memory and familiarity tests

Two memory tests were performed after the recording sessions (a Corsi Block test and a digit span test). Participants also completed an additional seven-point Likert scale familiarity test for target words. The raw scores of the memory tests were included in TURead with the variables names CORSI BLOCK SCORE and DIGIT SPAN SCORE. The mean values of the Corsi scores and the digit span scores are presented in Table 3.

Table 3 Mean scores and standard deviations of the digit span and Corsi tests (standard deviations presented in parentheses)

The familiarity test was administered after the recording session to prevent participants from seeing the target words prior to the experiment. Participants were instructed to score their levels of familiarity with the target words on a 7-point Likert scale (1 for I have never heard the word and 7 for I know the meaning of the word). Our analyses revealed that the stem frequencies (per million) and the familiarity rating scores of the words were significantly correlated, rs =.856, p <.001. The raw scores from the familiarity rating task were included in TURead under the variable name FAMILIARITY RATING. The mean familiarity rating scores are presented in Table 4.

Table 4 Mean familiarity ratings for target words (standard deviations presented in parentheses)

Predictability scores

The predictability scores for the target words were collected from 122 participants (mean age M = 24.01, SD = 4.26, ten participants did not report their year of birth; 82 females, one participant did not report gender), who did not participate in the main experimental recording sessions. A Cloze procedure was used to score sentential predictability (Taylor, 1953). Predictability scores were also collected for neighboring words of each target word (i.e., words n-1 and n+1) from a separate group of 70 participants, who made 35 predictions for each word (word n-1: M = 25, SD = 5.94 years old, two participants did not report their birth year; 21 females; word n+1: M = 25.6, SD = 5.07 years old, five participants did not report their year of birth; 25 females). Participants were asked to predict the following word (the target word n, the word n-1, or the word n+1) given the context in the part of the text prior to the target word. In addition to word predictability, two sets of suffix predictability data were collected from another group of participants, who also participated in neither the data recording sessions nor the word predictability scoring. A total of 110 participants were asked to predict the suffix of one-suffixed target words (M = 21.78, SD = 2.53 years old, seven participants did not report year of birth; 51 females, one participant did not report any gender), while 69 participants were asked to predict two suffixes of two-suffixed target words (M = 22.29, SD = 3.32 years old, three participants did not report birth year; 36 females) given the context in the part of the text prior to the suffix(es) of the target word, including the target word.

The TURead dataset

In this section, we introduce the variables that provide general information in TURead, about the participants, the experimental blocks, the reading modality, and the stimuli texts (see Table 16 in the Appendix for general variables). We have also defined several variables to allow detailed analyses of the data. These are presented in Tables 17 and 18 in the Appendix. The full set of variables can be accessed in the online repository.

Eye movement data inspection and cleansing

Eye movement data were inspected, cleansed, and corrected manually where necessary (e.g., in cases of regular offset errors), as described in this section. Eye movement measures were retrieved by Data Viewer analysis software.Footnote 4

Manual inspection of gaze data revealed two types of calibration problems: (i) all fixations were above or below the lines (the offset problem), (ii) the fixations were upward or downward sloping. They were resolved by (1) selecting all fixations and moving them downward or upward, (2) selecting the fixations belonging to the same line and aligning them using the Drift Correct function of the Data Viewer, or (3) using a combination of (1) and (2). Fixations were moved only vertically when needed, and no fixations were moved horizontally (i.e., the coordinates of fixations along the X-axis were not updated) according to best practice in the literature (Holmqvist et al., 2011). When no solutions were applicable to a trial with calibration problems, that specific trial was removed from the analyses. Consequently, a total of 60 trials (of 37,632) were eliminated (0.16%). In 156 trials, the stimuli were read twice by the participant. These were also removed from the analyses (0.41%). The sum of the partial data loss was 216 trials (0.57%). Data loss statistics by elimination criteria are presented in Table 5.

Table 5 Eliminated data based on eye movement measures and articulation- related criteria

No further data were eliminated, but the data were labeled to indicate the possibility of further elimination for potential analyses in the future (see Table 19 in the Appendix).

Eye movement measures

This section presents the description and data for common eye movement measures in the literature, such as word skipping rates, fixation duration, count and location variables, saccadic amplitude, and reading rate. Eye movement measures were either retrieved from the Data Viewer software or calculated using several variables provided by the software. The full set of eye movement variables in TURead is presented in Table  20 in the Appendix. The following sections present a snapshot of the values for selected variables.

Word skipping

This section presents the descriptive statistics by condition and by reading modality (oral reading vs. silent reading.) for skipping (Table 6). The stimuli of the present study consisted of 192 texts including one target word each, organized according to their frequencies and lengths in four conditions: Short-Frequent (SF), Short-Infrequent (SI), Long-Frequent (LF), and Long-Infrequent (LI) target words.

In general, the findings show that word skipping is more frequently observed in short words, both in silent reading and oral reading, compared to long words, which is consistent with the findings reported in the literature.

Fixation, saccades, and reading rates

In this section, the eye movement measures are presented in terms of six major variables: Fixation Duration (FD) in terms of First Fixation Duration (FFD), Gaze Duration (GD, also known as First Pass Dwell Time), and Total Fixation Duration (TFD); Saccadic Amplitude (Amp) in terms of the Last Saccade (Last) and the Next Saccade (Next); First Pass Fixation Count (FPFC), First Fixation Location (FFL), Launch Site (LS) and Reading Rate (RR) in Table 7. Fixations after the first fixation on the right bottom fixation marker (that is, rereadings) were removed, except for reading rate calculation.

The findings show that the first pass fixation counts (1) increase as the length of words increases, (2) increase as the frequency of words decreases, and (3) are more frequent in oral reading than in silent reading. Another finding is that the mean first fixation and gaze durations are longer for oral reading than for silent reading. Moreover, the mean first landing positions are slightly to the left of the word center, and the saccade amplitude is approximately seven characters for oral reading, whereas it is about eight characters for silent reading. These findings are largely compatible with the literature on reading research in most languages.

Audio recording analysis and the eye voice span measures

The texts and paragraphs read aloud by participants were recorded as waveform (.wav) audio files, separately for each trial. The start times of the articulation and the end times of the articulation of the target words were manually annotated using the ELAN software (Brugman & Russel, 2008). The beginnings and ends of the articulations were identified listening to the audio files and marking the wave beginnings in the ELAN interface. The tier sets included one tier for each target word imported into the ELAN file (.eaf) for each participant. ELAN annotations were labeled on those tiers. If a target word was not articulated correctly in a trial (e.g., in case of the utterance of a different word than the written one, reading the target word more than once, or stuttering while reading the target word), the audio file of that trial was not annotated and was removed from the analyses. In total, 92.39% of the audio recording annotations were controlled and refined by a second annotator. The annotations provided time stamps of the start and end of an articulation, which allowed synchronization of articulation times and eye movements. For synchronization, the start times of the audio recording and the first fixation start times were calculated according to the eye tracker time using (1) and (2).

$$\begin{aligned} A_{tracker} = A_{pc} - t_{pc} + t_{tracker} \end{aligned}$$
(1)
$$\begin{aligned} FF_{tracker} = FF_{trial} + t_{trial} \end{aligned}$$
(2)

The start times for the audio recording were calculated using (1), where Atracker stands for the start time for the audio recording in the eye tracker time. Apc is the start time of the audio recording on the display PC time, which was recorded in a variable defined for this purpose. tpc is the current time of the display PC, recorded in a separate variable. Finally, ttracker is the current eye tracker time when tpc value is updated. The fixation start and end times were provided relative to the trial start time by the Data Viewer software. Therefore, the first fixation start time was calculated relative to the start time of the tracker by (2), where FFtracker is the first fixation start time relative to the start time of the eye tracker, FFtrial is the first fixation start time relative to the start time of the trial, and ttrial is the start time of the trial relative to the start time of the eye tracker. The variables used to calculate FFtracker were provided by the Data Viewer software.

Figure 2 shows an example of a fixation immediately following the target word (n+1). In the example, at the start time of the articulation of the target word jeodinamik ‘geodynamics’, there is a fixation on the second ‘i’ of karakterinin ‘of character’ at n+1. Accordingly, the EVS value in this example consists of 22 characters and the value of the EVS-word is 1.

Table 6 Number and percentage of skipped and fixated words in oral reading and silent reading, for Short-Frequent (SF), Short-Infrequent (SI), Long-Frequent (LF), and Long-Infrequent (LI) target words
Table 7 Fixations, saccades, and reading rates in oral and silent reading (standard deviations in parentheses)
Fig. 2
figure 2

Eye Voice Span (EVS) in character count

Four Eye Voice Span (EVS) measures were included in TURead: (i) The duration between the beginning of the articulation of a word and the first fixation time on the target word, the Fixation Speech Interval, FSI, following the relevant studies on EVS (Inhoff et al., 2011), (ii) the distance between the first letter of the target word and the character fixated at the beginning of the articulation of the target word, in terms of character count (EVS-char), (iii) the distance between the target word and the word fixated at the beginning of the articulation of the target word, in terms of word count (EVS-word), and (iv) the duration of the articulation.

A sample FSI (Fixation Speech Interval) is illustrated in Fig. 3. The first fixation on the target word jeodinamik ‘geodynamics’ starts 2900 ms after the onset of the trial. The articulation of the same word, in this example, starts 3839.54 ms after the onset of the trial. The resulting FSI is 939.54 ms.

Fig. 3
figure 3

A sample Eye Voice Span (EVS) in time interval (FSI)

The variables related to oral reading and their descriptions are presented in Table 8.

Table 8 Variables related to oral reading

Table 9 shows the values of the EVS measures in TURead, for Short-Infrequent (SI) words, Long-Infrequent (LI) words, Short-Frequent (SF) words, and Long-Frequent (LF) target words.

Table 9 Eye Voice Span (EVS) measures in oral reading

The findings show that the mean FSI values are higher in oral Turkish reading compared to English and German under all conditions (486 ms for English Inhoff et al., 2011), 561 ms for German Laubrock & Kliegl 2015). However, the values are close to the FSI values reported for Finnish (625 ms, Järvilehto et al., 2009). Given that Turkish and Finnish share agglutinating characteristics and shallow orthograpies, the findings are not unexpected. However, the mean values of EVS-char (i.e., the spatial measure of EVS in terms of character count) are shorter in all four conditions compared to the findings reported in the literature (e.g., 15-17 characters in Buswell (1920); 16 characters calculated from the first fixation onset in Laubrock & Kliegl (2015)). In Turkish, approximately one more word was viewed during the FSI of short words, while the eyes tended to be on the same word at the beginning of its articulation for long words. We suggest that the discrepancy observed between the EVS measures obtained for Turkish sentence reading and those reported in the literature (except those for Finnish) is a result of the shallow orthography of Turkish. On the other hand, the inflated FSI could be an indicator of increased prelexical phonological processing for languages with shallow orthographies, as suggested in Frost (1998, 2005). However, these claims require further investigation with cross-linguistic studies.

Prelexical characteristics

A set of prelexical characteristics were included in TURead, which identified the characteristics of a target word (n), the word prior to the target word (n-1), and the word next to the target word (n+1), including Vowel Harmony, and a set of variables for bigrams and trigrams. These were selected due to the potential impact of the phonological characteristics of Turkish words, particularly vowels.

The Turkish alphabet includes eight vowels, grouped according to the height of the tongue, the roundedness of the lips, and the frontness of the tongue during articulation (Table 10).

Table 10 Vowels in Turkish

Vowel distribution in Turkish words is mostly restricted according to vowel harmony rules. The vowels in the suffixes usually agree with the vowel in the last syllable of the stem to preserve vowel harmony, although there are exceptions (e.g., -ki, one of the frequently used suffixes, also used in the present study). Most of the exceptions to vowel harmony in Turkish are loan words (Göksel & Kerslake, 2005). The vowel sequences, allowed according to vowel harmony, are presented in Table 11.

Table 11 Vowel sequences allowed according to vowel harmony

The variable VH (Vowel Harmony) was included in TURead as a categorical variable with two levels, showing whether the rule was broken or not. A respected VH rule was labeled 0, and a broken VH rule was labeled 1. In addition, the number of broken instances was calculated, as presented in the next section.

Further characteristics considered in designing TURead were Trigram Frequency (TF) and Bigram Frequency (BF) of n (the target word), n-1 (the word preceding the target word), and n+1 (the word following the target word). They are assumed to capture the phoneme environment since different pronunciations of phonemes (i.e., allophones) are context dependent. For instance, /h/ is pronounced as a voiceless palatal fricative when it precedes a front vowel. It is also pronounced as a voiceless velar fricative when a back vowel precedes it or as a voiceless glottal fricative when it precedes a back vowel. Sometimes, when it occurs between two identical vowels, it is silent (Göksel & Kerslake, 2005). Due to the restrictions of letter clusters at word-initial and word-final positions, trigrams and bigrams were divided into three subgroups: word-initial, word-final, and between these two. For each group, the frequency values were obtained separately from the BOUN corpus (Sak et al., 2008), depending on the place of the trigram/bigram in the word (Bilgin, 2016). Each prelexical frequency values were calculated as occurrences per million, using Laplace smoothing applying (3), following the previous work on the topic (Brysbaert & Diependaele, 2013) since the data included zero frequency values.

$$\begin{aligned} Fpm = ((Count + 1) / (Token + Type)) * 1,000,000 \end{aligned}$$
(3)

In (3), Fpm stands for frequency per million, Count stands for the number of occurrences of a word in the corpus, Token stands for the number of word tokens in the corpus, and Type stands for the number of word types in the corpus. The average adjacent trigram and adjacent bigram frequencies per million were included in the dataset as trigram frequency (TF) and bigram frequency (BF), respectively. Another important restriction regarding word boundaries was captured by inclusion of the average of word-initial and word-final trigram/bigram frequencies within calculations. The variables for prelexical characteristics and their descriptions are provided in Table 21 in the Appendix, and the number of types and tokens used in (3) are provided in Table 12.

Table 12 The number of types and tokens used for trigram and bigram calculation

The descriptive statistics for the trigram and bigram frequencies and the number of broken vowel harmony instances are presented in the following section, together with the predictability scores and lexical characteristics of the words.

Table 13 Characteristics of the target word (n)

Table 14 presents the characteristics of the words that precede the target word (n-1). Table 15 presents the characteristics of the words that follow the target word (n+1).

Table 14 Characteristics of the words that precede the target word (n-1)
Table 15 The characteristics of the words that follow the target word (n+1)

Predictability scores and lexical characteristics

The predictability scores were collected from 122 participants for the target words, 70 participants for the neighboring words (35 for n-1 and 35 for n+1), 110 participants for the suffix of one-suffixed target words and 69 participants for the suffixes of two-suffixed target words. The least data were collected for neighboring words (35 participants). To have balanced data from the participants for analyses that require it, a randomly selected sample set of 35 participant scores were included for target words and suffixes. The predictability scores of 192 target words from 122 participants and that of the selected 35 participants (M = 23.66, SD = 3.89 years old, three participants did not report birth year; 35 females) were not significantly different (F(1,382) = 0.00004, p \(=.995\)), which justified the selection of a smaller set as a representative set for the predictability scores. The prediction of the suffix of one-suffixed target words from 110 participants and that of selected 35 participants (M = 22.06, SD = 3.32 years old, four participants did not report birth year; 15 females, one participant did not report any gender) were not significantly different (F(1,126) = 0.366, p \(=.546\)), and neither were the prediction of the suffixes of two-suffixed target words from 69 participants and that of selected 35 participants (M = 21.66, SD = 1.54 years old, three participants did not report birth year; 15 females) (F(1,126) = 0.05, p \(=.824\)). In addition to the randomly selected sample sets of 35 participant scores for target words and suffixes, predictability scores of all available data were also included in the TURead Dataset. The information of the number of participants that contributed to the predictability scores for each predictability variable in the dataset was indicated in the variable name. For example, there are two variables for word (n) predictability scores such that the variable named p0_122_participants is calculated on the scores of 122 participants and the variable named p0_35_participants is calculated over the scores of 35 participants. For all predictability data, the correct predictions were scored as 1, and the incorrect predictions were scored as 0. The probability (p) of a correct prediction was calculated using (4), where num stands for the number of predictions for each word.

$$\begin{aligned} p = number\ of\ correct\ predictions\ /\ num \end{aligned}$$
(4)

The variables for the predictability calculations and their descriptions are presented in Table 22 in the Appendix.

In addition to word-level and suffix-level predictability, TURead includes further variables that identify the characteristics of a target word (n), the word prior to the target word (n-1), and the word next to the target word (n+1), such as familiarity ratings, word lengths, inflectional suffix counts, stem lengths, word frequencies, stem frequencies, trigram and bigram frequencies, and vowel harmony states.

Surface frequency values were obtained from the BOUN corpus (Sak et al., 2008). Lexical frequencies per million were calculated using Laplace smoothing (Brysbaert & Diependaele, 2013) applying (3). The number of word tokens was 383,224,629 and the number of word types was 1,337,898 used for the calculation. The Fpm values for lexical frequencies can be back-transformed using the same formula. The variables for the lexical characteristics of the words and their descriptions are presented in Table 23 in the Appendix. Table 13 presents the characteristics of the target words (n).

Discussion

The present study presents TURead, an eye movement dataset of silent and oral reading in Turkish, with an experimental approach in which the target words were manipulated on the basis of word length, frequency, and number of suffixes. TURead aims to provide empirical data for a diverse set of analyses. To provide benchmark data for analyses, word characteristics variables such as length, frequency, and predictability of target words and neighboring words are provided in the dataset (e.g., Kliegl et al., 2006). For further analysis of the influences of morphological complexity and phonological processing on eye movements during reading, a set of variables related to morphological complexity (e.g., suffix counts, suffix predictabilities, and stem lengths and frequencies) and prelexical characteristics of target words were also included in TURead to address the potential impact of phonotactics in Turkish (Acartürk et al., 2017). Furthermore, the familiarity scores of the target words included in TURead provide data to further investigations of strong vs. weak approaches to early phonological processing (Frost 1998, 2005, Coltheart et al., 2001; e.g., Özkan 2020). Turkish, as an agglutinating language with rich morphology and shallow orthography, provides a suitable environment for such investigations.

In addition to mostly studied eye movement measures (e.g., first fixation duration, gaze duration, last saccade amplitude, next saccade amplitude, first fixation location and launch site) for both oral and silent reading, four specific measures of oral reading were included in TURead (i.e., fixation speech interval, eye-voice span in terms of character count, eye-voice span in terms of word count and duration of the articulation). As previous research indicates, the eye-voice span measures reflect the manageable count of items held in the memory buffer during reading (e.g., Laubrock & Kliegl 2015; Inhoff et al., 2011). Together with working memory test scores (i.e., Corsi Block test and digit span test scores), oral reading-specific variables provided in TURead could be used in analyses that address memory processes and post-lexical processing involved in reading (e.g., Özkan 2020).

Conclusion

This study presented a new eye movement dataset of sentence reading in Turkish. The dataset consists of 192 sentences read by 215 participants in silent and oral modalities. The variables in the dataset were described together with the data collection, data cleaning, and data analysis procedures. The descriptive statistics of the selected variables in both reading modalities were prelexical and lexical characteristics of previous (n-1), target (n), and next words (n+1), and familiarity ratings of words. In addition to the descriptive statistics of the selected oculomotor measures such as fixation durations and saccade amplitudes, we also reported our findings on FSI (fixation speech interval) and EVS (eye voice span) in Turkish reading. We observed that FSI in Turkish is greater than in English and German but close to Finnish, which also has a shallow orthography. The increase in FSI in languages such as Turkish and Finnish may point to the influence of shallow orthography on prelexical phonological processing. We also observed shorter EVS values in Turkish sentence reading compared to previous research. Again, this difference may be explained by the effect of shallow orthography on the working memory buffer (Laubrock & Kliegl, 2015). More studies are needed in Turkish and other languages with shallow orthography to provide more evidence to support our findings. We believe that TURead will be a valuable and helpful resource for researchers to investigate the interplay between language characteristics and eye movements during reading.