TURead: An eye movement dataset of Turkish reading

In this study, we present TURead, an eye movement dataset of silent and oral sentence reading in Turkish, an agglutinative language with a shallow orthography that is understudied in reading research. TURead provides empirical data for investigating the relationship between morphology and oculomotor control. We employ a target-word approach in which target words are manipulated for word length and for the addition of two commonly used Turkish suffixes. The dataset contains well-established eye movement variables; prelexical characteristics such as vowel harmony and bigram-trigram frequencies; word features such as length, predictability, and frequency; eye voice span measures; Cloze test scores for root-word and suffix predictability; and the scores obtained from two working memory tests. Our findings on fixation parameters and word characteristics are in line with the patterns reported in the relevant literature.


Introduction
The study of reading requires the investigation of perceptual and cognitive processes at multiple levels, including oculomotor control, word identification, sentential-level processes, and discourse comprehension. Over the past two decades, eye movement control models have been developed to investigate the relationship between word recognition dynamics and eye movements during text reading. Such oculomotor control models aim to explain directly unobservable reading processes, such as word recognition, in terms of observable phenomena, mainly eye movements. Numerous word features are used as model parameters, the most popular being word frequency, word length, and the predictability of words in sentential contexts. Depending on the stimulus design, these features are calculated from general-purpose corpora (e.g., word frequency and length) or from large-scale data collection sessions (e.g., sentential predictability). Dependent variables include numerous oculomotor parameters, such as single fixation durations on target, pre-target, and post-target words, fixation counts, and regressions within or across word boundaries, e.g., [1]-[11]; see [12]-[15] for reviews.
As another crucial aspect of reading research, a diverse set of experimental paradigms has been used to study the role of sound coding in skilled reading, including masked phonological priming, articulatory suppression, auditory input distraction, electromyography recordings of articulatory muscles, and the gaze-contingent boundary paradigm, often accompanying tasks such as naming, lexical decision, semantic categorization, and sentence or text reading. The findings provide supporting evidence for the presence of sound coding in the reading process. There is an ongoing debate on the possible impact of sound encoding on eye movements in reading. Such an impact may be realized as early involvement of phonological processes in lexical access, or as the retention of words as phonological representations during post-lexical integration [25]-[36]. Previous research, based mainly on empirical investigations, shows that direct auditory input registration [28] and its combination with articulatory suppression as a secondary task [25], [29] influence subsequent fixation durations and text comprehension. However, studies focusing on Eye Voice Span (EVS) in reading aloud have indicated a dynamic modulation of EVS that keeps a uniform distance between the eyes and the voice, implying the retention of a manageable number of items in working memory; see [1], [4], [36] for reviews of the role of sound coding in post-lexical processing in reading.
Eye movement datasets usually present the characteristics of words and a set of eye movement variables concerning the words in a text and thereby provide eye movement data for analyses and oculomotor control models. Recently, eye movement corpora for numerous languages have emerged [16]-[22], some of which have been established on multiple dimensions, such as monolingual and bilingual reading [23], cross-linguistic multilingual reading [24], and reading development in children in both silent and oral modalities [48]. The existing eye movement datasets vastly differ in their material selection methodologies. Some employ an experimental approach, in which a set of selected target words is manipulated [19], [20], [48], while others employ a corpus-analytical approach, in which participants read a set of sentences without any manipulation of target words [17], [18]. The datasets also differ in the selection of the variables investigated. For example, some available datasets include predictability norms besides word frequency and length [17]-[22], while others only provide the eye movement data [24]. Finally, while most eye movement corpora cover silent reading, some oral reading data are also available, with a specific focus on Eye Voice Span [1], [4].
This study presents TURead, an eye movement dataset of reading in Turkish, a largely understudied language in reading research. A small eye movement dataset of silent reading in Turkish, established using a corpus-analytical approach, was recently made available [11]. TURead differs from this dataset in several dimensions. TURead assumes an experimental approach in which target words were manipulated based on word length, frequency, and number of suffixes. The main body of TURead includes lexical and prelexical characteristics of the target words, their predictability scores, and specific measures for oral reading. TURead aims to provide empirical data to facilitate further investigations of the reading characteristics of Turkish, an agglutinating language with rich morphology and shallow orthography. These characteristics make Turkish particularly suitable for studying early phonological processing, word frequency and length effects, and morphological complexity, which may be conceived as primary components of the cognitive processes involved in reading.
Regarding the investigation of word identification processes in Turkish reading, a study of sentential pseudoword reading in Turkish found significant effects of several letter frequency measures on eye movement measures [35]. Statistically significant effects on fixation duration were obtained for word-center consonant collocation frequency and word-boundary frequency, in addition to a significant interaction of vowel harmony collocation frequency (reflecting the vowel harmony rules that restrict vowel sequences in Turkish words) and word-boundary collocation frequency. Since the stimuli consisted of pseudowords, the observed effects were interpreted as instances of the impact of phonotactics on Turkish reading. Therefore, TURead was designed to include the prelexical characteristics of the target words to address the possible impact of phonotactics in Turkish.
In TURead, we included EVS measures in addition to silent reading measures to enhance the dataset's potential to contribute to the study of the influence of phonological representations of words in reading. We also included the results of two memory tests (a Corsi block test and a digit span test). One possible use of the memory test results is to investigate the relationship between the retention of items in working memory and the reading process (for example, an analysis of the influence of working memory scores on EVS measured in number of words, [49]).
TURead includes an additional set of variables that are largely novel to reading research. One is suffix-level predictability values, and the other is familiarity ratings for the target words. The former can be used in analyses of morphological complexity, whereas the latter can be used to investigate early phonological processing within specific theoretical frameworks, such as the dual-route hypothesis, which assumes a direct lexical access route for known words and an indirect access route through prelexical grapheme-to-phoneme rules for novel words [33]. Finally, TURead can also serve as a new benchmark for future computational models of reading and for validating the compatibility of existing ones. In the following sections, we present the methodology behind TURead.

Participants
A total of 215 participants (M = 22.72 years old, SD = 2.61; 102 females) participated in the experiment for a monetary compensation of approximately 5 US dollars. Each participant signed an informed consent form and completed a demographic data form prior to the eye movement recording session. We excluded the data of 15 participants because (i) the native language of one participant was not Turkish, (ii) eight participants identified themselves as bilingual, (iii) two participants reported having dyslexia, (iv) two participants used contact lenses during the experiment (no participant had corrected vision with glasses), and (v) two participants read 50% of the stimuli texts twice due to technical problems (total data loss 6.9%; M = 23.13 years old, SD = 2.36; seven females). An inspection of the recorded eye movement data revealed that the data of four additional participants were not eligible due to technical problems, such as power supply problems during the recording session (data loss 1.9%; M = 21.00 years old, SD = 2.08; two females). Consequently, the data collected from 196 participants were included in TURead (91.2% of 215 participants; M = 22.72 years old, SD = 2.64; 93 females).

Materials
TURead consists of 192 short texts, each composed of 1-3 sentences. Each text includes a target word designed for the purpose of the study. The target words were selected from the BOUN web corpus according to their stem frequencies and lengths. The BOUN web corpus includes 1,337,898 distinct words (types) and 383,224,629 word tokens [37]. The surface frequency of a word was calculated in terms of word tokens, such that the surface count was the sum of the occurrences of the exact form of the word in the corpus.
Two groups of target words were selected from the BOUN corpus based on their stem surface frequencies: low frequency words and high frequency words.
The cut-off point for stem surface frequencies was 0.75 (frequency per million), the mean of the BOUN corpus (SD = 35.50). As for word length, TURead included short target words and long target words. The stems of the short target words consisted of four letters (for example, masa, 'table'), while the stems of the long target words consisted of ten letters (e.g., bilgisayar, 'computer'). Consequently, the target word set had four conditions based on the combination of stem length and stem surface frequency (henceforth, conditions): Short-Infrequent (SI), Long-Infrequent (LI), Short-Frequent (SF), and Long-Frequent (LF) target words.
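The frequency-per-million normalization underlying the cut-off can be sketched as follows. The function name and the raw-count arithmetic are ours for illustration; the corpus token count and the 0.75 cut-off are taken from the text above:

```python
def per_million(raw_count: int, corpus_tokens: int) -> float:
    """Convert a raw corpus occurrence count to a frequency per million tokens."""
    return raw_count * 1_000_000 / corpus_tokens

# Token count of the BOUN web corpus, as reported above.
BOUN_TOKENS = 383_224_629

# The 0.75-per-million cut-off corresponds to roughly 287 raw occurrences
# in the full corpus.
cutoff_raw = 0.75 * BOUN_TOKENS / 1_000_000
```

Under this normalization, a stem occurring 288 times in the corpus would fall just above the cut-off, and one occurring 287 times just below it.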
There were 16 words per condition. The stimuli (that is, the texts) also included suffixed forms of the target words, bearing the allomorphs of the Turkish locative marker -DA (-de / -da / -te / -ta) and of -DAki (-deki / -daki / -teki / -taki), the combination of the locative marker and the suffix -ki. The selected suffixes are among the most frequently used in Turkish, as revealed by an analysis of suffix frequencies in the corpus. In total, the target word set consisted of 192 words.
The four word conditions were constructed such that the characteristics of the target words (i.e., stem length and frequency per million) were homogeneous within each condition, as required for the validity of the design. In other words, the mean frequency-per-million values did not differ between short and long words within each frequency condition for stems, one-suffix words, and two-suffix words (e.g., among frequent words, the mean stem frequency per million of short words did not differ significantly from that of long words). The mean surface frequency values (per million) of the stem and suffixed versions of the target words, together with the ANOVA results, are presented in Table 1.
The texts consisted of 1-3 sentences. The sentences within the texts were selected from a set of sources, including the BOUN Corpus [37], the METU Turkish Corpus [38], and the Turkish National Corpus [39]. Due to the agglutinating structure of Turkish, it was difficult to find suffixed forms of infrequent words in the aforementioned sources. In such cases, the sentences were retrieved from publicly available sources (e.g., search engine results), or a synonym was used in place of a target word in a sentence. This methodology allowed us to use publicly available texts instead of generating sentences on purpose, thus improving the ecological validity of the experiment. In addition to the stimuli texts, four paragraphs excerpted from a novel [40] were used as filler material. The stimuli texts are publicly available in the online repository. In the resulting stimuli, neither the number of words (M = 15.33, SD = 2.88) per text nor the number of characters (M = 125.13, SD = 20.78) differed significantly between the experimental conditions (F(3, 188) = 1.00, p > .05 and F(3, 188) = 2.20, p > .05, respectively). As for the number of characters in each line that included a target word, there were no significant differences between the four conditions (M = 60.92, SD = 4.50; F(3, 188) = 0.65, p > .05); see Table 2.
Another design principle applied during the development of TURead was that the target words were located approximately in the middle of a line. In other words, the number of characters to the left of a target word (M = 26.66, SD = 8.52) was close to the number of characters to the right (M = 25.77, SD = 7.93). The target words were also located approximately in the middle of the text. The character count from the onset of the text to the onset of the target word was M = 56.70 (SD = 27.85), and the character count from the end of the target word to the end of the text was M = 59.43 (SD = 22.93).
As briefly stated above, some of the stimuli texts consisted of more than one sentence (155 texts include a single sentence, 34 texts include two sentences, and three texts include three sentences). Orthographically, each text was presented on at least two lines (107 of 192 texts) and at most three lines (85 of 192 texts). There were at least two words between a target word and the onset or end of a sentence. Another principle of stimulus design was that sentences were selected from the available corpora or public resources such that there was no punctuation mark around the target words. When a sentence contained a conjunction, there were at least two words between the target word and the conjunction. Finally, each text included only one target word, and each target word appeared in one single text and only once in that text. Hence, each target word and its suffixed forms appeared only once in the stimuli text set.
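Two of the placement constraints above lend themselves to a simple automated check. The following is an illustrative sketch, not part of the TURead tooling; the function name is ours, and a sentence is assumed to be given as a list of word tokens:

```python
import string

def valid_target_placement(words: list[str], target_index: int) -> bool:
    """Check two stimulus constraints: at least two words between the target
    and the sentence onset/end, and no punctuation mark on the words
    adjacent to the target."""
    # At least two words before and after the target within the sentence.
    if target_index < 2 or target_index > len(words) - 3:
        return False
    # No punctuation on the neighbors (approximates "no punctuation mark
    # around the target words").
    for neighbor in (words[target_index - 1], words[target_index + 1]):
        if any(ch in string.punctuation for ch in neighbor):
            return False
    return True
```

A sentence such as ["dün", "akşam", "masada", "kitap", "vardı"] with the target at index 2 satisfies both constraints, whereas a comma on a neighboring word or a target too close to a sentence boundary fails the check.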

Apparatus
The eye movements of the participants were recorded monocularly (right eye) with an SR Research EyeLink 1000 eye tracker with a tower mount, at a recording frequency of 1000 Hz. The stimuli were presented on a 17-inch CRT monitor at a resolution of 1024 x 768, connected via VGA to a computer running at 3.0 GHz under the Windows XP operating system. Audio files were recorded for each text stimulus and the filler paragraphs using a compatible sound card (Creative Labs Sound Blaster Audigy 2 ZS). The participants were seated approximately 65 cm away from the display screen with their heads positioned on a forehead rest. The stimuli were presented in an 18 pt monospace font (Courier New), each letter corresponding to 14.03 pixels and approximately 0.46 degrees of visual angle. Since the experiment included oral reading blocks, only the forehead, but not the chin, was stabilized, to minimize head movements without hindering articulation.
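The reported letter size in degrees can be cross-checked with the standard visual angle formula. In the sketch below, the viewing distance and the letter width in pixels come from the text above, while the pixel pitch is an assumption (a 17-inch 4:3 CRT with a visible width of roughly 33 cm); the exact monitor geometry is not reported, so the result is only approximate:

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (in degrees) subtended by an object of size_cm viewed
    from distance_cm, using the standard 2*atan(size / (2*distance)) formula."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

CM_PER_PX = 33.0 / 1024          # assumed pixel pitch: visible width / horizontal resolution
letter_cm = 14.03 * CM_PER_PX    # one monospace letter, about 0.45 cm
angle = visual_angle_deg(letter_cm, 65.0)  # roughly 0.4 degrees under this assumption
```

Under this pitch assumption the result lands near 0.4 degrees, in the same range as the approximately 0.46 degrees reported; the residual difference would be absorbed by the actual visible screen width and viewing distance.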
Each text was followed by a Yes/No comprehension question. The participants answered the comprehension questions using a Microsoft USB SideWinder gamepad and proceeded with the experiment after breaks, calibrations, and reading instructions. They were instructed to answer each question as False using the back-left button or as True using the back-right button. These instructions appeared below each question in parentheses.

Design and Procedure
The experimental stimuli were designed using the eye tracker manufacturer's software, Experiment Builder version 1.10.1630. The recording session consisted of two blocks, one silent reading block and one oral reading block (that is, the reading modality), each lasting approximately 45 minutes. The experiment was conducted using a within-subject design. The reading modality of the texts and the order of the experiment blocks were counterbalanced by distributing 48 combinations (of the texts, conditions, reading modality, and block order) among the participants. The order of the texts within each block was also randomized. Consequently, each text was read both silently and aloud by different participants, and each participant read half of the stimuli texts silently and the remaining half aloud. Most of the participants completed both blocks of the recording sessions on the same day (N = 190 of 192).
Each block consisted of a practice session (including four sample texts, two practice questions, and a filler paragraph) and a main reading session. The main reading sessions consisted of 48 stimuli texts in each block (cf. four target words x three suffix versions x four conditions). The entire recording session consisted of 192 stimuli texts and 48 true-or-false comprehension questions in total. The comprehension questions were prepared such that the correct answer to 94 of them was False and the remaining 98 required True as an answer. The participants correctly answered most of the questions (M = 88.84%, SD = 5.89%). There was a break after every 16 texts and between the blocks, summing up to ten breaks throughout the whole experimental session.
Instructions were presented to the participants at the beginning of the experiment and between the blocks for the specific reading modalities. A standard nine-point calibration and validation was performed for the eye movement recordings, and the calibration and validation procedures were renewed after each break. Participants were instructed to read the texts at their normal reading pace for comprehension, either silently or aloud, depending on the reading modality of the block. Before the presentation of a stimulus, a gaze-contingent fixation marker (a circle with a diameter of 32 pixels) was displayed on the left of an otherwise blank screen. The coordinates of the fixation marker were (42 px, 250 px) for the stimuli texts and (28 px, 150 px) for the filler paragraphs, where (0 px, 0 px) is the upper left corner of the screen. A non-visible Interest Area (IA) with a diameter of 150 pixels surrounded the fixation marker. Following a fixation of 1000 ms within the IA, the stimulus appeared on the screen, with its first letter at the coordinates of the fixation marker. If no fixation of 1000 ms or longer fell within the IA within 10 seconds, an automatic recalibration process was triggered. Together with the texts, another gaze-contingent fixation marker and an IA of the same size were displayed near the bottom right corner of the screen, at (982 px, 700 px) for the text stimuli and (981 px, 715 px) for the filler paragraphs. This second fixation marker was used to trigger the display of the next screen. Automatic recalibration was not triggered for the second fixation marker, to avoid limiting the participants' reading time. If no fixation of 1000 ms or longer was detected within the IA of this fixation marker, the experimenter manually displayed the next screen using the keyboard of the host PC (that is, the computer that controls the eye tracker); this action started the automatic recalibration process. Figure 1 illustrates the procedure in one block. The procedures for the silent and oral blocks were identical.
For the analyses, each word was marked as an IA using the Use Runtime Word Segment InterestArea function of the Experiment Builder software. IAs for the fixation markers were constructed manually. The IA sets were then reconstructed to include the space before a word within the IA of that word. They were then used in Data Viewer version 3.2.1, the analysis software provided by the eye tracker manufacturer. Since the IAs for the fixation markers shown on the blank screen before the texts were used only for their gaze-contingent functions during the experiments, they were removed from the IA sets for the analyses. However, the IAs for the fixation markers shown together with the texts were preserved in a new IA set to detect and eliminate rereading fixations.

Memory and Familiarity Tests
Two memory tests, a Corsi block test and a digit span test, were administered after the recording sessions. Participants also completed an additional seven-point Likert scale familiarity test for the target words. The raw scores of the memory tests were included in TURead under the variable names CORSI BLOCK SCORE and DIGIT SPAN SCORE. The mean values of the Corsi scores and the digit span scores are presented in Table 3.
The familiarity test was administered after the recording session to prevent participants from seeing the target words prior to the experiment. Participants were instructed to rate their familiarity with the target words on a 7-point Likert scale (1 for 'I have never heard the word' and 7 for 'I know the meaning of the word'). Our analyses revealed that the stem frequencies (per million) and the familiarity ratings of the words were significantly correlated, rs = .856, p < .001. The raw scores from the familiarity rating task were included in TURead under the variable name FAMILIARITY RATING. The mean familiarity rating scores are presented in Table 4.

Fig. 1 The procedure employed in each block. Each text was followed by a comprehension question, a fixation marker for the next screen, or a break.

Predictability Scores
The predictability scores for the target words were collected from 122 participants (M = 24.01 years old, SD = 4.26; ten participants did not report their year of birth; 82 females, one participant did not report gender) who did not take part in the main experimental recording sessions. A Cloze procedure was used to score sentential predictability [41]. Predictability scores were also collected for the neighboring words of each target word (i.e., words n-1 and n+1) from a separate group of 70 participants, with 35 predictions per word (word n-1: M = 25 years old, SD = 5.94, two participants did not report their year of birth; 21 females; word n+1: M = 25.6 years old, SD = 5.07, five participants did not report their year of birth; 25 females). Participants were asked to predict the upcoming word (the target word n, the word n-1, or the word n+1) given the context provided by the preceding part of the text.
In addition to word predictability, two sets of suffix predictability data were collected from further groups of participants who took part in neither the data recording sessions nor the word predictability scoring. A total of 110 participants were asked to predict the suffix of the one-suffix target words (M = 21.78 years old, SD = 2.53; seven participants did not report their year of birth; 51 females, one participant did not report gender), while 69 participants were asked to predict the two suffixes of the two-suffix target words (M = 22.29 years old, SD = 3.32; three participants did not report their year of birth; 36 females), given the context in the part of the text prior to the suffix(es), including the target word itself.
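Cloze predictability scores of this kind are conventionally computed as the proportion of respondents who produce the expected continuation. A minimal sketch follows; the function name and the normalization choices are ours, not the TURead scoring pipeline:

```python
def cloze_predictability(responses: list[str], expected: str) -> float:
    """Proportion of Cloze responses matching the expected word (or suffix),
    after trimming whitespace and lowercasing.  Turkish-specific casefolding
    (e.g., dotted vs. dotless i) is deliberately ignored in this sketch."""
    if not responses:
        raise ValueError("no responses")
    hits = sum(1 for r in responses if r.strip().lower() == expected.lower())
    return hits / len(responses)
```

For example, if two of three respondents complete a frame with "masada", the predictability of "masada" in that context is 2/3.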

The TURead Dataset
In this section, we introduce the variables in TURead that provide general information about the participants, the experimental blocks, the reading modality, and the stimuli texts (see Table 15 in the Appendix).
We have also defined several variables to allow detailed analyses of the data. These are presented in Table 16 and Table 17 in the Appendix. The full set of variables can be accessed in the online repository.

Eye Movement Data Inspection and Cleansing
Eye movement data were inspected, cleansed, and corrected manually where necessary (e.g., in cases of regular offset errors), as described in this section. Eye movement measures were retrieved using the Data Viewer analysis software. Manual inspection of the gaze data revealed two types of calibration problems: (i) all fixations were above or below the lines (the offset problem), or (ii) the fixations were sloping upward or downward. These problems were resolved by (1) selecting all fixations and moving them downward or upward, (2) selecting the fixations belonging to the same line and aligning them using the Drift Correct function of the Data Viewer, or (3) a combination of (1) and (2). Fixations were moved only vertically when needed; no fixations were moved horizontally (i.e., the X coordinates of fixations were not updated), in accordance with best practice in the literature [42]. When no solution was applicable to a trial with calibration problems, that trial was removed from the analyses. Consequently, a total of 60 trials (of 37,632) were eliminated (0.16%).
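The vertical-only corrections described above can be illustrated with a small sketch. The function names and the line-snapping rule are ours, not the Data Viewer implementation; fixations are assumed to be (x, y) pixel pairs:

```python
def shift_vertically(fixations, dy):
    """Offset repair: move every fixation by a constant vertical amount,
    leaving the X coordinates untouched."""
    return [(x, y + dy) for (x, y) in fixations]

def snap_to_nearest_line(fixations, line_centers_y):
    """Drift-correct-style repair: move each fixation's Y coordinate to the
    nearest text-line center; X coordinates are never changed."""
    return [(x, min(line_centers_y, key=lambda cy: abs(cy - y)))
            for (x, y) in fixations]
```

Both operations preserve the horizontal landing positions, which is the property the cleansing procedure above is careful to maintain.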
In 156 trials, the stimuli were read twice by the participant. These were also removed from the analyses (0.41%). The total partial data loss was 216 trials (0.57%). Data loss statistics by elimination criterion are presented in Table 5.
No further data were eliminated, but the data were labeled to indicate the possibility of further elimination for potential analyses in the future (see Table 18 in the Appendix).

Eye Movement Measures
This section presents the descriptions and data for eye movement measures common in the literature, such as word skipping rates, fixation duration, count, and location variables, saccadic amplitude, and reading rate. Eye movement measures were either retrieved from the Data Viewer software or calculated from several variables provided by the software. The full set of eye movement variables in TURead is presented in Table 19 in the Appendix. The following sections present a snapshot of the values for selected variables.

Word Skipping
This section presents the descriptive statistics for skipping by condition and by reading modality (oral vs. silent reading) (Table 6). The stimuli of the present study consisted of 192 texts including one target word each, organized by frequency and length into four conditions: Short-Frequent (SF), Short-Infrequent (SI), Long-Frequent (LF), and Long-Infrequent (LI) target words.
In general, the findings show that word skipping is observed more frequently for short words than for long words, in both silent and oral reading, which is consistent with the findings reported in the literature.

Fixation, Saccades, and Reading Rates
In this section, the eye movement measures are presented in terms of six major variables (Table 7): Fixation Duration (FD), in terms of First Fixation Duration (FFD), Gaze Duration (GD, also known as First Pass Dwell Time), and Total Fixation Duration (TFD); Saccadic Amplitude (Amp), in terms of the Last Saccade (Last) and the Next Saccade (Next); First Pass Fixation Count (FPFC); First Fixation Location (FFL); Launch Site (LS); and Reading Rate (RR). Fixations after the first fixation on the bottom-right fixation marker (that is, rereadings) were removed, except for the reading rate calculation.
The findings show that first pass fixation counts (1) increase with word length, (2) increase as word frequency decreases, and (3) are higher in oral reading than in silent reading. Another finding is that the mean first fixation and gaze durations are longer in oral reading than in silent reading. Moreover, the mean first landing positions are slightly to the left of the word center, and the saccade amplitude is approximately seven characters in oral reading and about eight characters in silent reading. These findings are largely compatible with the literature on reading research in most languages.

Audio Recording Analysis and the Eye Voice Span Measures
The texts and paragraphs read aloud by the participants were recorded as waveform (.wav) audio files, separately for each trial. The articulation start and end times of the target words were manually annotated using the ELAN software [43]. The beginnings and ends of the articulations were identified by listening to the audio files and marking the waveform onsets in the ELAN interface. For each participant, a tier set containing one tier for each target word was imported into the ELAN file (.eaf), and the annotations were labeled on those tiers. If a target word was not articulated correctly in a trial (e.g., in case of the utterance of a word different from the written one, reading the target word more than once, or stuttering while reading the target word), the audio file of that trial was not annotated and the trial was removed from the analyses. In total, 92.39% of the audio recording annotations were checked and refined by a second annotator. The annotations provided time stamps for the start and end of an articulation, which allowed the synchronization of articulation times and eye movements. For synchronization, the start times of the audio recordings and the first fixation start times were converted to eye tracker time using Equations (1) and (2).
The start time of an audio recording was calculated using Equation (1),

A_tracker = A_pc + (t_tracker - t_pc),   (1)

where A_tracker is the start time of the audio recording in eye tracker time, A_pc is the start time of the audio recording in display PC time (recorded in a variable defined for this purpose), t_pc is the current time of the display PC (recorded in a separate variable), and t_tracker is the current eye tracker time at the moment the t_pc value is updated. The fixation start and end times were provided by the Data Viewer software relative to the trial start time. Therefore, the first fixation start time was converted to eye tracker time using Equation (2),

FF_tracker = FF_trial + t_trial,   (2)

where FF_tracker is the first fixation start time relative to the start of the eye tracker, FF_trial is the first fixation start time relative to the start of the trial, and t_trial is the trial start time relative to the start of the eye tracker. The variables used to calculate FF_tracker were provided by the Data Viewer software. Figure 2 shows an example of a fixation immediately following the target word (n+1). In the example, at the start time of the articulation of the target word jeodinamik 'geodynamics', there is a fixation on the second 'i' of karakterinin 'of character' at n+1. Accordingly, the EVS value in this example is 22 characters, and the EVS-word value is 1.
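The clock alignment just described can be sketched directly in code. The function names are ours; the arguments mirror the variables defined above, and all times are assumed to be in milliseconds:

```python
def audio_start_in_tracker_time(a_pc: float, t_pc: float, t_tracker: float) -> float:
    """Map the audio start time from the display PC clock to the eye tracker
    clock.  t_pc and t_tracker are (near-)simultaneous readings of the two
    clocks, so their difference is the clock offset between the machines."""
    return a_pc + (t_tracker - t_pc)

def first_fixation_in_tracker_time(ff_trial: float, t_trial: float) -> float:
    """Shift a trial-relative first fixation start time into eye tracker
    time by adding the trial start time (itself in eye tracker time)."""
    return ff_trial + t_trial
```

For instance, if the audio started at 1000 ms on the display PC and the tracker clock leads the PC clock by 500 ms, the audio start in tracker time is 1500 ms.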
Four Eye Voice Span (EVS) measures were included in TURead: (i) the duration between the beginning of the articulation of a target word and the first fixation time on the target word (the Fixation Speech Interval, FSI), following the relevant studies on EVS [1]; (ii) the distance, in characters, between the first letter of the target word and the character fixated at the beginning of the articulation of the target word (EVS-char); (iii) the distance, in words, between the target word and the word fixated at the beginning of the articulation of the target word (EVS-word); and (iv) the duration of the articulation.
A sample FSI (Fixation Speech Interval) is illustrated in Figure 3. The first fixation on the target word jeodinamik 'geodynamics' starts 2900 ms after the onset of the trial. The articulation of the same word, in this example, starts 3839.54 ms after the onset of the trial. The resulting FSI is 939.54 ms.
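The FSI in this example is a simple difference of onsets on a common clock; a minimal sketch using the figures quoted above (the function name is ours):

```python
def fsi_ms(articulation_onset_ms: float, first_fixation_onset_ms: float) -> float:
    """Fixation Speech Interval: articulation onset minus first fixation
    onset, with both times expressed relative to the same trial clock."""
    return articulation_onset_ms - first_fixation_onset_ms

# Worked example from Figure 3: articulation at 3839.54 ms, first fixation
# at 2900 ms after trial onset, giving an FSI of about 939.54 ms.
example_fsi = fsi_ms(3839.54, 2900.0)
```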
The variables related to oral reading and their descriptions are presented in Table 8.
Table 9 shows the values of the EVS measures in TURead, for Short-Infrequent (SI) words, Long-Infrequent (LI) words, Short-Frequent (SF) words, and Long-Frequent (LF) target words.
The findings show that the mean FSI values are higher in oral Turkish reading than in English and German under all conditions (486 ms for English [1], 561 ms for German [4]). However, the values are close to the FSI values reported for Finnish (625 ms, [7]). Given that Turkish and Finnish share agglutinating characteristics, the findings are not unexpected. However, the mean values of EVS-char (i.e., the spatial measure of EVS in terms of character count) are shorter in all four conditions than the values reported in the literature (e.g., 15-17 characters in [47]; 16 characters calculated from the first fixation onset in [4]). In Turkish, approximately one more word was viewed during the FSI of short words, while for long words the eyes tended to be on the same word at the beginning of its articulation. We suggest that the discrepancy observed between the EVS measures obtained for Turkish sentence reading and those reported in the literature (except those for Finnish) is a result of the shallow orthography of Turkish. On the other hand, the inflated FSI could be an indicator of increased prelexical phonological processing in languages with shallow orthographies, as suggested in [31], [32]. However, these claims require further investigation in cross-linguistic studies.

FSI
The duration between the beginning of the articulation of the target word and the first fixation time on the target word (i.e., fixation speech interval).

EVS-char
The distance between the first letter of the target word and the character that is fixated at the beginning of the articulation of the target word, in terms of character count.

EVS-word
The eye voice span in terms of word count.

Articulation Duration
The duration of the articulation of the target word.
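The two spatial EVS measures can be sketched as below, assuming a 1-based character position within the fixated word and a single space between words; the exact counting convention is our assumption, chosen so that the worked example (jeodinamik / karakterinin) comes out to 22 characters:

```python
def evs_word(target_index, fixated_index):
    """EVS-word: distance in words between the target word and the word
    fixated at the onset of the target word's articulation."""
    return fixated_index - target_index

def evs_char(words, target_index, fixated_index, fixated_char_pos):
    """EVS-char: distance in characters from the first letter of the
    target word to the fixated character.

    fixated_char_pos is the 1-based position of the fixated character
    within the fixated word; each intervening word is assumed to be
    followed by a single space.
    """
    chars = sum(len(w) + 1 for w in words[target_index:fixated_index])
    return chars + fixated_char_pos

# The worked example: fixation on the second 'i' (11th character) of
# karakterinin while jeodinamik is being articulated.
words = ["jeodinamik", "karakterinin"]
evs_char(words, 0, 1, 11)  # 22 characters
evs_word(0, 1)             # 1 word
```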

Prelexical Characteristics
A set of prelexical characteristics was included in TURead, identifying the characteristics of the target word (n), the word preceding the target word (n-1), and the word following the target word (n+1), including Vowel Harmony and a set of variables for bigrams and trigrams. These were selected due to the potential impact of the phonological characteristics of Turkish words, particularly vowels.
The Turkish alphabet includes eight vowels, grouped according to the height of the tongue, the roundedness of the lips, and the frontness of the tongue during articulation (Table 10).
Vowel distribution in Turkish words is mostly restricted according to vowel harmony rules. The vowels in the suffixes usually agree with the vowel in the last syllable of the stem to preserve vowel harmony, although there are exceptions (e.g., -ki, one of the frequently used suffixes, also used in the present study). Most of the exceptions to vowel harmony in Turkish are loan words [45]. The vowel sequences allowed according to vowel harmony are presented in Table 11.
The variable VH (Vowel Harmony) was included in TURead as a categorical variable with two levels, indicating whether the rule was broken. A respected VH rule was labeled 0, and a broken VH rule was labeled 1. In addition, the number of broken instances was calculated, as presented in the next section.
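A much simplified sketch of such a 0/1 coding is given below. It checks backness (palatal) harmony only on lowercase input, ignores rounding harmony, and does not model legitimate exceptions such as the suffix -ki or loan words, so it is an illustration of the coding scheme rather than the procedure used for TURead:

```python
FRONT_VOWELS = set("eiöü")  # front vowels of Turkish
BACK_VOWELS = set("aıou")   # back vowels of Turkish

def violates_backness_harmony(word):
    """Return 1 if the word mixes front and back vowels (harmony broken),
    0 otherwise -- mirroring the 0/1 coding of the VH variable.

    Assumes lowercase input (Turkish dotted/dotless 'i' casing is not
    handled here).
    """
    vowels = [ch for ch in word if ch in FRONT_VOWELS or ch in BACK_VOWELS]
    has_front = any(v in FRONT_VOWELS for v in vowels)
    has_back = any(v in BACK_VOWELS for v in vowels)
    return int(has_front and has_back)

violates_backness_harmony("okul")   # all back vowels -> 0
violates_backness_harmony("kitap")  # i (front) + a (back), a loan word -> 1
```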
Further characteristics considered in designing TURead were the Trigram Frequency (TF) and Bigram Frequency (BF) of n (the target word), n-1 (the word preceding the target word), and n+1 (the word following the target word). They are assumed to capture the phoneme environment, since different pronunciations of phonemes (i.e., allophones) are context dependent. For instance, /h/ is pronounced as a voiceless palatal fricative when it precedes a front vowel. It is pronounced as a voiceless velar fricative when a back vowel precedes it, or as a voiceless glottal fricative when it precedes a back vowel. Sometimes, when it occurs between two identical vowels, it is silent [45]. Due to the restrictions on letter clusters at word-initial and word-final positions, trigrams were divided into three subgroups: word-initial, word-final, and between these two. For each group, the frequency values were obtained separately from the BOUN corpus [37], depending on the place of the trigram in the word [46]. The average adjacent trigram frequencies were included. Another important restriction regarding word boundaries was captured by the average of word-initial and word-final bigram frequencies, obtained from the BOUN corpus [46]. Both prelexical frequency values were calculated as occurrences per million. Since there were no zero frequency values for the prelexical characteristics, Laplace smoothing was not applied. The variables for prelexical characteristics and their descriptions are provided in Table 20 in the Appendix.
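A sketch of how trigrams might be grouped by word position before looking up corpus frequencies; the grouping and the per-million conversion follow the description above, while the function names and the exact slicing convention are our illustrative assumptions:

```python
def positional_trigrams(word):
    """Group a word's trigrams into word-initial, medial, and word-final
    subgroups, mirroring the three positional subgroups described above."""
    trigrams = [word[i:i + 3] for i in range(len(word) - 2)]
    return {
        "initial": trigrams[:1],
        "medial": trigrams[1:-1],
        "final": trigrams[-1:] if len(trigrams) > 1 else [],
    }

def per_million(count, total_count):
    """Occurrences per million; no smoothing is applied here, since the
    prelexical counts contained no zeros."""
    return count / total_count * 1_000_000

positional_trigrams("kalem")  # initial: kal, medial: ale, final: lem
```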
The descriptive statistics for the trigram and bigram frequencies and the number of broken vowel harmony instances are presented in the following section, together with the predictability scores and lexical characteristics of the words.

Predictability Scores and Lexical Characteristics
The predictability scores were collected from 122 participants for the target words, 70 participants for the neighboring words (35 for n-1 and 35 for n+1), 110 participants for the suffix of one-suffixed target words, and 69 participants for the suffixes of two-suffixed target words. The fewest responses were collected for the neighboring words (35 participants). To have balanced data from the participants for analyses that require it, a randomly selected sample set of 35 participants' scores was included for target words and suffixes. The predictability scores of the 192 target words from the 122 participants and those of the selected 35 participants (M = 23.66, SD = 3.89 years old, three participants did not report birth year; 35 females) were not significantly different (F(1, 382) = 0.00004, p = .995), which justified the selection of the smaller set as representative of the predictability scores. The predictions of the suffix of one-suffixed target words from the 110 participants and those of the selected 35 participants (M = 22.06, SD = 3.32 years old, four participants did not report birth year; 15 females, one participant did not report any gender) were not significantly different (F(1, 126) = 0.366, p = .546), and neither were the predictions of the suffixes of two-suffixed target words from the 69 participants and those of the selected 35 participants (M = 21.66, SD = 1.54 years old, three participants did not report birth year; 15 females) (F(1, 126) = 0.05, p = .824). In addition to the randomly selected sample sets of 35 participants' scores for target words and suffixes, predictability scores computed over all available data were also included in the TURead dataset. The number of participants that contributed to each predictability variable in the dataset is indicated in the variable name. For example, there are two variables for word (n) predictability scores: the variable named p0 122 participants is calculated over the scores of 122 participants, and the variable named p0 35 participants is calculated over the scores of 35 participants. For all predictability data, correct predictions were scored as 1 and incorrect predictions as 0. The probability (p) of a correct prediction was calculated using (3), where num stands for the number of predictions for each word.

p = number of correct predictions / num (3)

The variables for the predictability calculations and their descriptions are presented in Table 21 in the Appendix.
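Equation (3) is a simple proportion; a minimal sketch with an illustrative function name:

```python
def predictability(responses):
    """Equation (3): the probability of a correct prediction, where each
    response is scored 1 (correct) or 0 (incorrect) and num = len(responses)."""
    return sum(responses) / len(responses)

predictability([1, 0, 0, 1, 1])  # 3 correct out of 5 -> 0.6
```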
In addition to word-level and suffix-level predictability, TURead includes further variables that identify the characteristics of a target word (n), the word prior to the target word (n-1), and the word next to the target word (n+1), such as familiarity ratings, word lengths, inflectional suffix counts, stem lengths, word frequencies, stem frequencies, trigram and bigram frequencies, and vowel harmony states.
Surface frequency values were obtained from the BOUN corpus [37]. Since the data included zero frequency values, lexical frequencies per million were calculated with Laplace smoothing using (4), following previous work on the topic [44].

Fpm = ((Count + 1) / (Token + Type)) * 1,000,000 (4)

In (4), Fpm stands for frequency per million, Count for the number of occurrences of a word in the corpus, Token for the number of word tokens in the corpus (383,224,629), and Type for the number of word types in the corpus (1,337,898). The Fpm values can be back-transformed by inverting the same formula. The variables for the lexical characteristics of the words and their descriptions are presented in Table 22 in the Appendix. Table 12 presents the characteristics of the target words (n), Table 13 the characteristics of the words that precede the target word (n-1), and Table 14 the characteristics of the words that follow the target word (n+1).
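Equation (4) and its back-transformation can be sketched as follows, using the corpus sizes reported above; the function names are ours:

```python
TOKENS = 383_224_629  # word tokens in the BOUN corpus (from the text)
TYPES = 1_337_898     # word types in the BOUN corpus (from the text)

def fpm(count, tokens=TOKENS, types=TYPES):
    """Equation (4): Laplace-smoothed frequency per million."""
    return (count + 1) / (tokens + types) * 1_000_000

def fpm_to_count(value, tokens=TOKENS, types=TYPES):
    """Invert equation (4) to recover the raw corpus count from Fpm."""
    return value * (tokens + types) / 1_000_000 - 1

# A word with zero corpus occurrences still receives a nonzero Fpm,
# which is the point of the smoothing:
fpm(0)  # small positive value instead of 0
```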

Discussion
The present study presents TURead, an eye movement dataset of silent and oral reading in Turkish, with an experimental approach in which the target words were manipulated on the basis of word length, frequency, and number of suffixes. TURead aims to provide empirical data for a diverse set of analyses and oculomotor control models. To provide benchmark data for analyses that address discussions on oculomotor control models, word characteristics variables such as length, frequency, and predictability of target words and neighboring words are provided in the dataset (e.g., [2]). For further analysis of the influences of morphological complexity and phonological processing on eye movements during reading, a set of variables related to morphological complexity (e.g., suffix counts, suffix predictabilities, and stem lengths and frequencies) and prelexical characteristics of target words were also included in TURead to address the potential impact of phonotactics in Turkish ([35]). Furthermore, the familiarity scores of the target words included in TURead provide data for further investigations of strong vs. weak approaches to early phonological processing ([31]-[33]; e.g., [49]). Turkish, as an agglutinating language with rich morphology and shallow orthography, provides a suitable environment for such investigations.
In addition to the most commonly studied eye movement measures (e.g., first fixation duration, gaze duration, last saccade amplitude, next saccade amplitude, first fixation location, and launch site) for both oral and silent reading, four measures specific to oral reading were included in TURead (i.e., fixation speech interval, eye-voice span in terms of character count, eye-voice span in terms of word count, and duration of the articulation). As previous research indicates, the eye-voice span measures reflect the manageable number of items held in the memory buffer during reading (e.g., [4], [1]). Together with the working memory test scores (i.e., Corsi block test and digit span test scores), the oral reading-specific variables provided in TURead could be used in analyses that address memory processes and post-lexical processing involved in reading (e.g., [49]).

Conclusion
This study presented a new eye movement dataset of sentence reading in Turkish. The dataset consists of 192 sentences read by 215 participants in silent and oral modalities. The variables in the dataset were described together with the data collection, data cleaning, and data analysis procedures. Descriptive statistics were reported in both reading modalities for the prelexical and lexical characteristics of the previous (n-1), target (n), and next (n+1) words, and for the familiarity ratings of the words. In addition to the descriptive statistics of selected oculomotor measures such as fixation durations and saccade amplitudes, we also reported our findings on FSI (fixation speech interval) and EVS (eye voice span) in Turkish reading. We observed that FSI in Turkish is greater than in English and German but close to that in Finnish, which also has a shallow orthography. The increase in FSI in languages such as Turkish and Finnish may point to the influence of shallow orthography on prelexical phonological processing. We also observed shorter EVS values in Turkish sentence reading compared to previous research. Again, this difference may be explained by the effect of shallow orthography on the working memory buffer [4]. More studies are needed in Turkish and other languages with shallow orthographies to provide further evidence for these findings. We believe that TURead will be a valuable resource for researchers investigating the interplay between language characteristics and eye movements during reading.

Availability
The files that include the datasets, stimulus texts, and variable explanations can be downloaded from TURead: An Eye Movement Dataset of Turkish Reading in the Open Science Framework (OSF) repository, under the folder TURead files. The TURead dataset is provided as an Excel file, TURead target words.xlsx. The variables for target words, as explained above, are combined in one Excel file, TURead variables.xlsx. An additional dataset that includes eye movement measures of both oral and silent reading for all words is provided for further analyses in another Excel file, TURead all words.xlsx. None of the experiments was preregistered.
Table 15 General variables common to the participants and the stimuli.

PARTICIPANT
The ID number of each participant.

RECORDING SESSION LABEL
The ID number of each part of an experiment session organized as participant code + part information (p1: part one, p2: part two).

TRIAL INDEX
The order of one trial within each part of an experiment session.

TARGET WORD
The original target word of stimuli texts.

TARGET WORD WITHOUT TURKISH CHARACTERS
Target words in which Turkish characters were replaced as follows: ç by c, ı by i, ğ by g, ö by o, ş by s, ü by u.

READING MODALITY
The modality of reading (oral vs. silent).

CONDITION
The condition set for the target word. SI: Short-Infrequent words, LI: Long-Infrequent words, SF: Short-Frequent words, and LF: Long-Frequent words.

IA ID
The order of the target word within the text, provided by the software as the ordinal ID of the current interest area.

W1 IA ID
The order of the word to the left of the target word (word n-1) within the text, provided by the software as the ordinal ID of the current interest area.

W2 IA ID
The order of the word to the right of the target word (word n+1) within the text, provided by the software as the ordinal ID of the current interest area.

ARTICULATION OF ONSET FIXATION IA ID
The order of the word within the text that was fixated at the onset of the articulation of the target word.

ARTICULATION ONSET FIXATION DURATION
The duration of the fixation at the onset of the articulation of the target word.

ARTICULATION START ET
The beginning of the articulation of the target word according to the eye tracker time.

ARTICULATION END ET
The end of the articulation of the target word according to the eye tracker time.

FFIX ET TIME
The beginning of the first fixation on the target word relative to the eye tracker start time.

IA FIRST FIXATION TIME
The beginning of the first fixation on the target word relative to the beginning of the trial (i.e., TRIAL START TIME).
TRIAL START TIME
The start time of the trial since the tracker was activated.

IA SECOND FIXATION X
The horizontal position of the second fixation in pixels.

IA THIRD FIXATION X
The horizontal position of the third fixation in pixels.

IA FIRST FIXATION X
The horizontal position of the first fixation in pixels.

IA FIRST FIXATION INDEX
The order of the first fixation on the target word.

IA FIRST RUN LAST FIX INDEX
The order of the last fixation on the target word in the first pass.

IA FIRST RUN LAST FIX X
The horizontal position of the last fixation on the target word in the first pass.

IA FIRST RUN NEXT FIX OF LAST FIX IA ID
The order of the word within the text that was fixated immediately after the last fixation on the target word in the first pass.

IA FIRST RUN NEXT FIX OF LAST FIX X
The horizontal position of the fixation that was made immediately after the last fixation on the target word in the first pass.

IA FIRST RUN PREVIOUS FIX OF FIRST FIX IA ID
The order of the word within the text that was fixated prior to the first fixation on the target word.

IA FIRST RUN PREVIOUS FIX OF FIRST FIX X
The horizontal position of the fixation that was made prior to the first fixation on the target word in the first pass.

IA FIRST RUN NEXT FIX OF LAST FIX IA BOTTOM
The vertical position of the bottom edge of the interest area of the word that was fixated immediately after the last fixation on the target word in the first pass.

IA FIRST RUN PREVIOUS FIX OF FIRST FIX IA BOTTOM
The vertical position of the bottom edge of the interest area of the word that was fixated prior to the first fixation on the target word in the first pass.
Table 17 The variables for further analyses (cont'd).

Variable Description
TARGET IA BOTTOM
The vertical position of the bottom edge of the interest area (IA) of the target word.

TARGET IA LEFT
The horizontal position of the left edge of the IA of the target word.

TARGET IA RIGHT
The horizontal position of the right edge of the IA of the target word.

TARGET IA TOP
The vertical position of the top edge of the IA of the target word.

AUDIO RECORDING START TIME
The beginning of the audio recording according to the display PC clock. The variable was used to retrieve the time when the audio recording started, for synchronizing eye movements with audio recordings.

CURRENT DISPLAY PC TIME
The current time on the display PC clock when the CURRENT EYE TRACKER TIME value is updated. The variable was used for synchronizing eye movements with audio recordings.

CURRENT EYE TRACKER TIME
The current time on the eye tracker clock. The variable was used for synchronizing eye movements with audio recordings.

DISPLAY ONSET TIME
The onset time of the stimulus according to the display PC clock.

IP END TIME
The end time of the interest period, i.e., the period between the appearance and disappearance of the stimulus.

DISPLAY ONSET ET
The onset time of the stimulus according to the eye tracker time.

DURATION TO CHANGE SCREEN
The duration required to change the screen during a trial (1000 ms). It was set by an eye-contingent IA for a fixation marker that appeared at the bottom right of the screen together with the stimulus.

READING TIME MIN
The reading duration of the stimulus text in minutes. It was calculated by subtracting DURATION TO CHANGE SCREEN from the time interval between DISPLAY ONSET ET and IP END TIME.

WORD COUNT
The number of words in the stimuli texts.
Table 19 Eye movement variables.

Variable Description
IA SKIP
Skipped words marked as 1, by the software.

IA FIRST FIXATION DURATION
The duration of the first fixation on the word in the first pass, without considering whether a higher-ID interest area (IA) at the right was fixated before, by the software.

IA FIRST RUN DWELL TIME
The sum of the fixation durations on the word in the first pass, without considering whether a higher-ID IA was fixated before, by the software.

IA FIRST RUN FIXATION COUNT
The number of fixations on the word in the first pass, without considering whether a higher-ID IA was fixated before, by the software.

IA FIRST RUN LANDING POSITION
The first landing position on the word, by the software.

IA FIRST RUN LAUNCH SITE
The distance of the fixation preceding the first fixation to the word (launch site), by the software.

IA FIRST RUN LANDING POSITION IN CHARACTER COUNT
The first landing position on the word in terms of character count (the space preceding the word as 1), calculated manually.

IA FIRST RUN LAUNCH SITE IN CHARACTER COUNT
The distance from the previous fixation to the word in terms of character count (the space preceding the word counted as 1), calculated manually.

OSA
The distance between the last and next fixation on the word (outgoing saccade amplitude).

OSA IN CHARACTER COUNT
The distance between the last and next fixation on the word in character count, calculated manually.

ISA
The distance between the preceding and current fixation on the word (incoming saccade amplitude).

ISA IN CHARACTER COUNT
The distance between the preceding and current fixation on the word in character count, calculated manually.

IA SECOND FIXATION DURATION
The duration of the second fixation on the word, by the software.

IA SECOND FIXATION RUN
The order of the run in which the second fixation was made (1 indicates it was made before leaving the word), by the software.

IA THIRD FIXATION DURATION
The duration of the third fixation on the word, by the software.

IA THIRD FIXATION RUN
The order of the run in which the third fixation was made (1 indicates it was made before leaving the word), by the software.

TOTAL FIXATION DURATION
The sum of the fixation durations on the word, calculated manually by removing fixations after the first fixation on the marker.

IA REGRESSION OUT
Whether regression(s) were made from the current IA to earlier IAs prior to leaving it in a forward direction: 1 if there is at least one regressive saccade from the IA, 0 otherwise, by the software.

IA REGRESSION OUT COUNT
The number of regressions from the current IA to earlier IAs prior to leaving that IA in a forward direction, by the software.

IA REGRESSION PATH DURATION
The sum of fixation durations from when the current IA is first fixated until the eyes leave the IA in a forward direction, by the software.

REGRESSION IN COUNT
The number of times the IA was entered from a higher-ID IA. Calculated manually by removing the fixations made after the first fixation on the marker.

REGRESSION IN
Whether the current IA received at least one regression from higher-ID IAs: 1 if there is at least one regressive saccade into the IA, 0 otherwise. Calculated manually by removing the fixations made after the first fixation on the marker.

WPM
Reading rate in words per minute.
Table 22 The variables that identify the lexical characteristics of the words. (*The frequency per million values are transformed values according to the Laplace smoothing method [44].)

WL0.raw
The length of the target word in terms of character count.

SC0
The count of the inflectional suffixes of the target word.

SL0.raw
The length of the target stem in terms of character count.

SF0 pm
The surface frequency per million* of the target stem.

WF0 pm
The surface frequency per million* of the target word.

WL1.raw
The length of the word n-1 in terms of character count.

SC1
The count of the inflectional suffixes of the word n-1.

SL1.raw
The length of the stem of word n-1 in terms of character count.

SF1 pm
The surface frequency per million* of the stem of word n-1.

WF1 pm
The surface frequency per million* of the word n-1.

WL2.raw
The length of the word n+1 in terms of character count.

SC2
The count of the inflectional suffixes of the word n+1.

SL2.raw
The length of the stem of word n+1 in terms of character count.

SF2 pm
The surface frequency per million* of the stem of word n+1.

WF2 pm
The surface frequency per million* of the word n+1.

Fig. 3 A sample Eye Voice Span (EVS) time interval (FSI).

Table 1
Mean surface frequency values of target words by condition, and ANOVA results for the difference in mean frequency values between length conditions. Values in parentheses represent standard deviations.

Table 2
Word and character counts of the stimuli texts.Values in parentheses represent standard deviations.

Table 3
Mean scores of the digit span and Corsi tests (standard deviations presented in parentheses). The mean scores of the Corsi test in the table present data obtained from 195 participants due to a recording error in one participant.

Table 4
Mean familiarity ratings for target words (standard deviations presented in parentheses).

Table 5
Eliminated data based on eye movement measures and articulation-related criteria.

Table 6
Number and percentage of skipped and fixated words in oral reading and silent reading, for Short-Frequent (SF), Short-Infrequent (SI), Long-Frequent (LF), and Long-Infrequent (LI) target words.

Table 7
Fixations, saccades, and reading rates in oral and silent reading (standard deviations in parentheses). FD: Fixation Duration, FFD: First Fixation Duration, GD: Gaze Duration (aka First Pass Dwell Time), TFD: Total Fixation Duration, AMP: Saccadic Amplitude, FPFC: First Pass Fixation Count, FFL: First Fixation Location, LS: Launch Site, RR: Reading Rate. Duration values are expressed in milliseconds, amplitude values in characters, and RR values in wpm (words per minute).

Table 8
Variables related to oral reading.

Table 9
Eye Voice Span (EVS) measures in oral reading.

Table 10
Vowels in Turkish.

Table 11
Vowel sequences allowed according to vowel harmony.

Table 12
Characteristics of the target word (n). The values in parentheses show the standard deviations. The frequency values in the table are log-transformed (base 10). p: participants

Table 13
Characteristics of the words that precede the target word (n-1). The values in parentheses show the standard deviations. The frequency values in the table are log-transformed (base 10).

Table 14
The characteristics of the words that follow the target word (n+1). The values in parentheses show the standard deviations. The frequency values in the table are log-transformed (base 10).