The results of earlier studies led to the assumption that individuals on the autism spectrum (AS) answered reading comprehension questions less accurately because they exhibited longer eye-gaze fixations during text reading than typically developing (TD) individuals. This finding led the same researchers to conclude that text processing was more demanding and difficult for those on the AS than it was for TD individuals (e.g. Minshew & Goldstein, 1998; Sansosti et al., 2013). This conclusion may have been influenced by the results from studies of readers with dyslexia in which longer mean fixation durations and reading difficulties were found to be correlated (Rayner et al., 2006).

In contrast, later studies indicated that, despite longer mean fixation durations during reading, individuals on the AS were as accurate in answering reading comprehension questions as TD individuals (Howard et al., 2016; Micai et al., 2017) when matched on word reading and language skills. For example, Howard et al. (2016) compared lexical, syntactic, and semantic processing during reading in individuals on the AS and TD individuals and found that individuals on the AS had longer reading times overall yet similar accuracy to TD participants on measures of comprehension. Unlike earlier researchers (e.g. Minshew & Goldstein, 1998; Sansosti et al., 2013), who assumed that longer mean fixations were associated with more difficult and demanding processing, Howard et al. (2016) proposed that longer mean fixations to text reflected a “cautious” style of reading rather than difficulty comprehending the text itself, and concluded that this behavior was characteristic of readers on the AS. Micai et al. (2017) confirmed the Howard et al. (2016) finding that autistic and TD individuals were equally accurate at answering comprehension questions even though the participants on the AS displayed longer fixations and more regressions to the target word during reading. These later studies used non-invasive eye-tracking, which provides accurate, moment-to-moment information about eye-gaze movements (Frank et al., 2012; Klin et al., 2002; Wilkinson & Mitchell, 2014), whereas in earlier studies eye-tracking involved attaching equipment to the user’s head, restricting head movement.

Significantly, while eye-tracking has been used to assess eye-gaze during reading, it has not been used to measure eye-gaze behavior during the time when the individual is seeking the answer within text (hereafter referred to as question-answering): a potentially critical element missing from this area of research and unique to the present study. Although reading comprehension has often been defined as accurate text recall, without reference back to text, recent evidence suggests that reading comprehension is also determined by how the information is retrieved (i.e. eye-gaze during question-answering; Drysdale, 2021).

Accordingly, the present study was designed to investigate the eye-gaze behavior of students on the autism spectrum during question-answering to further an understanding of the possible links between eye-gaze and text comprehension. To this end, two research questions were formed. One, would participants on the AS exhibit longer mean fixation times during question-answering than TD individuals? Two, would participants on the AS display similar accuracy when answering comprehension questions to TD individuals?

Method

Participants

Following ethics approval and informed parental consent, participants were selected based on their scores on the Progressive Achievement Tests in Reading (PAT-R), an Australian-based test of reading comprehension and word knowledge (see https://www.australiaeducation.info/Tests/K12-Standardized-Tests/reading.html). Two students with a clinical diagnosis of autism were recruited having been diagnosed by either a psychologist, paediatrician, or both. Students who were considered to be typically developing and of similar age, grade level, gender, word reading, and language skills were included as a form of comparison. Word reading and language skills were measured by the Total Reading Composite Score on the Woodcock Reading Mastery Tests—Third Edition (WRMT-III; Woodcock, 2011) pre-assessment. The descriptive statistics for participants are presented in Table 1.

Table 1 Participant demographics and WRMT-III pre-testing scores

Procedures

Assessment

Five minutes of each assessment session was spent familiarising participants with eye-tracking equipment, followed by the 90-min WRMT-III assessment which was conducted to eliminate any participants with a severe reading deficit. Four participants met the inclusion criteria.

Eye-Tracking

During calibration and reading tasks, stimuli were presented by the laptop installed with Tobii Studio and viewed by participants via the connected monitor. Binocular eye movements were recorded remotely, meaning no contact was made between the participant and eye-tracker. Each participant sat in a chair with adjustable seat height to ensure the top of the monitor was aligned with the top of the participant’s head. The chair was positioned approximately 65 cm away from the front of the monitor. Before starting the experimental task, participants undertook a five-point calibration procedure. This involved participants following a dot with their gaze as it moved throughout the screen and stopped at five locations.

Reading tasks were completed in a single session of 50 min duration. These sessions were video-recorded, allowing recordings to be re-watched if researchers failed to code answers in vivo. Participants were instructed to read the entire text out loud from start to finish. The Reading Phase immediately followed calibration via a button press from the researcher and began the moment the text stimuli were presented on the monitor. This phase ended when the participant finished reading the last word of the passage, which prompted the researcher, via button press, to switch to a screen containing a black background and white fixation cross to prevent re-reading occurring during the Reading Phase.

Participants were instructed to answer a series of questions provided by the researcher, referring back to the text if needed. To begin the Question-Answering Phase, the researcher switched to the text screen. Participants were encouraged with a verbal prompt to guess the answer if they indicated they were unsure, or had not responded within 1 min. The researcher ended the task via a final button press once all questions were answered.

Measures

All testing sessions occurred in a small room containing two chairs and a table designed to minimise distractions.

Woodcock Reading Mastery Tests—Third Edition (WRMT-III)

Form A from the WRMT-III was used to test Basic Skills, Reading Comprehension, Listening Comprehension, and Oral Reading Fluency leading to a composite score of Total Reading ability.

Eye-Tracking Measures

Eye-movement data was collected using the Tobii X2-30, a remote-based eye-tracker that utilises pupil centre/corneal reflection technique (Tobii Technology AB, 2014). The X2-30 is suited to young children who may move their heads during testing as it maintains tracking activity within a 20″ × 14″ area (width × height). The X2-30 was interfaced with a Dell laptop installed with Tobii Studio v 3.4.6, a software program used to control the depiction of text stimuli to participants via a 1080 resolution 24″ monitor (19 × 20). The X2-30 was attached to the bottom of the viewing monitor via a magnetic bracket which held it in place. The front of the eye-tracker was positioned at a 45° angle so as to point directly at participants’ eyes. A Logitech HD Pro Webcam C920 was also connected to the top of the viewing monitor to record the participants’ faces during eye-tracking tasks.

Texts

Four texts were included, two from the Grades 3–4 level of the Wechsler Individual Achievement Test—Second Edition (WIAT-II; Wechsler, 2005), and two created by the research team. Created texts underwent proofreading by researchers, who evaluated each for its suitability for analysis. All text was presented in black monospace Century Gothic font (size 12) on white background with 1.5 line spacing, to ensure adequate space between lines so that ambiguous fixations (slightly above or below the text line) would not be wrongly attributed to a different line of text. Images were added to resemble WIAT-II texts.

Text difficulty was graded with the Flesch-Kincaid readability formula allowing created texts to be standardised and represent the reading ability of a 9 to 10-year-old child with 3 to 4 years of Australian education. The algorithm used to test grade level readability was (0.39 × average number of words per sentence) + (11.8 × average number of syllables per word) − 15.59. Word count ranged between 78 and 156 (M = 122, SD = 34.1). Text difficulty statistics for texts are presented in Table 2.
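As an illustration only, and not the tool the research team used, the grade-level formula above can be sketched as a short Python function. The function name and the example counts are hypothetical, and the word, sentence, and syllable counts are assumed to have been obtained beforehand.

```python
def flesch_kincaid_grade(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Flesch-Kincaid grade level computed from raw counts supplied by the caller."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Hypothetical passage (not one of the study texts): 120 words, 12 sentences, 156 syllables.
grade = flesch_kincaid_grade(120, 12, 156)
print(round(grade, 2))  # 3.65
```

With these hypothetical counts the passage averages 10 words per sentence and 1.3 syllables per word, which places it between Grades 3 and 4.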

Table 2 Text difficulty measures

Questions

The existing questions for the WIAT-II texts were used (Crickets = 6, Good Neighbours = 5 questions), while questions were written for the created texts (Poppy the Puppy = 9, Camping Weekend = 7 questions) to assess students’ comprehension. While each text varied in the number of comprehension questions used, the total possible marks per text ranged from 10 to 12 (Crickets = 12, Good Neighbours = 10, Poppy the Puppy = 12, Camping Weekend = 12 marks). The comprehension questions were based on the taxonomy developed by Day and Park (2005), which can be used to develop comprehension questions for texts to help young students better understand what they read, rather than on a hierarchical model such as Bloom’s Taxonomy (Bloom, 1956), in which lower-order thinking skills must be acquired before higher-order critical thinking skills. Three question types were included (Day & Park, 2005): Literal (10), which refers to an understanding of the straightforward meaning of the text, such as dates, facts, vocabulary, and locations; Reorganise (8), in which students, starting from a literal understanding of the text, must combine information from various parts of the text for additional meaning; and Inference (8), in which answers are based on material that is in the text but not explicitly stated, and which involves combining a literal understanding of the text with the reader’s own knowledge and intuitions. There were a total of 26 questions across all texts. Describing foundational comprehension skills as “lower-level” has led educators to devalue this knowledge (Munzenmaier & Rubin, 2013), yet the purpose of the current study was to explore how students obtain this knowledge through the study of their eye-gaze patterns. Question formats included forced-choice (i.e. alternatives, true/false), open-answer, and closed-answer, such as fill-in-the-blank (cloze exercise). Before a question and its satisfactory answers were assigned to a story, a consensus of suitability was required from the research team.

Answers provided by participants were assigned a mark of 0, 1, or 2, with higher marks reflecting a fuller understanding of the text. For certain forced-choice format questions, a maximum mark of 1 was assigned for a correct answer. Examples of questions and answer criteria are provided in Table 3.

Table 3 Question and answer examples

Data Analyses

Tobii Studio was set to filter out all fixations shorter than 80 ms, as this was calculated to be an insufficient time for text processing, and all fixations longer than 800 ms, as this was assumed to indicate “mind wandering,” that is, when the reader continued to maintain eye-gaze on a fixed location without processing visual information (Smallwood et al., 2011). Participant eye-gaze was recorded for both phases. The Question-Answering Phase was also divided into scenes, a feature of Tobii Studio that allows a recording to be broken up into subsets of the overall recording. The number of scenes equalled the number of comprehension questions; for example, if a participant answered five questions, five scenes were created for that passage. A scene began at the end of a question-utterance by the researcher and finished at the end of the reply-utterance by the participant.
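The duration filter described above was applied within Tobii Studio. Purely as an illustration of the rule, a minimal Python sketch is given here; the function name and the sample durations are hypothetical, and the code does not reproduce Tobii Studio's own fixation filter.

```python
MIN_FIXATION_MS = 80   # shorter fixations were treated as too brief for text processing
MAX_FIXATION_MS = 800  # longer fixations were treated as possible mind wandering


def filter_fixations(durations_ms):
    """Drop fixations shorter than 80 ms or longer than 800 ms; the bounds themselves are kept."""
    return [d for d in durations_ms if MIN_FIXATION_MS <= d <= MAX_FIXATION_MS]


# Hypothetical fixation durations (ms) for one scene, i.e. one comprehension question.
scene = [45, 80, 132, 250, 799, 800, 950]
kept = filter_fixations(scene)
print(kept)  # [80, 132, 250, 799, 800]
```

Applying the same function to each scene in turn would give one filtered list of fixations per comprehension question.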

Dependent variables were mean fixation duration and comprehension accuracy. Fixations were defined as the maintenance of eye-gaze for at least 80 ms at a single location; hence, mean fixation duration was calculated as the average length of fixations across texts for a single participant. Comprehension accuracy (Marks awarded/Total possible marks) scores were added and converted into an overall percentage for each participant.
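The two dependent variables can be sketched in Python as follows. This is a hedged illustration, not the study's analysis script: the function names are hypothetical, fixations are assumed to be pooled across texts before averaging (the text does not say whether per-text means were averaged instead), and the per-text marks in the example are invented so that they sum to a reported total of 32 out of 46. Only the possible marks per text are taken from the Questions section.

```python
from statistics import mean

# Total possible marks per text, as reported in the Questions section.
POSSIBLE_MARKS = {
    "Crickets": 12,
    "Good Neighbours": 10,
    "Poppy the Puppy": 12,
    "Camping Weekend": 12,
}


def mean_fixation_duration(durations_by_text):
    """Mean fixation duration (ms), pooling fixations across all texts (an assumption)."""
    pooled = [d for durations in durations_by_text.values() for d in durations]
    return mean(pooled)


def comprehension_accuracy(marks_by_text, possible_by_text=POSSIBLE_MARKS):
    """Overall accuracy as a percentage: marks awarded / total possible marks."""
    awarded = sum(marks_by_text.values())
    possible = sum(possible_by_text[text] for text in marks_by_text)
    return 100 * awarded / possible


# Hypothetical per-text marks; only their total (32/46) matches a reported score.
example_marks = {
    "Crickets": 8,
    "Good Neighbours": 7,
    "Poppy the Puppy": 9,
    "Camping Weekend": 8,
}
print(round(comprehension_accuracy(example_marks)))  # 70

# Hypothetical fixation durations (ms) for two texts.
print(mean_fixation_duration({"Crickets": [100, 200], "Good Neighbours": [300]}))  # 200
```

Summing marks before dividing, rather than averaging per-text percentages, weights each text by its possible marks, which is consistent with the reported scores out of 46.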

Results

Both participants on the AS demonstrated longer mean fixation durations than the TD participants during question-answering. Despite the longer fixations during question-answering, the comprehension accuracy of the participants on the AS was equal to or superior to that of the TD participants.

Indeed, JG spent longer, on average, on each fixation (250 ms) to text during question-answering than SM (TD) (170 ms). JG made fewer fixations overall than SM (JG = 2815 fixations; SM = 3170 fixations) and spent a longer time overall in the Question-Answering Phase (JG = 812 ms; SM = 697 ms). Significantly, JG was equally accurate in answering comprehension questions (JG = 32/46 marks = 70%; SM = 32/46 marks = 70%) despite the difference in eye-gaze behavior.

Participant JC also spent, on average, longer on each fixation (140 ms) to text during question-answering than participant MI (TD) (110 ms). JC also made fewer fixations overall than MI (JC = 3906 fixations; MI = 6679 fixations); however, he spent a shorter time overall in the Question-Answering Phase than did MI (JC = 822 ms; MI = 1087 ms). Interestingly, JC was more accurate in answering comprehension questions than MI (JC = 38/46 marks = 83%; MI = 34/46 marks = 74%).

Discussion

In this preliminary study, it was observed that while two school-aged children on the autism spectrum exhibited longer mean fixation durations during question-answering, the accuracy of their question-answering remained at least equal to that of the TD participants. Past research has reported associations between longer mean fixations when reading and poor comprehension accuracy (e.g. Minshew & Goldstein, 1998; Sansosti et al., 2013). In contrast, and acknowledging the small sample size of the current study, poor comprehension scores were not observed for participants on the AS who demonstrated longer average fixation durations in the Question-Answering Phase. This finding was in line with those of Howard et al. (2016) and Micai et al. (2017), who concluded that the reading comprehension accuracy of students on the AS was facilitated by longer mean fixation durations and that such observed behavior may be characteristic of a more cautious search style.