Background

Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by challenges in reciprocal social communication behaviours and the presence of restricted and repetitive behaviours [1]. One challenge for children with ASD is engaging accurately in joint attention behaviours. Joint attention involves sharing a mutual reference point (e.g., a physical or mental object) with another person, making it an essential part of human social cognition [2]. Joint attention has been extensively investigated in infants, due to its important role in the development of information processing [2] and language [3, 4] (refer to [5, 6] for a review). In the first years of life, typically developing (TD) infants respond to joint attention (RJA) by coordinating their attention with that of their primary caregivers. For example, an infant might respond by following the parent's gaze to a target object. Although findings are somewhat mixed, it is believed that individuals with ASD are less likely to share attention with another person [7, 8], resulting in various socio-communicative challenges during development [6]. As a result, there is growing interest in identifying robust, reliable and valid biomarkers of differences in RJA behaviours between TD children and those with a diagnosis of ASD. The present study employed an eye-tracking paradigm during RJA tasks to examine and quantify differences between preschool children with ASD and a TD control group.

Recent advances in technology have allowed the quantification of different biological and behavioural markers that are useful in ASD research (see [9, 10] for a review). In particular, eye-tracking technology has been used to effectively distinguish children with ASD from TD children [11]. It has also been used to investigate differences in visual attention between children with and without a diagnosis of ASD (refer to [12,13,14] for a review). Furthermore, eye tracking has been used to quantify RJA behaviours, and the construct validity of this approach has been established. For example, Navab et al. [15] showed that eye-tracking measures obtained during an RJA task, such as (1) the standard difference score, defined as the number of first looks to the distracter object subtracted from the number of first looks to the target object, and (2) the percentage of accurate gaze shifts, were related to performance on the Early Social Communication Scales (ESCS) distal pointing task, in which an examiner points at one of three colourful posters hung in the testing room. Similarly, the accuracy of gaze shifts was found to be correlated with behavioural RJA on the ESCS [16]. Numerous eye-tracking measures have also been found to correlate with various clinical characteristics [17]. For example, the accuracy of gaze shifts correlates with Vineland Adaptive Behaviour Scales – Second Edition (VABS-II) Socialisation scores [18]. The present study investigated the construct validity of eye tracking as an index of RJA and explored the associations between eye-tracking measures of RJA and various clinical scores.

Studies that have compared eye-tracking performance on RJA tasks between TD and ASD individuals have revealed mixed results, with some studies indicating a range of atypical responses in children with ASD [18,19,20,21] and others showing typical performance [22,23,24,25,26]. Eye-tracking studies investigating RJA behaviours in ASD individuals are summarised in Table 1. Variations in participant age, the communicative content of the scene used, the inclusion of a talking face [27], emotional intensity [16], initiating direct gaze [28], visibility of the target [29], and the nature of the stimuli (images or videos) [30] may all have contributed to differences between studies. While previous eye-tracking studies have mostly been consistent in finding that the accuracy of gaze shifts is intact in infants with ASD or in toddlers later diagnosed with ASD [31], much less is known about RJA behaviours in older preschool children with and without ASD. Earlier research using retrospective video analysis has observed that differences emerge later in development, during the preschool years [32, 33]. In this regard, the accuracy of gaze shifts has been found to be reduced in six-year-old children with ASD [18], and this finding has also been observed in adults with ASD [34]. Most RJA studies that included older participants have covered a wide age range. The present study recruited a cohort of ASD and TD participants aged 31 to 73 months, the age range in which diagnosis and assessment are typically performed [35].

Table 1 Eye-tracking studies exploring response to joint attention (RJA) behaviours in ASD individuals, sorted by age of participants

Various visual stimuli and experimental paradigms have been used in eye-tracking research with children on the spectrum [36]. For example, some eye-tracking studies have used static or dynamic stimuli in which an actor turned his/her head to initiate joint attention, possibly eliminating any confounding effect of either eye gaze or head movement on the ability of participants to respond to a bid for joint attention. Using retrospective video analysis, Presmanes et al. [37] studied the effects of different attentional cues on RJA and found no difference in the accuracy of gaze shifts between younger siblings of children with ASD and infants in the control group when combinations of verbal and non-verbal cues were used simultaneously. However, the younger siblings of children with ASD showed lower accuracy of gaze shifts when fewer cues were presented. A previous study showed that gaze-following performance improved when pointing with language cues was added [38]. The effect of head movement on infants' ability to follow gaze has also been studied recently using eye tracking during live interaction [22, 39]. One study showed that infants at familial risk for ASD were less likely to follow gaze in the Eyes-Only condition than in the Eyes/Head condition [22]. By contrast, the accuracy of gaze shifts in neurotypical infants did not vary between those two conditions. While more naturalistic stimuli have been shown to evoke different neural and behavioural responses from pre-recorded stimuli, they offer poorer experimental control. The present study investigated whether the results seen in naturalistic studies also hold for preschool children in an experimental eye-tracking paradigm.

The present study aimed to identify differences in RJA behaviours between ASD and TD preschool children, building on previous research findings [24, 40]. Two conditions (Head/Eyes vs. Eyes-Only) were therefore included, using pre-recorded stimuli, to compare RJA behaviours between ASD and TD children. Correlations between eye-tracking measures and various clinical scores were also examined. We hypothesised that this study would independently replicate and extend previous findings of differences in RJA behaviours between ASD and TD children. Specifically, we hypothesised that ASD participants would show reduced RJA behaviours when eye gaze alone is used to initiate joint attention, a finding previously observed during live interactions with infants at familial risk for ASD [22]. A further hypothesis was that eye-tracking measures would be meaningfully correlated with clinical information, which would suggest that eye tracking could serve as a helpful biomarker in clinical assessments of ASD.

Methods

Participants

Participants were children aged between 31 months and 6 years, comprising children with a confirmed ASD diagnosis (N = 60; 51 males) and TD children (N = 17; 8 males). Thirty-four children (29 males) in the ASD group were recruited from the KU Marcia Burgess Autism Specific Early Learning and Care Centre (ASELCC) and 26 children (22 males) were recruited from a Child Development Unit (CDU) at the Children's Hospital in Westmead, New South Wales, Australia. TD children were recruited from the KU Children's Services (CS) preschool in Liverpool, New South Wales, Australia. All ASD participants met the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) criteria for ASD [1], and the diagnosis was confirmed using the Autism Diagnostic Observation Schedule – Second Edition (ADOS-2) [45]. No specific exclusion criteria were applied to ASD participants. Children with known neurodevelopmental disorders, significant developmental delays or reported visual/hearing difficulties were excluded from the TD group. No child had any visual acuity problems. This study was approved by the University of New South Wales Human Research Ethics Committee (HC14267). Written informed consent was obtained from the participants' parents/legally authorised representatives. All methods were carried out in accordance with relevant guidelines and regulations.

Clinical measures

Participants were administered a battery of clinical and behavioural assessments to determine autism symptomatology, developmental skills, and adaptive functioning. These assessments were conducted on-site at ASELCC, CS or CDU and took approximately 2.5 h per participant. They were completed as part of a clinical assessment or the intake assessment for entry to an early intervention program, and were at times spread over 2 or 3 sittings or broken up with shorter breaks, depending on the capacity of the child.

The Autism Diagnostic Observation Schedule, Second Edition (ADOS-2) is a semi-structured standardised assessment instrument for ASD diagnosis in individuals aged 12 months to adulthood [45]. It is used to quantify autism symptomatology across social interaction, communication, play, repetitive behaviours and imaginative use of materials. Depending on the participant’s age and language ability, the appropriate module of the ADOS-2 was administered by qualified research staff (Module 1: N = 41, Module 2: N = 8, Module 3: N = 11). Higher ADOS scores are indicative of a greater degree of ASD symptomatology.

The Social Communication Questionnaire (SCQ) is a parent-reported screening tool used to quantify autism-specific symptoms [46]. It consists of 40 items and yields a total score and three subscale scores in different domains: Communication, Social Interaction and Restricted Repetitive Behaviour.

The Mullen Scales of Early Learning (MSEL) is a standardised measure of cognitive and motor development [47]. It provides an estimate of the verbal and non-verbal abilities of children younger than 6 years of age. It yields standardised T scores, age-equivalent scores and raw scores on five subscales: Visual Reception, Fine Motor, Gross Motor, Receptive Language and Expressive Language. Since the Gross Motor subscale was not administered in this study, age-equivalent (AE) scores and developmental quotient (DQ) scores on the remaining subscales were used for analysis. DQ scores were calculated by dividing the age-equivalent score by the chronological age and multiplying the result by 100. Verbal DQ is the mean of the Receptive Language and Expressive Language DQ scores, and non-verbal DQ is the mean of the Visual Reception and Fine Motor DQ scores. This allowed possible effects of both chronological and developmental age to be explored. In addition, verbal and non-verbal DQs were used as proxies for intelligence quotient (IQ).
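The DQ arithmetic described above can be sketched as follows. This is a minimal illustration rather than the study's scoring code: the function names and the dictionary keys for the MSEL subscales are hypothetical, and both ages are assumed to be expressed in months.

```python
from statistics import mean
from typing import Dict, Tuple


def developmental_quotient(age_equivalent_months: float, chronological_age_months: float) -> float:
    """DQ = (age-equivalent score / chronological age) x 100, both in months."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return age_equivalent_months / chronological_age_months * 100.0


def verbal_and_nonverbal_dq(ae_months: Dict[str, float], chronological_age_months: float) -> Tuple[float, float]:
    """Return (verbal DQ, non-verbal DQ) from MSEL age-equivalent scores.

    ae_months uses hypothetical keys: receptive_language, expressive_language,
    visual_reception and fine_motor (Gross Motor is not used).
    """
    dq = {name: developmental_quotient(ae, chronological_age_months) for name, ae in ae_months.items()}
    verbal = mean([dq["receptive_language"], dq["expressive_language"]])
    nonverbal = mean([dq["visual_reception"], dq["fine_motor"]])
    return verbal, nonverbal
```

For a hypothetical 48-month-old with age-equivalent scores of 30 (Receptive Language), 24 (Expressive Language), 40 (Visual Reception) and 36 (Fine Motor) months, this sketch gives a verbal DQ of 56.25 and a non-verbal DQ of about 79.2.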

The Vineland Adaptive Behaviour Scales – Second Edition (VABS-II) is a well-established and reliable parent-reported measure of a child's daily adaptive functioning [48]. It yields an overall composite score and subscale standard scores in four domains: Communication, Daily Living Skills, Socialisation, and Motor Skills. All subscale standard scores were used for analysis.

Eye tracking

Apparatus and procedure

Eye-tracking data were collected using the Tobii X2-60 eye tracker and analysed using Tobii Studio software [49]. Each participant sat in a quiet room approximately 60 cm in front of a 22″ widescreen monitor with a resolution of 1680 × 1050 pixels. Novel stimuli for assessing response to joint attention were shown at the location of recruitment, and the task took approximately five minutes to complete. To ensure accurate eye tracking, the built-in five-point calibration procedure in Tobii Studio was completed for each participant before the task was administered. During calibration, participants followed an image of an animal, paired with auditory cues, that started at the centre of the screen and moved across its four corners.

Response to joint attention

Eight videos were presented, each showing a female actor seated behind a table on which two toys were placed, one on each side of the actor. The task was similar to those used in previous studies [15, 18, 24,25,26, 31, 41, 50,51,52], with additional conditions controlling how the actor initiated joint attention (Eyes-Only condition or Head/Eyes condition). Each video consisted of four phases (refer to Fig. 1). In the first phase, an animated attention-getter (star) accompanied by a sound covers the actor's face, attracting the child's attention; this phase lasted 3.0 s. In the second phase, the animation disappears and the actor looks directly at the camera and smiles for 3.0 s, engaging the child's attention. The actor then initiates joint attention for 4.0 s by either (a) shifting and holding her eye gaze towards one of the toys (Eyes-Only condition) or (b) shifting and holding her gaze while simultaneously turning her head towards one of the toys (Head/Eyes condition). Finally, the actor shifts her gaze back to the camera for 2.0 s. For each condition, there were four visually similar designs, counterbalanced for the placement of the target toy (left or right) to minimise the influence of the participant's looking preferences. As a result, there were 8 trials in total, arranged in four blocks. Each block had two trials in which the location of the target object was counterbalanced. The experiment began with a block of one condition followed by a block of the other condition, and finished once all 8 videos had been displayed. The order of conditions was counterbalanced, so half of the participants started with the Head/Eyes condition and the other half with the Eyes-Only condition.
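The trial structure described above can be summarised in a small schedule generator. This is a sketch under stated assumptions: the phase durations come from the text, but the names `PHASES`, `phase_windows` and `build_schedule`, the strict alternation of condition blocks (ABAB), and the left-then-right order within each block are illustrative assumptions rather than the study's actual presentation script.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Phase durations in seconds, taken from the task description.
PHASES: List[Tuple[str, float]] = [
    ("attention_getter", 3.0),  # animated star + sound over the actor's face
    ("smile", 3.0),             # actor looks at the camera and smiles
    ("joint_attention", 4.0),   # Eyes-Only or Head/Eyes cue towards one toy
    ("gaze_return", 2.0),       # actor looks back at the camera
]
CONDITIONS = ("Head/Eyes", "Eyes-Only")


@dataclass(frozen=True)
class Trial:
    block: int
    condition: str
    target_side: str


def phase_windows(phases=PHASES) -> List[Tuple[str, float, float]]:
    """Return (phase name, onset, offset) in seconds from the start of a video."""
    windows, t = [], 0.0
    for name, duration in phases:
        windows.append((name, t, t + duration))
        t += duration
    return windows


def build_schedule(first_condition: str) -> List[Trial]:
    """Hypothetical 8-trial order: 4 blocks of 2 trials, conditions alternating (ABAB assumed)."""
    if first_condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {first_condition!r}")
    second_condition = CONDITIONS[1 - CONDITIONS.index(first_condition)]
    trials = []
    for block in range(1, 5):
        condition = first_condition if block % 2 == 1 else second_condition
        # Target placement is counterbalanced within each block; the order is assumed.
        for side in ("left", "right"):
            trials.append(Trial(block, condition, side))
    return trials
```

Under this sketch, each condition receives four trials (two with the target on the left, two on the right), and the joint attention cue runs from 6.0 s to 10.0 s of each 12-s video. Counterbalancing the starting condition would amount to calling `build_schedule("Head/Eyes")` for half of the sample and `build_schedule("Eyes-Only")` for the other half.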

Fig. 1

Stylised representation of the eye-tracking task, by row: (1) an animated attention-getter (star) accompanied by a sound covers the actor's face, attracting the child's attention; (2) the actor initiates joint attention in the Eyes-Only condition; (3) the actor initiates joint attention in the Head/Eyes condition; (4) the actor shifts her gaze back to the camera

Eye-tracking measures

Five different eye-tracking measures were adopted from previous studies [15, 50, 53, 54].

The standard difference score was computed by subtracting the frequency with which the participant first looks from the actor to the distracter object from the frequency with which the first look was towards the target object [15, 53, 54]. This is similar to the eye-tracking measure used during a live interaction with the same experimental conditions (Head/Eyes and Eyes-Only conditions) [39].

The percentage of accurate gaze shifts was computed by dividing the number of correct trials (trials where the participant looks towards the same toy that the actor is directing their attention towards) by the total number of valid trials [15, 53].

The restrained standard difference score (RSDS) was computed by dividing the standard difference score by the total number of trials in which the participant looked at either the target or distracter object [15, 50].
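The three count-based measures above (SDS, percentage of accurate gaze shifts and RSDS) can all be derived from one first-look label per valid trial. The sketch below is a minimal illustration, not the study's Python scripts: the function name and input format are hypothetical, and it assumes, as a simplification, that a trial counts as "correct" when the first look away from the actor lands on the target toy.

```python
from typing import Dict, Optional, Sequence


def count_scores(first_looks: Sequence[Optional[str]]) -> Dict[str, Optional[float]]:
    """Compute SDS, accuracy and RSDS for one participant in one condition.

    first_looks holds one entry per valid trial: "target" or "distracter" for the
    first toy fixated after leaving the actor, or None if neither toy was fixated.
    """
    n_valid = len(first_looks)
    n_target = sum(1 for look in first_looks if look == "target")
    n_distracter = sum(1 for look in first_looks if look == "distracter")
    sds = n_target - n_distracter          # standard difference score
    n_either = n_target + n_distracter     # trials with a first look to either toy
    return {
        "sds": sds,
        # Proportion of valid trials whose first look was the target (x100 for %).
        "accuracy": n_target / n_valid if n_valid else None,
        # RSDS normalises SDS, so it is bounded between -1 and 1.
        "rsds": sds / n_either if n_either else None,
    }
```

For example, first looks of target, target, distracter and none across four valid trials give SDS = 1, accuracy = 0.5 and RSDS = 1/3 under this sketch.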

The restrained duration difference score (RDDS) was computed by subtracting the total duration (in milliseconds) of all fixations on the distracter object from the total duration of all fixations on the target object, and dividing the result by the total duration of all fixations on either object [15, 50].
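The RDDS ratio can be sketched as below. The function name and the representation of fixations as lists of durations per toy AOI are assumptions for illustration; returning `None` when neither toy was fixated is also an assumption, since the ratio is undefined in that case.

```python
from typing import Iterable, Optional


def rdds(target_fixations_ms: Iterable[float], distracter_fixations_ms: Iterable[float]) -> Optional[float]:
    """Restrained duration difference score: (target - distracter) / (target + distracter).

    Inputs are the durations (ms) of individual fixations on each toy AOI.
    The result lies between -1 (only the distracter fixated) and 1 (only the target).
    """
    target_total = sum(target_fixations_ms)
    distracter_total = sum(distracter_fixations_ms)
    total = target_total + distracter_total
    if total <= 0:
        return None  # neither toy fixated: ratio undefined
    return (target_total - distracter_total) / total
```

With 1000 ms on the target (400 + 600 ms) and 500 ms on the distracter, the sketch yields (1000 - 500) / 1500 = 1/3.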

The response time (RT) was computed by measuring the number of milliseconds between the presentation of the joint attention cue and the onset of the participant’s fixation on the correct target location [53].
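A minimal sketch of the RT computation follows. It assumes fixations are available as (onset in ms, AOI label) pairs on the same clock as the cue onset; the AOI labels, the function name and the optional `window_end_ms` cut-off are hypothetical, and the text does not specify whether a response window was applied.

```python
from typing import Iterable, Optional, Tuple


def response_time_ms(
    cue_onset_ms: float,
    fixations: Iterable[Tuple[float, str]],
    window_end_ms: Optional[float] = None,
) -> Optional[float]:
    """Latency from cue onset to the first fixation that starts on the target AOI.

    fixations: (onset_ms, aoi_label) pairs; labels such as "target" are hypothetical.
    Fixations starting before the cue (or after window_end_ms, if given) are ignored.
    Returns None if no qualifying target fixation occurs.
    """
    for onset_ms, aoi in sorted(fixations):
        if onset_ms < cue_onset_ms:
            continue
        if window_end_ms is not None and onset_ms > window_end_ms:
            break
        if aoi == "target":
            return onset_ms - cue_onset_ms
    return None
```

If the phase durations given earlier are used, the cue begins 6000 ms into each video, so a target fixation starting at 7250 ms would give an RT of 1250 ms in this sketch.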

Data processing and statistical analysis

Fixations and saccades were identified using the velocity-threshold identification (I-VT) filter [55] in Tobii Studio. To apply the exclusion criteria, all trials were divided temporally into the four phases described above. Three areas of interest (AOIs) were defined within each video, around the face, the target toy and the distracter toy. Python scripts were written to extract gaze information for these AOIs in each of the four phases. Trials were excluded if no fixation was recorded on the face AOI during the attention-getter phase and/or the smiling phase, as this indicated that the child was not reliably participating in the task. To be included in the analysis, each participant required at least one valid trial (25% of trials) per condition. This resulted in 77 children being included in the comparison of the Eyes/Head and Eyes-Only conditions, with an average of 6.68 valid trials per participant.
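The trial- and participant-level exclusion rules above can be expressed as two small predicates. This is a sketch, not the authors' scripts: the trial record format (a dict with a `condition` label and the set of phases in which the face AOI was fixated) and all names are hypothetical.

```python
from typing import Dict, Iterable, Mapping, Set

REQUIRED_FACE_PHASES = {"attention_getter", "smile"}
CONDITIONS = ("Head/Eyes", "Eyes-Only")


def trial_is_valid(face_fixation_phases: Set[str]) -> bool:
    """A trial is kept only if the face AOI was fixated in both pre-cue phases."""
    return REQUIRED_FACE_PHASES <= set(face_fixation_phases)


def participant_included(trials: Iterable[Mapping], min_valid_per_condition: int = 1) -> bool:
    """True if every condition has at least `min_valid_per_condition` valid trials."""
    valid_counts: Dict[str, int] = {condition: 0 for condition in CONDITIONS}
    for trial in trials:
        if trial_is_valid(trial["face_fixation_phases"]):
            valid_counts[trial["condition"]] += 1
    return all(n >= min_valid_per_condition for n in valid_counts.values())
```

With four trials per condition, the default threshold of one valid trial corresponds to the 25% criterion stated above.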

Repeated-measures analysis of variance (ANOVA) was used to investigate differences in eye-tracking measures across the two groups and the two experimental conditions. Specifically, each eye-tracking dependent variable (accuracy of gaze shifts, SDS, RSDS, RDDS, RT) was investigated using a 2 × 2 mixed ANOVA with group (ASD vs TD) as a between-subjects variable and condition (Eyes-Only vs Eyes/Head) as a repeated-measures variable. Each ANOVA was subject to a family-wise error rate of 0.05, and all post hoc analyses were Bonferroni-corrected to reduce the risk of Type I errors. A separate analysis including age as a covariate was also performed, and a further analysis was conducted with a restricted age range. Effect sizes were estimated using partial eta squared (η2; values from 0.01 to 0.06 are considered small, values from 0.06 to 0.14 medium, and values above 0.14 large) [25, 56]. Correlations between each eye-tracking measure and each clinical measure were assessed using Spearman's correlations.
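Partial eta squared can be recovered from a reported F ratio and its degrees of freedom via the standard identity η2p = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity together with the benchmarks quoted above and a simple Bonferroni threshold. The helper names are hypothetical, and treating each benchmark's lower bound as inclusive (and labelling values below 0.01 "negligible") is an assumption.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio: (F * df_effect) / (F * df_effect + df_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def effect_size_label(eta_p2: float) -> str:
    """Benchmarks quoted in the text; inclusive lower bounds are an assumption."""
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"


def bonferroni_alpha(family_alpha: float = 0.05, n_comparisons: int = 2) -> float:
    """Per-comparison critical alpha, e.g. 0.05 / 2 = 0.025 for two post hoc contrasts."""
    return family_alpha / n_comparisons


# Main effect of group on SDS, reported in the Results as F(1,69) = 11.205.
print(round(partial_eta_squared(11.205, 1, 69), 3))  # 0.14
```

Applied to the F statistics in the Results, this identity reproduces the reported effect sizes (for example, 0.140 for the group effect on SDS and 0.178 for the condition effect on gaze accuracy), and the two post hoc contrasts per interaction give the critical alpha of 0.025 used there.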

All statistical analysis was performed in IBM SPSS Statistics Version 26.

Results

The demographic information of the participants is shown in Table 2. Children with ASD had a mean age of 4.57 (0.82) years, while TD participants had a mean age of 4.61 (0.47) years. There was no significant difference in age between the two groups, F(1,75) = 8.199, p = 0.833. The gender distribution differed significantly between groups, with a higher proportion of males in the ASD group, χ2(1) = 10.646, p = 0.001. However, performance did not differ between boys and girls in either the Eyes/Head condition, t(69) = -0.855, p = 0.395, or the Eyes-Only condition, t(69) = -0.836, p = 0.406, across both groups. A Bayesian analysis (Independent Samples Normal) was also conducted to confirm this. The most likely differences between the mean gaze accuracy of boys and girls in the ASD group were -0.669 and -0.0567 in the Eyes/Head and Eyes-Only conditions, respectively. The Bayes factors (BF) were 3.031 and 3.116, respectively, suggesting no meaningful difference in mean performance between boys and girls in the ASD group. Similarly, the Bayes factors did not indicate a difference in mean performance between boys and girls in the TD group in either condition (Eyes/Head condition: mean difference in gaze accuracy = -0.0382, BF = 2.882; Eyes-Only condition: mean difference in gaze accuracy = 1.609, BF = 1.806).
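The reported χ2 for the gender distribution can be checked from the group counts in the Participants section (ASD: 51 of 60 male; TD: 8 of 17 male). The sketch below uses the standard shortcut formula for the Pearson chi-square on a 2 × 2 table without Yates' continuity correction; the function name is illustrative. Under that assumption, it reproduces the reported value.

```python
def pearson_chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no Yates continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: ASD, TD; columns: males, females (counts from the Participants section).
chi2 = pearson_chi_square_2x2(51, 60 - 51, 8, 17 - 8)
print(round(chi2, 3))  # 10.646
```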

Table 2 Participant demographic information

There were significant differences between the ASD and TD groups in cognitive level, as determined using the MSEL, and in autism features, as measured by SCQ scores. Because parents did not complete every questionnaire, the sample contained missing data, which were handled by case-wise exclusion. The number of valid trials did not differ between the two groups for the Eyes/Head condition (ASD: 3.42 (0.70), TD: 3.53 (0.72), p = 0.56), the Eyes-Only condition (ASD: 3.42 (0.72), TD: 3.56 (0.51), p = 0.45), or both conditions combined (ASD: 6.55 (1.57), TD: 7.19 (0.98), p = 0.13). This suggests that the amount of valid data available did not differ significantly between the two groups.

A calibration quality assessment was performed to rule out eye-tracking data quality as a confounding factor. In this assessment, a toy accompanied by a sound was used to attract the participants' gaze to the calibration point in the middle of the screen. The mean distance between the detected fixation locations and the calibration point was calculated as a measure of accuracy. An independent-samples t-test showed no significant difference between the groups, suggesting that data quality did not differ between them: t(57) = 0.334, p = 0.739; ASD: 46.49 pixels (23.87), TD: 48.76 pixels (19.00).
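The accuracy measure used in this check, the mean Euclidean distance in pixels between detected fixations and the calibration point, can be sketched as follows. It assumes fixation coordinates are in screen pixels and that the "middle of the screen" is the centre pixel of the 1680 × 1050 display; the function and constant names are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Sequence, Tuple

SCREEN_CENTRE_PX = (1680 / 2, 1050 / 2)  # (840.0, 525.0) on the 1680 x 1050 display


def mean_offset_px(fixations: Sequence[Tuple[float, float]], target=SCREEN_CENTRE_PX) -> float:
    """Mean Euclidean distance (pixels) between detected fixations and the calibration point."""
    if not fixations:
        raise ValueError("no fixations recorded")
    return sum(math.dist(point, target) for point in fixations) / len(fixations)
```

For instance, fixations at (870, 565) and (840, 525) lie 50 and 0 pixels from the centre, giving a mean offset of 25 pixels in this sketch.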

An additional data quality assessment was performed to determine the overall nature of the participants' visual attention in both conditions. In particular, the average amount of time spent looking at the stimuli in each condition was computed. There was a significant main effect of condition, F(1,69) = 6.256, p = 0.015: on average, participants spent longer looking at the stimuli in the Eyes/Head condition (6.26 (2.47)) than in the Eyes-Only condition (5.68 (2.91)). There was no significant main effect of group, F(1,69) = 3.718, p = 0.058, and no significant interaction, F(1,69) = 0.00, p = 0.991. These quality analyses suggest that it is unlikely that differences in data quality or general attention were responsible for clinically meaningful group differences in RJA.

Standard difference score (SDS)

The standard difference score (SDS) is the number of first looks to the distracter object subtracted from the number of first looks to the target object. Hence, a positive (or higher) SDS means that the participant followed the joint attention cue more often (indicative of better joint attention). The ANOVA on standard difference scores revealed a main effect of group (as seen in Fig. 2a), with TD children achieving higher SDS, F(1,69) = 11.205, p = 0.001, η2 = 0.140. There was also a main effect of condition, with better performance in the Eyes/Head condition than in the Eyes-Only condition, F(1,69) = 8.916, p = 0.004, η2 = 0.114. There was no interaction between group and condition, F(1,69) = 2.004, p = 0.161, η2 = 0.028. When age was included as a covariate, the groups still differed significantly in SDS, with the TD group achieving higher SDS, F(1,68) = 11.272, p = 0.001, η2 = 0.142. There was no significant effect of condition, F(1,68) = 1.399, p = 0.241, η2 = 0.020, no interaction between condition and age, F(1,68) = 0.401, p = 0.529, η2 = 0.006, and no interaction between condition and participant group, F(1,68) = 1.987, p = 0.163, η2 = 0.028.

Fig. 2

Different eye-tracking measures of ASD and TD participants in two conditions. Error bars show 95% CI

Accuracy of gaze shifts

The accuracy of gaze shifts was computed by dividing the number of correct trials by the total number of valid trials. The groups differed significantly in the accuracy of gaze shifts (see Fig. 2b), with the TD group demonstrating higher accuracy, F(1,69) = 9.870, p = 0.002, η2 = 0.125. There was a main effect of condition, with higher accuracy in the Eyes/Head condition, F(1,69) = 14.990, p < 0.001, η2 = 0.178. There was also a significant interaction between condition and group, F(1,69) = 4.391, p = 0.040, η2 = 0.060. Specifically, post hoc Bonferroni-corrected contrasts showed that, for TD participants only, gaze accuracy was significantly lower in the Eyes-Only condition (mean: 0.30, std: 0.29) than in the Eyes/Head condition (mean: 0.57, std: 0.21), t(16) = 3.922, p = 0.001 (critical alpha = 0.025). By contrast, there was no significant difference between the Eyes-Only (mean: 0.22, std: 0.26) and Eyes/Head conditions (mean: 0.30, std: 0.28) for the ASD group, t(53) = 1.753, p = 0.085 (critical alpha = 0.025 given the Bonferroni correction). Importantly, even when adjusted for age, the groups still differed significantly in the accuracy of gaze shifts, F(1,68) = 9.888, p = 0.002, η2 = 0.127, and the significant interaction between condition and participant group remained, F(1,68) = 4.328, p = 0.041, η2 = 0.060. However, there was no significant effect of condition, F(1,68) = 0.548, p = 0.462, η2 = 0.008, and no interaction between condition and age, F(1,68) = 0.000, p = 0.988, η2 = 0.000.

Restrained standard difference score (RSDS)

RSDS was computed by dividing the standard difference score by the total number of trials in which the participant looked at either the target or the distracter object. RSDS showed a significant effect of group (see Fig. 2c), with RSDS significantly higher for the TD group, F(1,47) = 7.287, p = 0.010, η2 = 0.134. There was no main effect of condition, F(1,47) = 2.117, p = 0.152, η2 = 0.043, and no interaction between group and condition, F(1,47) = 0.125, p = 0.726, η2 = 0.003. When adjusted for age, the groups continued to differ in RSDS, F(1,46) = 7.115, p = 0.011, η2 = 0.134. There was no significant effect of condition, F(1,46) = 0.921, p = 0.342, η2 = 0.020, no interaction between condition and age, F(1,46) = 0.524, p = 0.473, η2 = 0.011, and no interaction between condition and participant group, F(1,46) = 0.142, p = 0.708, η2 = 0.003.

Restrained duration difference score (RDDS)

RDDS was computed by subtracting the total duration (in milliseconds) of all fixations on the distracter object from the total duration of all fixations on the target object, and dividing the result by the total duration of all fixations on either object. Hence, a positive (or higher) RDDS means that the participant allocated more attention to the target object than to the distracter object. There was no main effect of condition (see Fig. 2d), F(1,54) = 0.322, p = 0.573, η2 = 0.006, and no interaction between group and condition, F(1,54) = 0.168, p = 0.684, η2 = 0.003. The effect of group on RDDS approached but did not reach significance, F(1,54) = 3.612, p = 0.063, η2 = 0.063, indicating a trend for TD participants to allocate more attention to the target than to the distracter compared with ASD participants. When adjusted for age, there were no significant main or interaction effects (highest F = 3.974, p = 0.051, η2 = 0.070).

Response times (RT)

Response time (RT) measures how quickly participants correctly looked at the target object after the joint attention bid. There was no main effect of condition (refer to Fig. 2e), F(1,45) = 3.093, p = 0.085, η2 = 0.064, and no main effect of group, F(1,45) = 0.139, p = 0.711, η2 = 0.003. However, there was a significant interaction between group and condition, F(1,45) = 5.564, p = 0.023, η2 = 0.110. Post hoc Bonferroni-corrected contrasts revealed that ASD participants responded more slowly in the Eyes-Only condition (mean RT = 2.08 s, std = 1.35) than in the Eyes/Head condition (mean RT = 1.11 s, std = 0.78), t(33) = -3.769, p = 0.001 (critical alpha = 0.025). By contrast, RT did not differ across conditions for the TD group, t(12) = 0.399, p = 0.697 (Eyes/Head: mean RT = 1.58 s, std = 0.48; Eyes-Only: mean RT = 1.43 s, std = 1.00). These results suggest that children with ASD were slower to allocate attention to the target object when only eye gaze information was available than when both the eyes and the head moved towards the target object.

Age-restricted analysis

Given that the age range was considerably broader for the ASD group compared to the TD group, all of the above analyses were re-run with a reduced sample of ASD children (N = 33), which acted to match age across the two groups. Specifically, ASD children were excluded from the analysis if their age was not within the age range of the group of TD children (3.96 – 5.41 years). Importantly, the results reported above did not change in terms of statistical significance when the analysis was restricted to this smaller group of participants, thereby demonstrating that our primary results were not driven by the inclusion of a wider age range of ASD participants.

Correlations between eye-tracking measures and clinical information

Correlations between eye-tracking measures and clinical information are shown in Table 3 for the ASD group. The analysis relating RJA eye-tracking variables to different clinical scores showed numerous significant associations for ASD participants. SDS was significantly positively correlated with Visual Reception, Fine Motor, Receptive Language and Expressive Language on MSEL, as well as Communication, Daily Living Skills and Socialisation on VABS. SDS was negatively correlated with Calibrated Severity Score and Social Affect on ADOS as well as SCQ scores.
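The coefficients in Tables 3 to 5 are Spearman's rank correlations. As a reminder of what they capture, the sketch below computes Spearman's rho as the Pearson correlation of ranks, with tied values given average ranks (the standard definition). It is an illustrative implementation with hypothetical names, not the SPSS routine used in the study, and it does not compute p-values.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the (average-tie) ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(sxx * syy)
```

Because only ranks enter the calculation, rho equals 1 for any strictly increasing relationship (for example, an eye-tracking score that rises monotonically with a clinical score), whether or not the relationship is linear.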

Table 3 Spearman’s correlations between eye-tracking variables and clinical characteristics in the ASD group. Correlation coefficients (with corresponding p-values) are listed

Accuracy of gaze shifts was positively correlated with Visual Reception, Fine Motor, Receptive Language and Expressive Language on the MSEL and with Communication, Daily Living Skills and Socialisation on the VABS, and negatively correlated with the Calibrated Severity Score and Social Affect on the ADOS as well as with SCQ scores. RSDS was positively correlated with Visual Reception, Fine Motor, Receptive Language and Expressive Language on the MSEL and with Communication and Socialisation on the VABS. Both RSDS and RDDS were negatively correlated with the Calibrated Severity Score and Social Affect on the ADOS. RDDS was positively correlated with Visual Reception, Fine Motor, Receptive Language and Expressive Language on the MSEL, and negatively correlated with Restricted Repetitive Behaviours on the ADOS.

All of the above analyses were re-run with a reduced sample of children from the ASD group (N = 33) so that age ranges were matched across the two groups (ages restricted to 3.96 – 5.41 years). Table 5 shows Spearman's correlations between eye-tracking measures and clinical information for this age-restricted sample. Most correlations with MSEL and VABS domains were retained after restricting the age range of the ASD group. Interestingly, response time (RT) was negatively correlated with the ADOS severity score in the age-restricted sample. By contrast, no correlations between eye-tracking measures and clinical characteristics were significant for the TD group (refer to Table 4).

Table 4 Spearman’s correlations between eye-tracking variables and clinical information in the TD group. Correlation coefficients (with corresponding p-values) are reported

To control for multiple comparisons, Tables 3, 4 and 5 also indicate which correlations remain statistically significant under a much more conservative critical alpha of 0.002 (a Bonferroni-corrected critical alpha accounting for at least 14 correlations computed for each dependent variable). The results suggest that more advanced cognitive and language skills (as measured by the MSEL) were associated with better joint attention skills (as measured by all the eye-tracking measures) in children with ASD. Initial gaze location and accurate gaze location were both positively correlated with adaptive functioning, as measured by the VABS-II. There was no correlation between eye gaze profile and scores in the motor skills domain. Collectively, these findings indicate that more accurate eye-tracking gaze profiles were associated with better early learning and more adaptive functioning. Furthermore, Social Communication Questionnaire scores were negatively correlated with eye-tracking measures in children with ASD, suggesting that more severe ASD symptomatology is associated with poorer gaze profiles. There were no correlations for the TD group, as shown in Table 4. The correlations for the restricted-age ASD group in Table 5 show trends similar to those for the full ASD group in Table 3.

Table 5 Spearman’s correlations between eye-tracking variables and clinical characteristics in the age-restricted ASD group. Correlation coefficients (with corresponding p-values) are reported

Discussion

The primary aim of this study was to examine the utility of an eye-tracking paradigm as a physiological index of RJA behaviours in children with and without a current diagnosis of ASD. Although previous eye-tracking studies during live interaction have shown that eye movement, head movement or both may affect the RJA behaviours of children at high risk of ASD [22], most existing eye-tracking studies using pre-recorded stimuli have not examined this effect. Furthermore, to the best of our knowledge, the current study is the first to examine this specific effect in a cohort of preschool children aged 3 to 6 years. Previous studies have shown group differences in RJA behaviours in infants [22, 31, 39] and toddlers [25]. In contrast to previous studies with a similar age range [24, 41], our results showed significant differences in RJA behaviours between preschool-aged children with ASD and TD children in an eye-tracking paradigm using pre-recorded stimuli. Our results support another study that found reduced gaze-following accuracy in children with ASD [18]. This is consistent with literature suggesting that difficulties in RJA behaviours emerge early in life and become progressively more evident with age.

In the present study, TD children’s gaze was more accurate on trials where more information was available (specifically, gaze accuracy was higher on Eyes/Head trials than on Eyes-Only trials). Children with ASD showed only a trend towards the same pattern, and the effect was statistically stronger in TD children. These results partially support other published work in which participants at high risk of ASD, or with ASD, failed to use the information encoded in other people’s eye movements during RJA tasks, both in live interaction [22] and with pre-recorded stimuli [43]. This finding is also in line with studies suggesting that children on the spectrum pay less attention to eyes [28, 57,58,59,60,61,62,63,64,65] and have difficulties interpreting eye information [66, 67]. Together, these results suggest that children without a diagnosis of ASD were better able to utilise multiple sources of social and interpersonal communicative information.

A reduced ability to engage in joint attention is expected to influence a child’s later development in several domains. For example, during language development, children must be able to associate an object with the word that refers to it [6]; reduced gaze accuracy may therefore hinder the learning of new words. Joint attention is also important in non-verbal communication and socio-cognitive development. In our study, correlational analyses revealed reliable associations between various eye-tracking measures and clinical information. In line with previous research [18], the standard difference score (SDS) was positively associated with parent-reported Communication and Socialisation scores in children with ASD. In addition, gaze-following accuracy was positively correlated with VABS Communication scores. In other words, more accurate gaze profiles were associated with higher social and communication scores. Our study also linked eye-tracking measures to standardised measures of cognition: MSEL scores were positively correlated with numerous eye-tracking measures in children with ASD. This suggests that children with ASD who have better early learning skills and more adaptive behaviours (as per parent reports) are more likely to follow or respond to bids for joint attention. Conversely, SCQ scores were negatively correlated with different eye-tracking measures in children with ASD, as were ADOS calibrated severity scores and scores on the Social Affect scale. Clinically, these findings suggest that more severe symptomatology was associated with less accurate gaze responses to bids for joint attention. Collectively, the correlations support the notion that eye-tracking variables may have utility as biomarkers for ASD [68]. It is important to emphasise, however, that these cross-sectional, correlational analyses cannot speak to causation.
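For readers less familiar with the SDS, the count-based definition given in the Background (first looks to the target minus first looks to the distracter) can be sketched in a few lines. This assumes a hypothetical per-trial coding of each child’s first look as `"target"`, `"distracter"` or `None` (no look); any normalisation the original study may have applied is not reproduced here.

```python
from collections import Counter
from typing import Iterable, Optional


def standard_difference_score(first_looks: Iterable[Optional[str]]) -> int:
    """Count of trials whose first look hit the target minus count whose first
    look hit the distracter; trials with no first look (None) count for neither."""
    counts = Counter(look for look in first_looks if look is not None)
    return counts["target"] - counts["distracter"]


# Hypothetical per-trial first-look codes for one child (illustrative only).
trials = ["target", "distracter", "target", None, "target"]
print(standard_difference_score(trials))  # 2
```

A positive score indicates that first looks favoured the cued target; a score near zero indicates no systematic preference.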

The finding that clinical measures were consistently correlated with eye-tracking variables, in the expected direction, suggests that the eye-tracking responses of children with ASD reflect their functioning and associated difficulties. In this regard, if a preschooler fails to follow non-verbal communicative cues, that behaviour may adversely affect social learning. Joint attention could therefore serve as a target for early intervention programs.

In the current study, there were no significant correlations between eye-tracking data and clinical characteristics for TD children, presumably because of the restricted range of clinical scores in this group; with no deficit to detect, eye tracking would not be expected to serve as a useful biological or diagnostic marker in TD children. The ASD group is likely more heterogeneous than the TD group, which is a further reason why a reliable biomarker for tracking specific difficulties, such as those in RJA, would be helpful. Clinically, it is important to keep in mind that not all children with ASD process information in the same way, and they will therefore respond differently to interventions.

The current findings differ from those of a previous study [16]. Franchini, et al. [16] included a younger cohort of children with ASD (mean age 2.8 years) and found no significant correlations between gaze-following accuracy and clinical measures. However, they reported differences in RJA based on task conditions: RJA improved when the stimulus was intense and when it was supported by gestural pointing [16]. The findings of Thorup, et al. [22] are similar to ours; they also found a significant reduction in the gaze accuracy of high-risk infants in an Eyes-Only condition (during a live eye-tracking interaction). Our results suggest that preschoolers with ASD are more likely to respond to joint attention when more visual information is available, although this result must be interpreted with caution, as it only trended towards significance. Exaggerating or augmenting these cues might help preschool children with ASD in RJA tasks in the context of early intervention. Given that the mean age of the Franchini, et al. [16] cohort was 2.8 years and that of our cohort was 4.6 years, the preschool period may be a suitable time at which gaze-following accuracy and other eye-tracking measures could be used as an adjunct in the ASD diagnostic process, with a particular focus on improving our understanding of the mechanisms by which early intervention may improve RJA in ASD. Furthermore, the numerous significant associations between physiological eye-tracking measures and clinical severity could help individualise treatment for children with ASD: future interventions could be tailored not only to clinical and behavioural characteristics but also to physiological indices of information processing, such as the child’s unique eye-tracking profile.

Limitations

Despite its strengths, the current study has several limitations. First, the ASD group was skewed towards males, as would be clinically expected. Nevertheless, further studies with more female participants are required to clarify our results, as differences in autism presentation and diagnosis between males and females have been documented [69]. For example, studies have shown that girls on the spectrum behave similarly to neurotypical boys and girls on certain socially orientated tasks, such as showing enhanced attention to faces in scenes that do not contain social interactions [70, 71]. In addition, TD men with high autistic-like traits exhibit lower accuracy of gaze shifts, whereas TD women show similar eye-gaze following behaviour regardless of autistic-like traits [72]. A follow-up study exploring the contribution of biological sex to joint attention behaviours in ASD is therefore indicated.

The participant groups also differed in sample size, with the ASD group being three times as large as the TD group. The ASD participants were recruited from an ASD-specific centre, and uptake was good. Despite the team’s considerable recruitment efforts, families of neurotypical children at the centre showed less interest in participating, which is perhaps unsurprising given that the study is of less direct relevance to children without a developmental diagnosis.

It is also worth noting that the participant groups were matched on chronological age but not on developmental abilities. This may have accentuated the main results of this study, particularly the significant group differences and the correlations between eye-tracking measures and clinical information in the ASD group. Further studies with larger samples and a control group matched on developmental age are needed to confirm these findings.

As reported in the Methods, children with ASD were not excluded from the study if they had a comorbid diagnosis. Although this has implications for any strict interpretation of the findings reported here, the inclusion of comorbid conditions in ASD research is ecologically valid. Indeed, it is rare in clinical practice to encounter a young person who has a ‘pure’ autism spectrum diagnosis with no other psychiatric or developmental comorbidities.

It is also important to consider the limitations arising from the pre-recorded nature of the stimuli. In this work, we aimed to determine whether such stimuli can help identify differences in RJA behaviours between ASD and TD preschool children, and to determine possible correlations between the derived eye-tracking measures and clinical information. The results suggest that differences in certain eye-tracking measures exist in the context of the stimuli used in this study. However, we acknowledge that this paradigm is not as ecologically valid as a live interaction task, in which an actor may exaggerate or augment their cues and even make multiple attempts to initiate joint attention. By contrast, the actor made no exaggerated cues in either the Eyes-Only or the Eyes/Head condition, as illustrated in Fig. 1. Future research should compare pre-recorded movements with and without exaggeration in these two conditions to create a more ecologically valid scenario.

Finally, given the cross-sectional nature of the study, it is not possible to infer causal mechanisms. For example, it is not clear whether adaptive functioning leads to improved social engagement, as reflected by gaze accuracy, or whether the development of gaze accuracy helps improve adaptive behaviours. In addition, it is not clear whether the observed eye-tracking profile results from differences in abilities or from a lack of interest and motivation in engaging in social interactions and following gaze. Distinguishing between these explanations would have implications for targeting intervention, in terms of skills building versus increasing interest and engagement in social-communicative tasks, and further studies are needed to explore this issue. Nevertheless, the association between these measures is clinically important. From a clinical perspective, the finding suggests that eye-tracking technology could be used as a biomarker of adaptive functioning in young children, and could potentially be incorporated into a diagnostic test battery or used as a measure of treatment progress.

Conclusion

In this study, we found differences in RJA behaviours between preschool children with ASD and TD preschool children. In addition, several eye-tracking measures of RJA behaviours in preschool children with ASD were associated with clinical measures commonly used in the assessment of ASD.