Looking But Not Seeing: Atypical Visual Scanning and Recognition of Faces in 2 and 4-Year-Old Children with Autism Spectrum Disorder
- Chawarska, K. & Shic, F. J Autism Dev Disord (2009) 39: 1663. doi:10.1007/s10803-009-0803-7
This study used eye-tracking to examine visual scanning and recognition of faces by 2- and 4-year-old children with autism spectrum disorder (ASD) (N = 44) and typically developing (TD) controls (N = 30). TD toddlers at both age levels scanned and recognized faces similarly. Toddlers with ASD looked increasingly away from faces with age, atypically attended to key features of faces, and were impaired in face recognition. Deficits in recognition were associated with imbalanced attention between key facial features. This study illustrates that face processing in ASD may be affected early and become further compromised with age. We propose that deficits in face processing likely impact the effectiveness of toddlers with ASD as social partners and thus should be targeted for intervention.
Keywords: Autism · ASD · Face scanning · Face recognition · Toddlers
Autism Spectrum Disorders (ASD) are characterized by marked impairments in social interactions and communication, as well as the presence of repetitive behaviors and restricted interests (APA 1994). In a vast majority of cases, behavioral symptoms become apparent by the age of 2 years (Chawarska et al. 2007; De Giacomo and Fombonne 1998). Although in naturalistic situations, affected toddlers with ASD look at the faces of others, they appear to have difficulties deriving information crucial for social functioning from affective expressions and facial gestures (see Chawarska and Volkmar 2005 for a review). For example, unlike typically developing children they do not seek affective cues from faces in uncertain situations or monitor the gaze direction of others in order to gauge attentional focus. Consistent with findings on older children and adults with ASD (e.g., Blair et al. 2002; Boucher and Lewis 1992; Gepner et al. 1996; Klin et al. 1999; Wallace et al. 2008), 2-year-olds with ASD also appear to have difficulties in encoding facial identity (Chawarska and Volkmar 2007).
While some face processing skills might be preserved in ASD, difficulties in face recognition have been consistently reported (see Golarai et al. 2006 for review). Several not necessarily mutually exclusive hypotheses have been advanced to link impaired face recognition skills with specific aspects of face processing. One account suggests that limited experience-dependent specialization in the face processing system results in an atypical activation pattern in a neural network that typically supports face processing and recognition (see Schultz 2005, for a review). Another account suggests that less effective face recognition is linked with the employment of feature-based rather than holistic face processing strategies (e.g., Joseph and Tanaka 2003; López et al. 2004). Furthermore, considering that the distribution of overt attention within a visual scene reflects an active process of gathering task-relevant information (Hayhoe and Ballard 2005; Henderson 2003; Pierce and Henderson 2007), it has been proposed that the abnormalities in face recognition in ASD might be linked to altered face scanning strategies as indexed by an abnormal spatial distribution of fixations (Pelphrey et al. 2002).
Recent advances in eye tracking research have facilitated direct tests of the hypothesis that face-scanning strategies are abnormal in ASD. The results pertaining to the distribution of visual attention while processing static images of faces, however, are somewhat ambiguous. While several reports suggest that individuals with ASD spend less time examining the eye area of the face compared to typical controls (Boraston et al. 2008; Dalton et al. 2005; Pelphrey et al. 2002; Sterling et al. 2008), there is also considerable evidence to the contrary (Dapretto et al. 2006; van der Geest et al. 2002; Neumann et al. 2006; Rutherford and Towns 2008; Speer et al. 2007). Furthermore, though some studies highlight ASD-specific abnormalities in scanning, such as decreased attention to the internal features of the face (i.e., eyes, nose, and mouth) (Pelphrey et al. 2002), other studies have shown a great deal of similarity between ASD and TD groups, such as the proportionally greater attention to the eye region as compared with the mouth or nose area (e.g., Boraston et al. 2008; Dalton et al. 2005; Rutherford and Towns 2008; Sterling et al. 2008). Taken together, the evidence regarding abnormal scanning patterns of static faces in older children and adults with ASD remains mixed. These discrepancies might be associated with methodological differences between the studies (e.g., employment of implicit vs. explicit recognition tasks; employment of different types of experimental stimuli, i.e., static vs. dynamic, affective vs. neutral, familiar vs. novel), as well as result from the intrinsic heterogeneity of the ASD population (e.g., selection of only high functioning individuals with autistic disorder as compared to selection from a range of autism spectrum disorders such as autism, PDD-NOS, and Asperger syndrome).
Furthermore, prior studies have primarily focused on adults, tapping the developmental endpoints of face processing without addressing potential developmental changes resulting from an interaction between pathogenic factors responsible for autism and the effects of atypical social experiences that accumulate from early infancy onwards. For this reason, it is particularly important to consider face recognition and scanning in infants and toddlers.
The goal of the present study was to examine the visual scanning and recognition of faces in 2- and 4-year-old children with ASD and typically developing (TD) age-matched controls. Though the evidence is still fairly limited, impairments in face recognition have been reported in toddlers with ASD as young as 2 years of age (Chawarska and Volkmar 2007). However, there are no reports on face scanning in the context of face recognition tasks either in typically or atypically developing toddlers. We hypothesized that children with ASD: (a) would distribute their attention between facial features differently than TD children and that (b) these differences would be associated with their performance during the recognition phase of the experiment. Considering reports (e.g., Langdell 1978) suggesting that some aspects of face processing might worsen as children with ASD grow older, we also expected that the spatial distribution of attention during familiarization and recognition performance would be more atypical in older children with ASD as compared to younger toddlers with ASD. We also hypothesized that when tested in the recognition phase, typically developing children will look longer at the novel stimulus and their novelty preference ratio would be greater than chance. In contrast, we anticipated that children with ASD would show no clear evidence for recognition of previously familiarized faces.
Table 1 reports sample characteristics for each group: chronological age (months), nonverbal DQ (NVDQ), nonverbal MA (months), verbal DQ (VDQ), social affect (SA), and restricted repetitive behaviors (RRB).
An additional four toddlers were tested but excluded from analysis due to negative affect preventing them from completing the procedure or technical errors (TD: n = 1, ASD: n = 3). Four more toddlers who were tested at the time of their first diagnosis were excluded from the sample, as on their follow-up visit they no longer met full criteria for ASD. All children were born after 32 weeks gestation, suffered no major prenatal or perinatal insults, and had no known visual or auditory abnormalities. About 84% of the children were Caucasian, with the remainder of the sample representing African–American, Asian, or a mixed ethnic heritage.
The following measures were derived to capture the overall level of functioning in the ASD group: (a) Social Affective (SA) score (Gotham et al. 2007), indicating the degree of autistic symptoms in socialization and communication, based on ADOS-G Modules 1 and 2 (Lord et al. 2000); (b) Restricted and Repetitive Behaviors (RRB) scores (Gotham et al. 2007), also based on ADOS-G Modules 1 and 2; (c) nonverbal developmental quotient (NVDQ) score, obtained by combining visual reception and fine motor scales from the Mullen Scales (Mullen 1995); and (d) verbal developmental quotient (VDQ) score, obtained by combining receptive and expressive language scales from the Mullen Scales (see Table 1). Typical control status was confirmed through direct observations during a structured play session, assessment of their nonverbal cognitive skills, as well as a parent interview focused on medical and developmental history. The ASD and TD samples were matched on chronological age (CA), but not on the nonverbal level of functioning (NVDQ). The two ASD age groups were comparable in terms of their level of verbal and nonverbal cognitive functioning (VDQ, NVDQ) as well as their severity of autistic symptoms (SA and RRB scores).
Materials and Apparatus
Apparatus. Gaze trajectories were recorded at a sampling rate of 60 Hz using a SensoMotoric Instruments iView X™ RED eye tracking system (SensoMotoric Instruments 2005). Eye-tracking data were processed using custom software written in Matlab (Mathworks Inc. 2004). This software accommodated standard techniques for processing eye-tracking data including blink detection, data calibration, recalibration, and Region of Interest (ROI) analysis (Duchowski 2003; Shic 2008). A parallel set of analyses was also conducted using an analysis package, BeGaze (SensoMotoric Instruments, 2005), provided by the eye-tracker manufacturer, and was verified to lead to results analogous to those presented in this paper. Statistical analyses and data reduction were accomplished through programs written in SAS (SAS Institute Inc. 2004).
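The preprocessing steps named above can be illustrated with a minimal sketch. This is not the authors' pipeline (which was written in Matlab and SAS); it is a hypothetical example of one standard step, distinguishing brief tracking losses (blinks) from longer losses (e.g., head movement) in a 60 Hz gaze stream, with the 9-sample (~150 ms) blink threshold chosen purely for illustration.

```python
def summarize_tracking(samples, hz=60, blink_max=9):
    """Summarize a gaze stream where None marks a lost sample.

    Returns (usable_seconds, n_blinks). Runs of <= blink_max consecutive
    lost samples (~150 ms at 60 Hz) are counted as blinks; longer runs
    are treated as track loss (e.g., subject motion).
    """
    usable = sum(s is not None for s in samples)
    blinks = 0
    run = 0
    for s in samples + [object()]:  # sentinel closes any trailing run
        if s is None:
            run += 1
        else:
            if 0 < run <= blink_max:
                blinks += 1
            run = 0
    return usable / hz, blinks
```

For example, a 2 s trial containing one 5-sample gap would yield roughly 1.92 s of usable data and one detected blink, consistent with the kind of per-trial data-yield figures reported in the Results.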
Face scanning and recognition were tested using the fixed-level Visual Paired Comparison (VPC) paradigm (Fantz 1964; see Pascalis and De Haan 2003 for review). The VPC is designed to test visual discrimination and recognition memory in a context of incidental learning and has been used successfully in human and nonhuman primates. Each trial consists of a presentation of a stimulus for a predetermined amount of time (familiarization phase), followed by a simultaneous presentation of the familiarized stimulus with a novel stimulus from the same class (recognition phase). In the VPC paradigm, facial recognition can be inferred from a significantly longer looking time either at the novel (novelty preference) or familiar (familiarity preference) stimulus (Kaplan et al. 1990; see also Pascalis and de Haan 2003 for review). Typically, when the familiar stimulus is completely encoded, a child will prefer to look at a new rather than old stimulus. However, in some situations, a child might display a preference for a familiar stimulus, which has typically been interpreted as an index of slower or less effective information processing, leading to partial encoding or formation of the stimulus representation (Richards 1997; Rose et al. 1981).
To examine the spatial distribution of attention the stimuli were divided into five regions of interest (ROIs): eyes, mouth, nose, outer (hair, cheeks, and forehead), and Non Face (neck, body, background, screen) (see Fig. 1). Dependent variables in the familiarization phase consisted of (a) the average amount of time required on each trial to reach the 10 s familiarization criterion, (b) the total number of valid trials, and (c) percentage of time spent scanning each ROI (i.e., the total time spent scanning an ROI divided by the total time spent scanning the entire scene) averaged over trials. In the recognition phase we recorded (d) the total duration spent examining the familiar and novel stimuli, and also computed (e) a novelty preference ratio (total fixation time on the novel stimulus divided by the total amount of time examining both familiar and novel stimuli). Analysis of ROIs was conducted in two stages. First, we examined the distribution of attention between Outer Face (hair, cheeks, and forehead) features, Inner Face features (eyes, mouth, and nose combined), and Non Face (neck, body, background, screen) features. Second, we explored attention to the eyes, nose, and mouth. In this second exploration we also considered an average eye-to-mouth (EM) ratio (eyes/(eyes + mouth)).
Statistical analysis. Distributions of all variables were examined prior to the analysis for outliers. Primary hypotheses regarding between-group differences were tested using an analysis of variance approach with the SAS Proc GLM procedure. Significant effects were followed up with post-hoc tests using a Tukey–Kramer adjustment for multiple comparisons. Associations between scanning patterns during familiarization and performance during the Recognition test in the ASD group were examined using Pearson’s r correlation coefficient analysis partialled for age as well as verbal and nonverbal DQ.
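A partial Pearson correlation of the kind described (partialled for age, VDQ, and NVDQ) can be computed by regressing the covariates out of both variables and correlating the residuals. The sketch below shows this with NumPy; it is an illustration of the statistic, not the SAS code used in the study:

```python
import numpy as np

def partial_corr(x, y, covars):
    """Pearson r between x and y after regressing out covariates.

    x, y: 1-D arrays of scores (e.g., EM ratio, novelty preference).
    covars: 2-D array, one column per covariate (e.g., CA, VDQ, NVDQ).
    """
    z = np.column_stack([np.ones(len(x)), covars])  # add intercept
    rx = x - z @ np.linalg.lstsq(z, x, rcond=None)[0]  # residualize x
    ry = y - z @ np.linalg.lstsq(z, y, rcond=None)[0]  # residualize y
    return float(np.corrcoef(rx, ry)[0, 1])
```

A perfect linear relation between x and y that is not explained by the covariates yields r = 1.0; shared variance with the covariates is removed before the correlation is taken.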
Valid trials and eye tracking data. Each child was presented with six trials. Trials were excluded from the analysis either due to technical errors (faulty calibration, file recording error, or software or hardware malfunction), or due to subject inattention or excessive subject movement. A 2 (diagnosis) × 2 (age) between-group ANOVA indicated no differences in the number of valid trials with regard to diagnosis or age (ASD: M = 5.1, SD = 1.2; TD: M = 5.2, SD = 1.1). On average, children with ASD contributed 8.18 s (SD = 1.5) out of a maximum 10 s of valid eye tracking data at Age 1 and 7.69 s (SD = 2.3) at Age 2. In the TD sample these values were 7.05 s (SD = 1.7) and 8.59 s (SD = 2.5) in groups Age 1 and Age 2, respectively. A diagnosis × age ANOVA revealed no significant differences in the total amount of eye-tracking data between groups. The discrepancy between the amount of valid eye-tracking data and the maximum possible time in both groups (10 s) is due to a combination of subject blinks and instabilities in eye-tracking as a result of subject motion; these effects are within the expected range for our experimental methods.
Visual attention to faces. First we examined whether the groups differed in their ability to regulate attention in response to the facial stimuli during familiarization. A diagnosis (2) × age (2) ANOVA on the total time required to reach the 10 s familiarization criterion revealed no significant effects: younger children with ASD required on average 13.72 s (SD = 2.1) to reach the criterion as compared to 15.01 s (SD = 3.3) in TD toddlers. In the older group, the average was 14.71 s (SD = 3.4) for ASD and 14.78 s (SD = 4.7) for TD children. These results suggest that, in the most general terms, there were no differences between groups in their ability to engage attention in the exploration of the visual scene during familiarization. Note that the visual scene in this case consisted of the entire display, including the face stimulus as well as the background and the screen area.
Table 2 reports the mean (SD) total time (milliseconds) spent examining the novel and familiar stimulus during the recognition phase, separately for Age Group 1 and Age Group 2.
Table 3 reports Pearson's r correlation coefficients, partialled for CA, VDQ, and NVDQ, between face-scanning measures during familiarization (Inner face %, Outer face %, Non face %, EM ratio %, Eye ratio %, Mouth ratio %, Nose ratio %) and the novelty preference ratio in the ASD group.
The results of the study suggest that both 2- and 4-year-olds with ASD attended to visual scenes containing faces to a similar extent. As our study illustrates, however, simply looking does not necessarily imply that information regarding facial identity is accessed and processed effectively. The results suggest that: (a) compared to age-matched TD controls, young children with ASD dwelled more on the external facial features; (b) their attention to internal facial features declined between 2 and 4 years; (c) regardless of age, children with ASD distributed attention between key features of the face (eyes and mouth) differently than their CA-matched TD controls; (d) young children with ASD were less effective in encoding facial identity than CA-matched TD controls; and (e) their atypical scanning patterns were associated with limited efficacy of facial identity encoding. We will discuss each of these findings in turn.
Attention to internal and external facial features. The 2- and 4-year-old typically developing children did not differ on any of the performance measures, suggesting that, within this age range, no major changes in face scanning strategies are to be expected in this population. Regardless of age, children with ASD spent more time examining external facial features (i.e., hair, cheeks, and forehead) than TD controls. Moreover, their amount of attention to inner facial features was affected by age. While 2-year-olds with ASD spent a similar proportion of their active scanning time attending to inner facial features as their TD peers, the 4-year-olds with ASD spent significantly less time attending to the eyes, nose, and mouth and more time looking away from the face (i.e., at the body, background, and screen) compared to both TD groups and their younger counterparts with ASD. Worsening of other aspects of face processing with age has been reported in other studies as well. For instance, Langdell (1978) reported that the inversion effect was less pronounced in 14-year-olds with autism as compared with 8-year-olds with autism, with the older group exhibiting fewer recognition errors in a condition where faces were inverted, while the 8-year-olds were as impacted by inversion as their typically developing and delayed controls. Thus, it is possible that in young children with ASD we might observe a progressive divergence from typical developmental trajectories in performance on face processing tasks.
Scanning patterns during familiarization. Perusal of Fig. 4 suggests that children with ASD spent more time looking at the eyes than any other part of the static images of faces. These results are similar to results found in older children and adults with ASD (Boraston et al. 2008; Dalton et al. 2005; Rutherford and Towns 2008; Sterling et al. 2008), in typical adults (Heisz and Shore 2008; Henderson et al. 2005), and in very young typically developing infants (Gallay et al. 2006; Maurer and Salapatek 1976). Children with ASD also spent a similar proportion of time examining the eyes and nose areas as their typically developing peers, a finding similar to those reported in older children with ASD (Dapretto et al. 2006; Neumann et al. 2006; van der Geest et al. 2002; Rutherford and Towns 2008; Speer et al. 2007). However, unlike the typical controls, young children with ASD allocated less attention to another key feature of the face, namely, the mouth.
Considering that one of the key features of their clinical presentation is atypical face monitoring and infrequent eye contact, it is not clear what mechanism is responsible for the relatively good attention to the eyes in static images of faces in young children with ASD. While it is plausible that scanning the eye area in this group is driven by a similar mechanism as that observed in typically developing children, interest in the eyes in ASD might alternatively be associated with heightened sensitivity to low-level perceptual features. A recent study in our lab suggested that, compared to typically developing children, toddlers with ASD attended more to high contrast areas of visual scenes (Shic et al. 2007). In this context, the eyes might be salient for the ASD group due to perceptual rather than semantic factors of the kind presumed to operate in the unaffected group. In other words, while typical toddlers might be focusing on the eyes as a potential source of information regarding identity and affective expression, toddlers with ASD might be attracted to the contrast created by the sclera and iris. A direct test of this hypothesis, however, remains to be conducted.
Quite striking in this context is the finding of decreased attention to the mouth area in children with ASD as compared to their typically developing peers. In typical development, interest in the eyes and the mouth changes over time as the initial perceptual biases already observed in newborns interact with increasing social cognition and language skills in the developing infant. A near-exclusive focus on the eyes relative to the mouth has been reported in typical 2-3-month-olds (e.g., Hainline 1978; Maurer and Salapatek 1976), even in response to a mother's face speaking to the child (Haith et al. 1977). The latter finding suggests that at this age the eyes win the competition for the baby's attention even though the mouth area is rich in perceptual cues (e.g., motion and motion-contingent sound), which has been interpreted as an index of the emerging intersubjectivity in the infant (Haith et al. 1977). A prospective study of typically developing 3-24-month-old infants suggests that attention to the eye and mouth regions of faces changes markedly within this developmental period and is heavily affected by the context in which faces are presented (i.e., whether faces are presented in static or dynamic displays, and whether the presentation involves speech) (Chawarska et al. 2009a, b, manuscript in preparation). Specifically, when presented with images of static faces identical to those used in the current study, typically developing 3- and 6-month-old infants spend most of their time dwelling on the eyes and pay very little attention to the mouth. Between 6 and 12 months, however, their dwell time on the eyes drops significantly and attention to the mouth increases, reflecting the development of less restrictive face scanning strategies, which might be linked, at least to some extent, with the emerging understanding of the role that mouth monitoring can play in speech perception.
This increase was even more pronounced in response to faces engaged in infant-directed speech, providing support for this interpretation. As the present study illustrates, older typically developing children continue to distribute their attention between the key features of the face, spending approximately twice as much time dwelling on the mouth region as compared to their ASD peers. Within this context, limited attention to the mouth in the ASD sample might be associated with the presence of more restrictive scanning strategies of static faces, possibly linked with limited attention and responsivity to speech (Chawarska et al. in press; Paul 2008; Paul et al. 2007), and consequently a lesser appreciation of the mouth region as highly informative for speech processing.
The results of the present study contrast with reports regarding scanning patterns in toddlers with ASD viewing videos depicting adults engaged in child-directed speech (Jones et al. 2008). In response to such dynamic displays toddlers appear to disproportionally decrease their attention to the eyes and increase their attention to the mouth in a manner similar to that observed in high functioning older individuals with autism (Klin et al. 2002; Speer et al. 2007). The contrast between the present study, which employs static images, and performance profiles reported in response to dynamic scenes highlights the importance of context in interpreting research findings in young children with ASD (Chawarska et al. 2009a; Speer et al. 2007). While in early typical development attention to the mouth of a speaker might reflect employment of a lip-reading strategy in speech perception (Chawarska et al. 2009b), in ASD interest in the mouth may instead be driven by factors such as a preference for perfect audiovisual contingencies (Klin et al. 2009). That is, children with autism may focus on the mouth of the speaker because of its physically-contingent properties rather than its linguistic content. In response to static pictures and in the absence of audiovisual contingencies associated with speech, toddlers may neglect the mouth and instead gravitate toward more perceptually salient areas of the display, including the eyes and the edges associated with the hairline and face boundaries. How this pattern of attention might be detrimental for effective processing of facial identity is discussed in the following section.
Recognition. A previous study indicated that following a 20 s exposure, 4- but not 2-year-old children with ASD showed evidence for face recognition when tested in a standard VPC procedure (Chawarska and Volkmar 2007). In the present study, with the exposure shortened to 10 s, neither 2- nor 4-year-olds with ASD showed clear evidence for face recognition, despite the fact that the full-head images provided additional cues regarding facial identity (e.g., hair). Thus, as in typically developing children (Morgan and Hayne 2006), the ability to retain and recognize faces in young children with ASD might improve when the familiarization time is lengthened. Nonetheless, the results suggest that they are less effective in extracting invariant facial features than their typically developing peers.
This leads us to the question: what could contribute to this less effective encoding? Analysis of looking patterns suggests that in ASD longer looking at inner face elements during familiarization was associated with longer looking at the familiar stimulus during recognition. This suggests that those who were attending to the internal parts of the face and trying to encode their invariant features did so in a less effective manner, leading to the formation of an incomplete representation and, consequently, longer looking at the familiar image. Additional clues are provided by the examination of the relationship between the amount of time spent looking at specific inner features, i.e., the mouth and the eyes, during familiarization and recognition performance. When we partialled out the effects of age and cognitive levels within the ASD group, we found that more exclusive attention to the eye region as compared to the mouth (eye-to-mouth ratio) during familiarization was associated with a lower novelty preference ratio. Thus, we hypothesize that the restricted scanning of internal facial features observed in ASD produces an incomplete representation of the face and, with it, a tendency to continue looking at the more familiar stimulus during the recognition phase. This interpretation is consistent with studies of face scanning and recognition in typical adults, which suggest that a restricted pattern of scanning leads to diminished effectiveness of encoding (Henderson et al. 2005) and that a distribution of attention between eyes and mouth facilitates face recognition (Schyns et al. 2002).
Summary and future directions. Compared to typical peers, 2- and 4-year-old children with ASD show difficulties in encoding facial identity. These difficulties are at least partly associated with a restricted scanning of the key face areas as characterized by a more exclusive focus on the eyes and relative neglect of the mouth area. Moreover, attention to key facial features decreases in older children with ASD, possibly signaling a cumulative detrimental effect of atypical social experiences on the development of the face processing system. Early difficulties in encoding of facial identity observed in 2-year-olds are likely to impact their effectiveness as interactive partners and, thus, should be targeted directly for intervention. Future research should focus on determining conditions for optimizing face processing in infants and toddlers with ASD, by, for instance, allocating more time for processing and promoting less restricted scanning strategies of the key elements of the face. Considering that accurate face perception is crucial for establishing and maintaining successful social interactions, such a line of research would be highly consequential for the design and implementation of treatment strategies at the early stages of ASD.
Limitations. Several limitations of the present study need to be noted. First, the study lacks a developmentally disabled non-autistic control group, which would allow for the determination of whether the observed abnormalities in face scanning and their association with impaired recognition are specific to ASD. However, though the mental age of our 4-year-olds with ASD was similar to that observed in 2-year-old typically developing children, their performance was significantly different in a number of key areas, suggesting that factors other than mental age were at play. Second, the developmental effects would be best addressed through a prospective study tracking changes over time in the same group of participants; however, we took great care in matching the two ASD groups on symptom severity and DQ scores. We are currently conducting a large prospective study focused on face scanning and recognition in the first 3 years of life, which will address this criticism in a more direct way. Third, it is important to note that the interpretation of the null preference effect in the VPC paradigm can be ambiguous as far as 'lack of recognition' is concerned. Our focus, however, was on direct comparison between groups of affected and unaffected children, and the conclusions drawn from this study highlight the limited effectiveness of the process that leads to the formation of facial representations rather than the lack of face recognition skills. Finally, our study suggests considerable variability within the ASD spectrum in terms of performance on the VPC task. Future studies should address the issues of heterogeneity and possible subtypes of children with ASD utilizing larger groups of participants.
The study was supported by the NAAR foundation (to KC), NIMH grant T32 MH18268 (to FS), and NIMH STAART grant U54 MH66494 (to KC). We would like to thank Jessica Bradshaw for her work in preparation of this data for analysis; Fred Volkmar, Ami Klin, and Suzanne Macari for helpful suggestions on earlier versions of the manuscript; and Celine Saulnier for her contribution to sample characterization. We wish to express our deep appreciation to the children and families who participated in the study.