Longer fixation duration while viewing face images
- Cite this article as:
- Guo, K., Mahmoodi, S., Robertson, R.G. et al. Exp Brain Res (2006) 171: 91. doi:10.1007/s00221-005-0248-y
The spatio-temporal properties of saccadic eye movements can be influenced by the cognitive demand and the characteristics of the observed scene. Face perception, probably because of its crucial role in social communication, is argued to involve different cognitive processes from non-face object or scene perception. In this study, we investigated whether and how face and natural scene images can influence the patterns of visuomotor activity. We recorded monkeys’ saccadic eye movements as they freely viewed monkey face and natural scene images. The face and natural scene images attracted a similar number of fixations, but viewing of faces was accompanied by longer fixations compared with natural scenes. These longer fixations were dependent on the context of facial features. The duration of fixations directed at facial contours decreased when the face images were scrambled, and increased at the later stage of normal face viewing. The results suggest that face and natural scene images can generate different patterns of visuomotor activity. The extra fixation duration on faces may be correlated with the detailed analysis of facial features.
Keywords: Eye movement, Face image, Natural scene, Monkey
Visual exploration of a complex scene involves a series of saccades and fixations, which shift our attention between specific objects or informative features within the scene and allow detailed analysis and identification of the scene (Biederman 1987; Henderson and Hollingworth 1999). Two aspects of eye movements are important when studying gaze control during scene perception: where fixations tend to be directed (fixation position) and how long they typically remain there (fixation duration; Henderson 2003). Although human saccadic eye movements show a variety of stereotypic patterns while inspecting visual scenes (Yarbus 1967), the frequency and size of saccades can be modulated by the cognitive demand and characteristics of the observed scene (Salthouse et al. 1981; Jacobs 1986; Pollatsek et al. 1986; Epelboim et al. 1995; Hooge and Erkelens 1998; Andrews and Coppola 1999). For example, longer fixations are normally associated with difficult words in reading tasks (Pollatsek et al. 1986) and decreased discriminability of the target in visual search tasks (Jacobs 1986; Hooge and Erkelens 1998); and natural scenes generate shorter fixations and larger saccades compared with simple pattern images in free viewing tasks (Andrews and Coppola 1999).
As faces can provide visual information about an individual’s gender, age and familiarity, and their expressions offer significant cues to intention and mental state (Bruce and Young 1998; Emery 2000), the ability to recognize these cues and to respond accordingly plays an important role in the social life of higher primates (Andrew 1963; Anderson 1998). It is argued that face perception involves a unique cognitive process compared with non-face object or scene perception. For example, psychophysical studies have observed detrimental recognition performance for inverted faces rather than non-face objects or scenes (face inversion effect; e.g. Yin 1969; Valentine 1988; Rossion and Gauthier 2002), a visual preference for face-like stimuli in human neonates (Johnson and Morton 1991; see also Turati et al. 2002), and selective impairments of face and object recognition in neurological patients (prosopagnosia and visual agnosia) (e.g. Sergent and Signoret 1992; Farah 1996; Moscovitch et al. 1997). Recordings of human event-related potentials showed a different topography to face (including human and animal faces) and non-face object or scene stimuli in the N170 time window (e.g. Bentin et al. 1996; Itier and Taylor 2004; Rousselet et al. 2004). Electrophysiology and brain imaging studies further suggested a distinct neuroanatomical region in cerebral cortex associated with the cortical processing of faces (face-selective neurons in monkey inferotemporal cortex, fusiform face area in human cortex; e.g. Sergent et al. 1992; McCarthy et al. 1997; Tanaka 1997; Tsao et al. 2003). However, this view has recently been challenged by some brain imaging studies suggesting that faces are processed by a domain-general system for fine-grained, exemplar-level object perception but probably at a different level of recognition or a different degree of perceptual expertise (Gauthier et al. 1999, 2000; Tarr and Cheng 2003).
It is not clear, however, whether inspection of face and non-face scenes, which have different image characteristics and may involve different cognitive processes (i.e. different cortical processes, different level of recognition or different degree of perceptual expertise), can influence the patterns of visuomotor activity. To examine this issue, we compared monkeys’ saccadic eye movements when they freely viewed face and natural scene images. Familiar scenes sampled from the monkeys’ daily environment were also used to examine the potential influence of the familiarity of natural scene images. This exploratory project is important not only for increasing our understanding of the relation between the category of real world stimuli and the organization of goal-directed eye movements in non-human primates, but also for comparison with findings from humans, as the behavior and neurophysiology of monkeys comprise the most significant model for the advancement of research into human brain function. We observed that the face images tended to generate longer fixations compared with the natural scene images, and these longer fixations were associated with the context of facial features.
Three male adult rhesus monkeys (Macaca mulatta, 4.5–6.0 kg) were trained to fixate a small fixation point (FP) for several seconds in a dimming fixation detection task. To make eye movement recordings, a scleral eye coil and head restraint were implanted under aseptic conditions (Guo and Benson 1998). All procedures complied with the “Principles of laboratory animal care” (NIH publication no. 86-23, revised 1985) and UK Home Office regulations.
Stimuli and apparatus
Digitized gray scale images were presented through a VSG 2/3 graphics system (Cambridge Research Systems) and displayed on a high frequency non-interlaced gamma-corrected color monitor (6.0 cd/m2 background luminance, 110 Hz frame rate, Sony GDM-F500T9) with a resolution of 1,024×768 pixels. At a viewing distance of 57 cm the monitor subtended a visual angle of 40×30°.
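The display geometry above can be checked with a short calculation. The following is a minimal sketch, not the authors' software: the function names are hypothetical, and the pixels-per-degree figure assumes a uniform mapping across the screen, which is only approximately true for a flat monitor. It illustrates why 57 cm is a convenient viewing distance (1 cm on the screen subtends roughly 1°) and gives the approximate pixel density implied by the stated resolution and visual angle.

```python
import math


def visual_angle_deg(size_cm, distance_cm):
    """Visual angle (degrees) of an object of size_cm centred on the line of sight."""
    return 2.0 * math.degrees(math.atan(size_cm / (2.0 * distance_cm)))


def pixels_per_degree(resolution_px, extent_deg):
    """Average pixel density, assuming pixels map uniformly onto visual angle."""
    return resolution_px / extent_deg


# Values taken from the text: 57 cm viewing distance, 1,024x768 pixels, 40x30 degrees.
one_cm_angle = visual_angle_deg(1.0, 57.0)
ppd_horizontal = pixels_per_degree(1024, 40.0)
ppd_vertical = pixels_per_degree(768, 30.0)

print(f"1 cm at 57 cm subtends {one_cm_angle:.3f} deg")
print(f"approx. {ppd_horizontal:.1f} x {ppd_vertical:.1f} pixels per degree")
```

Under these assumptions both axes come out at about 25.6 pixels per degree, so a 0.2° fixation point would span roughly 5 pixels.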
During the experiments the monkey sat in a primate chair with head restrained, and viewed the display binocularly. To calibrate eye movement signals, a small red FP (0.2° diameter, 7.8 cd/m2 luminance) was displayed randomly at one of 25 positions (5×5 matrix) across the monitor. The distance between adjacent FP positions was 5°. The monkey was trained to follow the FP and maintain fixation for 1 s. After the calibration procedure, the trial was started with an FP displayed at the center of the monitor. If the monkey maintained fixation for 500 ms, the FP disappeared and an image was presented for 20 s. During the presentation, the monkeys passively viewed the images. No reinforcement was given during this procedure, nor were the animals trained on any other task with these stimuli, which could have potentially affected the structure of their behavior. It was considered that with their lack of training, and in the absence of instrumental responding, their behavior should be as natural as possible.
Eye movement recordings and analysis
Horizontal and vertical eye positions were measured using an 18-inch cubic scleral search coil assembly with 6 min arc sensitivity (CNC Engineering). Eye movement signals were amplified and sampled at 500 Hz through a CED1401 plus digital interface (Cambridge Electronic Design). Software developed in Matlab computed horizontal and vertical eye displacement signals as a function of time to determine eye velocity and position. Fixation locations and durations were then extracted from the raw eye tracking data using velocity (less than 0.2° eye displacement at a velocity of less than 20°/s) and duration (greater than 50 ms) criteria (Guo et al. 2003).
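The fixation criteria above can be illustrated with a simple velocity-threshold routine. This is a sketch in Python rather than the Matlab software used in the study, and it rests on stated assumptions: the function `detect_fixations` is hypothetical, velocity is taken as the sample-to-sample displacement divided by the sampling interval, and the 0.2° displacement limit is interpreted as the maximum distance from the first sample of a candidate fixation. The original implementation may differ on each of these points.

```python
import math

SAMPLE_RATE_HZ = 500.0      # sampling rate stated in the text
MAX_VELOCITY = 20.0         # deg/s, velocity criterion from the text
MAX_DISPLACEMENT = 0.2      # deg, displacement criterion (interpretation assumed)
MIN_DURATION_MS = 50.0      # ms, duration criterion from the text


def detect_fixations(x, y, rate_hz=SAMPLE_RATE_HZ):
    """Return a list of (start_index, end_index, duration_ms, mean_x, mean_y)."""
    dt = 1.0 / rate_hz
    fixations = []
    start = None  # index of the first sample of the current candidate fixation

    def close(first, last):
        # Keep the candidate only if it lasts longer than the duration criterion.
        count = last - first + 1
        duration_ms = count * dt * 1000.0
        if duration_ms > MIN_DURATION_MS:
            mean_x = sum(x[first:last + 1]) / count
            mean_y = sum(y[first:last + 1]) / count
            fixations.append((first, last, duration_ms, mean_x, mean_y))

    for i in range(1, len(x)):
        velocity = math.hypot(x[i] - x[i - 1], y[i] - y[i - 1]) / dt
        if start is None:
            if velocity < MAX_VELOCITY:
                start = i - 1  # a slow sample pair opens a candidate fixation
            continue
        displacement = math.hypot(x[i] - x[start], y[i] - y[start])
        if velocity >= MAX_VELOCITY or displacement >= MAX_DISPLACEMENT:
            close(start, i - 1)
            start = None
    if start is not None:
        close(start, len(x) - 1)
    return fixations


# Synthetic horizontal trace (degrees): a long fixation, a saccade, a pause that is
# too short to count, a second saccade, and a final fixation.
trace_x = ([0.0] * 100 + [0.5 * k for k in range(1, 11)] + [5.0] * 20
           + [5.0 + 0.5 * k for k in range(1, 11)] + [10.0] * 50)
trace_y = [0.0] * len(trace_x)
fixations = detect_fixations(trace_x, trace_y)
for first, last, duration_ms, mean_x, mean_y in fixations:
    print(f"samples {first}-{last}: {duration_ms:.0f} ms at ({mean_x:.1f}, {mean_y:.1f}) deg")
```

In the synthetic trace, the middle pause lasts about 42 ms and is therefore rejected by the duration criterion, leaving two fixations.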
As the main experimental design comprised three levels of image category (faces vs natural scenes vs familiar scenes), one-way repeated-measures analysis of variance (ANOVA) was carried out after pooling the data from the three monkeys. Appropriate post-hoc testing of differences between levels of image category (Tukey’s least significant procedure) was also carried out following detection of significant overall variable ratios.
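For readers unfamiliar with the F ratio reported throughout the results, the sketch below computes it for a plain one-way design. It is an illustration only: it treats the image categories as independent groups and does not model repeated measures or the post-hoc tests, the function `one_way_anova` is hypothetical, and the numbers are toy values, not data from the study.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a plain one-way ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variability of the group means around the grand mean.
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, group_means))
    # Variability of individual values around their own group mean.
    ss_within = sum((value - mean) ** 2
                    for group, mean in zip(groups, group_means)
                    for value in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Toy values for three categories (hypothetical, for illustration only).
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_ratio, df_between, df_within = one_way_anova(toy_groups)
print(f"F({df_between},{df_within}) = {f_ratio:.2f}")
```

In this plain design the degrees of freedom are k − 1 and N − k for k categories and N observations; a P value would additionally require the F distribution, which a statistics library can supply.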
The gray scale face and natural scene images appeared equally salient to the monkeys. No difference was observed in the number of fixations across the image categories (ANOVA, F(2,162)=0.5, P=0.61; Fig. 1b). During the entire 20-s presentation, the three monkeys made 24.73±1.51 (Mean ± SEM), 24.82±1.69 and 22.82±1.58 fixations across the face, familiar scene and natural scene images, respectively.
Inspection of a natural scene is accompanied by a series of fixations directed towards important and informative scene regions. Recent studies observed higher local luminance contrast and lower local two-point correlation for fixated scene patches than unfixated patches (Reinagel and Zador 1999; Krieger et al. 2000; Parkhurst and Niebur 2003), suggesting that local image statistics, such as luminance contrast, are major contributors to the saliency map for overt attention (Parkhurst et al. 2002). To examine whether the differences in fixation durations for the three classes of images were due to the differences in the physical properties and statistics of those fixated image regions, we calculated local luminance contrasts around individual fixations in different images. The local contrast is a measure of variability of the intensity within an image patch, and is defined as the standard deviation of the luminance within a square image patch divided by the mean intensity of the whole image (Reinagel and Zador 1999; Einhäuser and König 2003). The size of the square region was chosen to be 2°×2° (±1° around the fixation), which roughly covers the spatial scale of the fovea. While the average fixation duration in the face images was longer than that in the familiar scenes (Fig. 1c), the average local contrast around the fixations in the face images (0.2568±0.0034) was not significantly different from that in the familiar scenes (0.2539±0.0038; t test, P>0.05; Fig. 2). However, the average local contrast around the fixations in the natural scene images (0.3512±0.0061) was higher than that in the face and familiar scene images (ANOVA, F(2,3975)=157.11, P=2.63E−66). This is due to the physical properties of the natural scene images, as the average local contrast from random samples in the natural scenes (25 samples per image) was also proportionally higher than that in the face and familiar scene images (ANOVA, F(2,1372)=113.02, P=3.67E−46; Fig. 2).
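The local contrast measure defined above (standard deviation of luminance within a square patch, divided by the mean intensity of the whole image) can be sketched as follows. This is an illustrative implementation under assumptions, not the authors' code: the function `local_contrast` is hypothetical, the image is a plain list of rows of luminance values, the population standard deviation is used, and patches are clipped at the image border. With the display described in the methods (about 25.6 pixels per degree), a ±1° patch would correspond to a half-size of roughly 26 pixels.

```python
import statistics


def local_contrast(image, row, col, half_size_px):
    """SD of luminance in a square patch around (row, col) over the whole-image mean."""
    height, width = len(image), len(image[0])
    # Clip the patch to the image borders (an assumption of this sketch).
    row_start, row_end = max(0, row - half_size_px), min(height, row + half_size_px + 1)
    col_start, col_end = max(0, col - half_size_px), min(width, col + half_size_px + 1)
    patch = [image[r][c]
             for r in range(row_start, row_end)
             for c in range(col_start, col_end)]
    image_mean = statistics.fmean(value for line in image for value in line)
    return statistics.pstdev(patch) / image_mean


# Toy 6x6 image: dark left half (50) and bright right half (150).
toy_image = [[50, 50, 50, 150, 150, 150] for _ in range(6)]
contrast_on_edge = local_contrast(toy_image, 2, 2, 1)   # patch straddles the edge
contrast_on_flat = local_contrast(toy_image, 2, 0, 1)   # patch inside the dark half
print(f"edge patch: {contrast_on_edge:.4f}, flat patch: {contrast_on_flat:.4f}")
```

A patch that straddles the luminance edge yields a clearly non-zero contrast, while a patch inside a uniform region yields zero, which is the behavior the measure is meant to capture.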
For each individual fixation sampled while viewing face, familiar scene and natural scene images, we further plotted its duration against its local contrast (Fig. 3). In agreement with a previous study of human subjects (Einhäuser and König 2003), over all images and all subjects, we found no correlation between local contrast and fixation duration (r=0.00005, 0.0007 and 0.0002 for face, familiar scene and natural scene images, respectively). This also holds true for the local contrasts calculated using a smaller (1°×1°) or larger (3°×3°) spatial scale around the fixations (r<0.001 for all images). This analysis shows that the local luminance contrast was unlikely to be related to the differences in the fixation durations while viewing face, familiar scene and natural scene images.
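The correlation analysis above pairs one duration with one local contrast value per fixation. A minimal sketch of the Pearson correlation coefficient is given below; the function `pearson_r` is hypothetical, and the values are toy numbers chosen to give exactly r = 1 and r = 0, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Toy values (hypothetical): the first pair is perfectly related, the second unrelated.
r_related = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
r_unrelated = pearson_r([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0])
print(f"related: r = {r_related:.2f}, unrelated: r = {r_unrelated:.2f}")
```

Values of r near zero, as reported for all three image classes, indicate that fixation duration does not vary linearly with local contrast.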
We further compared the durations of each of the first seven fixations on the eyes and facial contours within normal face images (this number was chosen as it represented the maximum number of fixations within the region for some images, Fig. 6b). While the fixation durations on the eyes were the same with changing fixation sequence (ANOVA, F(6,268)=0.85, P=0.53), the duration of fixations on the facial contours increased gradually at the later stage of fixation (ANOVA, F(6,214)=3.75, P=0.001). There was no significant change of the fixation durations on the same regions within scrambled faces with increasing fixation sequence (ANOVA, eyes: F(6,98)=1.25, P=0.29; facial contours: F(6,115)=0.67, P=0.68).
In the present study, we compared the patterns of saccadic eye movements while monkeys freely viewed face and natural scene images (including familiar and novel natural scenes). The face and natural scene images appeared equally salient to the monkeys. They attracted a similar number of fixations during the image presentation. However, viewing of the faces was accompanied by longer fixations compared with the natural scenes. This difference in fixation durations across different classes of images is unlikely to be related to the differences in local physical properties and statistics of these images, as demonstrated by the analysis of local luminance contrast (standard deviation of intensity in a fixation patch, Figs. 2, 3) and local two-point correlation function (intensity of the fixated point and nearby points, Figs. 4, 5) across the different classes of images. Comparison between familiar and novel natural scenes showed that these two classes of natural images elicited similar fixation durations (Fig. 1). Because our familiar scenes were ‘artificial’ man-made scenes sampled from the monkeys’ daily environment, and novel natural scenes included both ‘artificial’ scenes (i.e. buildings) and ‘natural’ scenes (i.e. plants), it is difficult to exclude the potential influence of the ‘naturalness’ of scenes on fixation duration without further detailed examination with a larger sample size. However, as our analysis also revealed that the fixation durations sampled from novel ‘natural’ scenes (253±7 ms) were not significantly different from those sampled from novel ‘artificial’ scenes (248±11 ms) (t test, P=0.61), it is unlikely that the potential interaction between familiarity and ‘naturalness’ of the tested scenes could fully account for our observation of a difference in fixation durations between face and natural scene images.
Detailed examination of facial configurations further revealed that the longer fixations on facial contours appeared to be dependent upon the arrangement of these contours into a coherent and recognizable object, namely a face. The duration of the fixations on the same facial contours in the scrambled face images was significantly shorter (Fig. 6). These results suggest that face and natural scene images may generate different patterns of visuomotor activity. The extra fixation duration on faces may be correlated with the detailed analysis of facial features.
It is believed that oculomotor strategies are closely linked with the cognitive demand (Epelboim et al. 1995), and the fixation duration has been correlated with the amount of information being processed during foveal analysis (Moffit 1980). Longer fixations are usually associated with extra cognitive demand, informative visual information at the fixated region, and/or display complexity (Salthouse et al. 1981; Jacobs 1986; Hooge and Erkelens 1998). For example, individual fixation durations are longer during scene memorization than search (Henderson et al. 1999), or for semantically informative than uninformative objects within the scene (Henderson and Hollingworth 1999), or when the image at fixation is reduced by contrast or partially obscured by a noise mask (van Diepen 1995).
One of the major differences between face and natural scene images is that faces have inherent social significance. They are behaviorally relevant visual stimuli for primates, which provide essential information about an individual’s gender, age, familiarity, intention and mental state (e.g. Bruce and Young 1998; Emery 2000). When viewing a complex scene containing faces, the highest proportion of human fixations is directed to the faces (Yarbus 1967). The local facial features, such as eyes, are not just simple geometric patterns or objects. They also contain significant social communicative signals. Like humans, monkeys are also heavily reliant on facial signals for social communication. Based on facial cues alone, they are readily able to respond appropriately to the expressions of other individuals (Mendelson et al. 1982), and to recognize and discriminate the faces of familiar and unfamiliar individuals (Rosenfeld and van Hoesen 1979; Parr et al. 2000). Their visual system also appears to be tuned to the informative facial features (Guo et al. 2003). They showed a preferential interest (a high density of fixations and longer fixation durations) in the major local facial features while viewing faces. As local image complexity around the fixations is unlikely to account for the differences in fixation durations between the face and natural scene images (Figs. 2, 3, 4, 5), the extra duration of fixations for the faces may be correlated with the extra cognitive demand (i.e. “configural process”) which involves detailed analysis of local facial features and perceiving relations among the facial features, and therefore may be important for acquisition and processing of facial cues, such as identity, expression and gaze direction (Maurer et al. 2002). However, from the present data it is difficult to see how the social relevance of the faces could affect the fixation durations, as we only tested neutral face images in a free viewing task in this experiment. In future studies it will be interesting to systematically manipulate social relevance over controlled sets of face images and/or cognitive demand, and to investigate the relations among social perception, cognitive demand and patterns of saccadic eye movements.
Interestingly, the facial configuration did not appear to have a significant influence on individual fixation durations. Indeed, the durations of fixations on major local facial features, such as eyes, nose and mouth, were not different between normal and scrambled faces (Fig. 6). This suggests that the longer fixations on the faces are mainly correlated with the analysis of the local facial features rather than the precise facial configuration. However, the disruption of facial configuration (i.e. inverted or scrambled faces) can significantly reduce the number of fixations compared with the normal upright faces (Guo et al. 2003). Taking these observations together, it seems that the number of fixations rather than the duration of fixations plays a more crucial role in the process of face inspection.
When tested with the scrambled face images, the durations of fixations on the facial contours (including hairlines) were slightly decreased (Fig. 6). For a normal upright face, the facial contour provides essential facial metric information which is critical for face perception and recognition (Burton et al. 1993; Perrett et al. 1994; Fellous 1997). Indeed, the responses of face-selective neurons in anterior inferotemporal cortex of macaques are correlated with dimensions relating the hairline to other facial points, such as eyes, in face discrimination tasks (Young and Yamane 1992). In our study, the observed longer fixations on the facial contours within the intact faces may be correlated with the analysis of the properties of facial dimensions, and this process may require extra fixation time.
This work is supported by Wellcome Trust, HFSPO and EU FP5.