The face plays an essential role in human communication. It conveys information about other people’s identity, gender, emotional states, and even personality traits (Bruce & Young, 1986; Emery, 2000; Hietanen, 2018; Kleinke, 1986). Within the structure of a face as a visual stimulus, the eyes are the first and most frequently fixated region, and gaze plays a central role in interindividual interactions (Kleinke, 1986). Indeed, the detection and perception of gaze provide valuable indications about other people’s intentions: they allow us to identify whether we are the target of the gaze and, if not, to know who is being observed. While the functions of gaze have been the focus of social psychology research for a very long time (Kendon, 1967), studies of the cognitive processes and brain mechanisms associated with the detection and perception of gaze direction are more recent, although widespread in the field of cognitive neuroscience (Itier & Batty, 2009). These processes seem to take place automatically and effortlessly in individuals with typical development and are an indissociable aspect of social communication (Baron-Cohen, 1994). In contrast, in atypical conditions, such as social anxiety disorder, and in neurodevelopmental pathologies, such as autism and schizophrenia, these processes appear to be altered. This causes difficulties in social relationships that can be reflected in the recognition of facial emotions and in the interpretation of other people's intentions.

Some researchers have advanced the hypothesis that eye contact acts as a communication intention detector: through its ability to signal an intention to communicate, eye contact directly activates channels of social cognition (Conty et al., 2007; Kampe et al., 2003; Schilbach et al., 2006; Wicker et al., 2003). This intention detector seems to be underpinned by specific subcortical and cortical mechanisms (Senju & Johnson, 2009b), linking it to the networks that support theory of mind and, more broadly, social cognition. In light of the various investigations into the detection and perception of gaze as premises of a social brain, this narrative review examines what behaviors can be observed, from birth to adulthood, in response to the direction of people’s gaze. We then examine the related neural aspects. Once typical functioning has been explicated, we focus on some atypical aspects. In an effort to synthesize the extant literature, new hypotheses are advanced relating to the cognitive and neural aspects of gaze direction detection and perception.

Behavioral aspects of the detection and perception of gaze

It is well-established that infants are attracted to faces early in life, and research has tried to unravel how the detection and perception of gaze directed at them (direct gaze) or away from them (averted gaze) affect newborns’ attention and behavior. The earliest studies (Hood et al., 1998; Vecera & Johnson, 1995) already showed different behaviors in newborns in these two conditions, with a preferential orientation toward direct gaze. According to Itier and Batty (2009), preferential orientation toward direct gaze can help to build a certain number of relational capacities and may constitute an early sign of the emergence of social interactions.

One subject of speculation and ongoing debate is why and how this preferential orientation toward direct gaze develops. From birth, newborns are able to catch another person’s eye and to fixate a direct gaze as long as the other’s face is no further than 30 cm away (Bronson, 1974). It is highly likely that a fixation reflex exists from the first weeks of life, albeit unstable and short-lived due to low visual acuity (Allen et al., 1996). Between birth and 9 months of life, foveal maturation progressively yields stable, precise, and sustained fixation. At the same time, increased sensitivity to contrast allows the infant to distinguish between and recognize static and dynamic shapes (Vital-Durand et al., 1996). During the first weeks and months of life, this sensitivity undergoes significant qualitative and quantitative changes (Bui Quoc, 2007). The 5-week-old newborn cannot perceive contrasts of less than 20%. From birth, however, the contrasts between the sclera, the pupil, and the iris of the eyes of others are easily perceived, even when the iris is light-colored, as long as the contrast in the eyes is greater than 20% (Vital-Durand et al., 1996). This suggests that the basic requirements for detecting the eyes are met quite soon after birth. In adults (between 24 and 28 years old in the studies of Burra, Baker, & George (2017); George et al. (2001); Kampe et al. (2001); Taylor, Itier, Allison, & Edmonds (2001)), the perception of gaze direction is more nuanced: behavioral responses to a direct gaze are faster than responses to an averted gaze. These behavioral responses also seem to be modulated by the orientation of the head, since the perception of gaze direction is more evident with a frontal view than with a 30°-deviated view (Coelho et al., 2006; Conty et al., 2007).
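The 20% threshold mentioned above can be made concrete with a back-of-the-envelope Michelson contrast calculation. This is a minimal sketch; the luminance values below are illustrative assumptions, not measurements from any of the cited studies.

```python
def michelson_contrast(l_max: float, l_min: float) -> float:
    """Michelson contrast between the brightest and darkest regions (0..1)."""
    return (l_max - l_min) / (l_max + l_min)

# Hypothetical luminances (cd/m^2) for a light sclera and a dark iris.
sclera, iris = 120.0, 30.0
c = michelson_contrast(sclera, iris)
print(round(c, 2))   # 0.6
# Even a 5-week-old, limited to contrasts above ~20%, could resolve this.
print(c > 0.20)      # True
```

On such figures, the sclera/iris boundary comfortably exceeds the newborn's contrast threshold, consistent with the claim that the basic requirements for eye detection are met soon after birth.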
It is assumed that the preferred orientation toward direct gaze serves to collect information about intentions, potential future interactions, or the focus of other people’s attention. This information is important for social and nonverbal communication and tells one what to expect, what is happening, and how to behave with peers (Itier & Batty, 2009).

Various interpretations have been proposed to account for newborns’ preference for direct gaze. Baron-Cohen (1994) suggested the existence of the “Eye Direction Detector” (EDD), an innate module in the brain specifically dedicated to gaze processing. This module is part of a broader system called “The Mindreading System,” the role of which is to decode the intentions of others and, therefore, to contribute to social interactions. According to Baron-Cohen (1994) and Brothers (1990), this system is supported by a specialized brain network called the social brain. The EDD has two functions: the first is to detect the presence of eyes, and the second is to represent eye behavior. To explain the second function, Baron-Cohen’s argument was based on what he called domain specificity, which is the idea that the eyes constitute a well-defined area with specific regions of contrast: a darker region (iris/pupil) and a white region (the sclera). Contrast sensitivity being a general property of the visual system (Baron-Cohen, 1994, 1995), the author suggested that after detecting the eyes, the EDD makes it possible to construct a representation of behavior, including the relationship between oneself and the person looking at us. He suggested that this enables the observer to imagine certain basic properties of intentionality. He also assumed that the EDD is activated once a face has been detected. Thus, the EDD is used to locate the relevant stimuli (i.e., the eyes) within a more global structured background (i.e., the face), and this should already be possible soon after birth. The direct gaze preference can also be explained using the two-process model proposed by Johnson et al. (Johnson, 2005; Johnson et al., 1991; Johnson et al., 2015; Morton & Johnson, 1991). The first process is a face-specific mechanism, named Conspec, set to detect faces of conspecifics. This mechanism is selectively tuned to the geometry of a face. 
The effectiveness and ubiquity of the simple T-shaped schematic face (i.e., the spatial configuration of the two eyes, the nose and the mouth) suggest that face detection may be accomplished through a simple template-like process (Tsao & Livingstone, 2008). Computational models relying on algorithms detect and match basic features, which are much simpler than a whole face (rectangle features in the Viola-Jones algorithm (Viola & Jones, 2004), qualitative contrast ratios between pairs of face regions (Sinha, 2002), and face parts (Ullman et al., 2002)). However, such algorithms carry out holistic detections, which means they necessarily detect the faces as spatially arranged sets of features. Indeed, such algorithms detect overlapping constellations of the basic features that constitute the entire face and, implicitly, enforce the overall layout of features. This process tuned to the geometry of a face (Conspec) is already present at birth (Morton & Johnson, 1991). The second process, proposed by Johnson et al. (2015), is a domain-relevant mechanism, named Conlearn, that progressively specializes in face recognition from the age of two months by acquiring and retaining specific information about the visual characteristics of conspecifics. Direct gaze detection preference is probably supported by the detection of faces at birth due to the domain-specific Conspec mechanism (Morton & Johnson, 1991). There are seemingly at least two overlapping ideas in the two models laid out above. The first concerns the existence of specific mechanisms that are already present and operational at birth, that continue to mature after birth and support the subsequent development of higher-order, learning-based mechanisms of great importance for social cognition and interactions. The difference is that the first model is more eye-specific, whilst the second is more face-specific. The second overlapping idea is not as direct as the first. 
It supposes a specific time-course of information processing in which core face-like stimuli are detected before eye-like stimuli. This aspect is important since it describes how certain stimuli are given processing priority and how they are ranked for further processing.
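The template-like, ratio-based detection discussed above can be illustrated with a minimal sketch in the spirit of Sinha’s (2002) qualitative contrast ratios: a pattern counts as face-like when the expected darker/lighter relations between region pairs all hold. The region layout, values, and function name are hypothetical simplifications, not the published algorithm.

```python
import numpy as np

def ratio_template_match(img: np.ndarray, regions: dict, pairs) -> bool:
    """Declare a face-like pattern when every expected darker/lighter
    relation between region pairs holds (qualitative ratios, Sinha-style)."""
    mean = {name: img[r0:r1, c0:c1].mean()
            for name, (r0, r1, c0, c1) in regions.items()}
    return all(mean[darker] < mean[lighter] for darker, lighter in pairs)

# Hypothetical 6x6 "face": dark eye patches on a lighter forehead/cheek field.
img = np.full((6, 6), 200.0)
img[2:3, 1:2] = 40.0   # left eye
img[2:3, 4:5] = 40.0   # right eye

regions = {"forehead": (0, 2, 0, 6),
           "left_eye": (2, 3, 1, 2),
           "right_eye": (2, 3, 4, 5),
           "cheeks":   (4, 6, 0, 6)}
# Expected relations: both eyes darker than forehead and cheeks.
pairs = [("left_eye", "forehead"), ("right_eye", "forehead"),
         ("left_eye", "cheeks"), ("right_eye", "cheeks")]
print(ratio_template_match(img, regions, pairs))  # True
```

Note that the match is holistic in the sense described in the text: each relation is local, but jointly they enforce the overall spatial layout of the face.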

The existence of a mechanism specifically devoted to detecting faces in the environment (and hence, the eyes) has been questioned by an alternative view according to which faces are preferred because they are a collection of perceptual structures, the properties of which attract newborns’ attention. According to this view, attentional biases toward these structural properties present in faces lead to the newborns’ preferences, yet the attentional biases are not specifically built for detecting faces and therefore eyes. They probably derive from the functional properties of the immature newborns’ visual systems and are useful for both faces and non-face stimuli. Such structural properties appear relevant because they allow newborns to successfully detect and identify faces, i.e., structured wholes, when embedded among other non-face stimuli (Simion et al., 2001). This view is consistent with the notion that the newborns’ immature visual system is sensitive, not only to a certain range of spatial frequencies, as described by the contrast sensitivity function (see Acerra et al., 2002), but also to other higher-level structural properties, as demonstrated by newborns’ preference for horizontal versus vertical stripes (Farroni et al., 2000). In adults (mean age 20 years old in the study by Olk et al. (2008)), gaze direction judgments are possibly based on the outcome of competition between different gaze direction signals such as luminance cues and geometric cues. In addition, faces are three-dimensional objects that move and, importantly, manifest visual traits that are present simultaneously. These characteristics make faces probably the most interesting stimulus experienced by newborns in their immediate environment, all the more so because they are the most frequent stimulus that appears within a distance of 30 cm (Bui Quoc, 2007). Indeed, a study conducted on newborns between 13 and 168 hours old by Farroni et al. 
(2005) hypothesized that if up-down asymmetry is crucial to determining face preference, then the contrast polarity of the elements should not interfere (i.e., face-sensitive view, see Johnson et al., 2015, for a discussion). The authors found that the contrast polarity of the stimulus is a determinant of this preference for faces. From the age of 4 months, infants seem sensitive to the contrast polarity that results from the typical perceptual pattern of the eyes, i.e., the black pupil on a white sclera (Michel et al., 2017). Similarly, adult observers are highly inaccurate in judging gaze direction from pictures of human eyes with negative contrast polarity (regardless of whether the surrounding face is positive or negative), even when the negative pictures of eyes have the same geometric properties as positive images that have been judged accurately (Ricciardelli et al., 2000). In another study involving a visual search task, when the contrast polarity intrinsic to the eye (i.e., a conspicuous (white or colored lighter than the iris color) or inconspicuous (colored the same or darker than the iris color) sclera) was varied, adults detected gaze (direct or averted) more accurately and rapidly with a conspicuous sclera (Yorzinski & Miller, 2020). Furthermore, this perceptual and sensory hypothesis is backed by the fact that the eye itself, as a visual stimulus, has a number of simple and powerful characteristics such as relatively high contrast and shape, making it particularly simple to detect (Langton et al., 2000). Other features may also need to be coded, such as the angle of the head or body, but the information within the eye region is clearly important (for a review see Clifford & Palmer, 2018). The eye may be a special stimulus only in the sense that a vast amount of sensory information can be recovered from it using simple processing mechanisms, such as interactions between specialized cells in the visual system (Langton et al., 2000). 
From an evolutionary point of view, the extraction of properties of the eye region can lead to the gradual construction of a representation, thus shaping infants’ social development and eventually resulting in adult adaptive specializations (Farroni et al., 2003).

Whilst attentional priority is given to the person who is staring at us, the emotional responses this elicits are varied (Kendon, 1967). Eye contact through mutual direct gaze gives the impression of being stared at and triggers physiological arousal (Hood et al., 2003; Kampe et al., 2001; Kawashima et al., 1999; Senju & Johnson, 2009b). Early research found a greater electrodermal response when participants examining pictures of faces looked at the eyes rather than at the mouth area (McBride et al., 1965), and direct gaze elicited a stronger electrodermal response than averted gaze (Nichols & Champness, 1971). These results were confirmed by Hietanen et al. (2008), who also showed that arousal was greater when eye contact was established between the participant and another person as opposed to contact between the participant and a photograph. Similar results were found when electrocortical activity was used as a measure of arousal (Hietanen et al., 2008; Kampe et al., 2003). Direct gaze therefore not only attracts attention and gains processing priority, but also increases physiological activity, testifying to an emotional response probably related to approach/avoidance responses (Adams & Kleck, 2003, 2005). The notion of approach and avoidance supports the idea that the capacity to process gaze information plays a role in behavioral intentions, defined by Baron-Cohen (1995) as "primitive mental states insofar as these are the basic states that are necessary to be able to understand the universal movements of all animals: approach and avoidance" (pp. 33-34).

In conclusion, preferential orientation toward direct gaze seems to be present soon after birth and to prime autonomic physiological activity. Furthermore, it seems to depend (i) on the presence of a specific facial configuration that maintains the spatial relationships between the components of the face (Farroni et al., 2002; Farroni et al., 2006, 2007; Morton & Johnson, 1991) and is used to identify the direction of gaze (Baron-Cohen, 1994), and/or (ii) on a collection of visual perceptual properties (Batki et al., 2000; Langton et al., 2000; Simion & Giorgio, 2015). In our opinion, these three theories (Baron-Cohen, 1994; Morton & Johnson, 1991; Simion & Giorgio, 2015) are not necessarily mutually exclusive. Indeed, they may be integrated in a more general framework. For instance, eye detection and perception may include a pictorial encoding step extracting details of the lighting, grain and salient areas of high contrast in the face. This captures the static pose and expression seen on a face (Bruce & Young, 1986). It may be hypothesized that a contrast processing step specific to the "eye" stimulus extracts specific perceptual information from the eye region (dark/white contrasts) and determines the salience of the eye region based on the percentage of contrast. In a very recent study, Riechelmann et al. (2021) raised the question of the origin of gaze effects after their results highlighted an averted-gaze advantage. They argue that the greater salience of the white of the eye (i.e., sclera) for averted gaze (compared with direct gaze) underlies the observed effects. We hypothesize that a direct gaze is associated with more contrast than an averted gaze. Contrast is an intrinsic property of an image that quantifies the difference in brightness between the light (e.g., sclera) and dark (e.g., pupil) parts of the image (here, the eye). 
This difference in brightness, and therefore the contrast, is different for a direct gaze (sclera/pupil/sclera) than for an averted gaze (sclera/pupil or pupil/sclera), because contrast is not only determined by the presence of dark and light areas, but above all by their spatial arrangement and configuration. For instance, the eye stimuli presented in Fig. 1 show a significant difference in contrast between direct gaze and averted gaze. Areas of greater contrast (generally associated with direct gaze) would receive higher processing priority and help provide a faster, preferential response, such as orientation towards that area and a subsequent increase in physiological arousal.

Fig. 1

Direct gaze and averted gaze. Two stylized stimuli representing direct gaze (a) and averted gaze (b). The cartoon probably does not correspond to real-world eyes. However, there is such variability in the actual characteristics of human eyes (depending on age, gender, ethnicity, weight, individual characteristics…) that it is extremely difficult to fabricate a single type of stimulus that represents them. Five successive luminance measurements were taken using a Minolta LS-110 photometer. The contrast for the averted gaze is equal to 59.9% (averted gaze luminance for the sclera (white) = 136.31 (±4.95) cd/m²; averted gaze luminance for the pupil (black) = 34.19 (±7.70) cd/m²) and the contrast for the direct gaze is equal to 65.67% (direct gaze luminance for the sclera (white) = 104.1 (±5.43) cd/m²; direct gaze luminance for the pupil (black) = 21.57 (±8.75) cd/m²). There is indeed a difference in contrast, with stronger contrast (roughly 6 percentage points more) for direct gaze. Differences of this order may be sufficient to change the speed of processing of visual information
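The caption's percentages are consistent with the Michelson definition of contrast. Assuming that formula (an assumption on our part, since the caption does not name it), the sketch below reproduces them from the reported mean luminances.

```python
def michelson(l_light: float, l_dark: float) -> float:
    """Michelson contrast, in percent, from sclera and pupil luminances."""
    return 100 * (l_light - l_dark) / (l_light + l_dark)

# Mean luminances (cd/m^2) reported in the figure caption.
averted = michelson(136.31, 34.19)
direct = michelson(104.10, 21.57)
print(round(averted, 1))           # 59.9
print(round(direct, 2))            # 65.67
print(round(direct - averted, 1))  # 5.8 percentage points more for direct gaze
```

The recomputed difference (about 5.8 percentage points) matches the roughly 6-point advantage for direct gaze stated in the caption.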

In our opinion, there are still two gray areas, and studying them would help decide between the theories outlined above. First, we believe that the order in which stimuli are processed (face first or eyes first) is the fundamental difference between the aforementioned theories and that, to our knowledge, this has not yet been fully investigated. Indeed, Langton (2000) and Ricciardelli and Driver (2008) studied these effects, but concomitantly with the impact of the orientation of the head. In their studies, they showed that gaze perception is not only determined by the eye region but also by some parts of the face. Moreover, this perception seems to depend on the time constraints for judgments of direction of gaze. Furthermore, it is surprising that there is no mention in the relevant literature of a third possibility, i.e., synchronous and parallel processing of the eyes and the face. Indeed, the processing of the eyes could occur in parallel and in complementarity to the processing of the more general features of the face. Studying this question could shed light on the time course of information processing during the detection of gaze direction and, consequently, on which theory is closest to reality. There are several ways to assess the which-comes-first question. For instance, one experiment could involve adult participants who would be presented with the sketch of a face and required to make an eye-direction judgment (i.e., the eyes look at me vs. they look away) as quickly as possible. A temporal asynchrony could be introduced between the presentation of the face (without the eyes) and the presentation of the eyes in such a way that the whole face would appear in a face-eyes order, or in an eyes-face order. During very short time intervals, asynchrony is not consciously perceived. 
By recording response times, this would allow an assessment of whether preferential processing of direct gaze (as compared to averted gaze) is most visible when the face comes first, or when the eyes come first. This would answer the order of processing question.
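One possible concretization of this design is sketched below as a counterbalanced trial list. The condition names, SOA values, and repeat counts are hypothetical choices for illustration, not parameters from any published study.

```python
import itertools
import random

# Hypothetical design for the proposed asynchrony experiment: the face
# outline and the eyes appear separated by a short SOA, in either order.
orders = ["face_then_eyes", "eyes_then_face"]
gazes = ["direct", "averted"]
soas_ms = [0, 17, 33, 50]   # short enough that asynchrony goes unnoticed

trials = [{"order": o, "gaze": g, "soa_ms": s, "repeat": r}
          for o, g, s in itertools.product(orders, gazes, soas_ms)
          for r in range(10)]          # 10 repeats per design cell
random.shuffle(trials)
print(len(trials))   # 2 * 2 * 4 * 10 = 160 trials
# Analysis would compare the direct-gaze RT advantage across the two orders.
```

The key contrast is the interaction between presentation order and gaze direction: a larger direct-gaze advantage in one order would indicate which stimulus is given processing priority.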

Second, the models described so far try to explain how we perceive a specific area of the face (i.e., the eyes) characterized by high contrast, defined by a darker region (iris/pupil) and a white region (the sclera). It is thus supposed that such salient perceptual features are given processing priority, and this is what defines preferential orienting. This hypothesis could be assessed by parametrically varying the contrast between the face and the eyes, but also within the eye, in a gaze direction judgment task. If the saliency hypothesis is correct, then faster responses for direct gaze (compared with averted gaze) are to be expected with increasing eye/face contrast and with increasing iris/pupil contrast, and the fastest responses would be expected when increased iris/pupil contrast is combined with decreased eye/face contrast (this is the situation where gaze direction should stand out).

Neural aspects of the detection and perception of gaze

Low-level visual information processing

For 30 years, evidence obtained from behavioral, neuroimaging and electrophysiological studies has supported the hypothesis of specialized neural networks involved in eye and gaze processing (Haxby et al., 2002; Itier & Batty, 2009; Nummenmaa & Calder, 2009; Wicker et al., 1998), starting at the retinal level. As previously mentioned, foveal development associated with increased sensitivity to contrast plays an important role (Johnson, 2003) in processing high-contrast areas and in sustaining fixation. This seems to support the idea of the subsequent development of specialized cortical zones for the processing of the eyes and, more generally, the face. The course of visual information processing thus provides clues as to how retinal information translates into relevant information that biases attention (Bisley, 2011). Fibers from the retina travel along the optic nerve, and their axons either (i) reach the lateral geniculate nucleus (LGN) and then the primary visual cortex (or striate cortex), constituting the geniculostriate pathway (Ling et al., 2015; McAlonan et al., 2008), or (ii) first reach the superior colliculus, the pulvinar of the thalamus and the amygdala before reaching the extrastriate parietal and temporal areas, constituting the extrageniculate pathway (Knudsen, 2011; Robinson & McClurkin, 1989). It is thought that these two pathways operate simultaneously and in parallel, although it has been hypothesized that the extrageniculate pathway offers faster signal transmission (Mizzi & Michael, 2016). Both pathways are thought to play a role in eye and gaze processing, which needs to be more precisely defined in order to understand how information processing progresses from low to higher levels and contributes to social interactions.

Within the geniculostriate pathway, inputs to the LGN include both the first synapses from the retinal ganglion cells and massive feedback from the primary visual cortex and the thalamic reticular nucleus, feedback that greatly outnumbers the retinal inputs (Sherman & Koch, 1986). This makes the LGN an opportune control structure for modulating the transmission of visual information. At the end of the geniculostriate pathway, single cells in the primary visual cortex respond vigorously to the whole eye as a stimulus (Langton et al., 2000). These cells respond to three spatially defined parts of the eye: the two visible parts of the sclera and the pupil. This enables the primary visual cortex to respond to gaze direction. When gaze direction changes, the pupil shifts, causing a change in contrast. In addition, as the direction of gaze changes, the response of the cells sensitive to the scleral parts also changes. Neurons in the primary visual cortex represent luminance changes by dynamically adjusting their responses when the luminance distribution changes (Wang & Wang, 2016). This results in the detection of changes in gaze direction (Langton et al., 2000).
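The three-part coding just described can be caricatured as a toy classifier: units tuned to the two visible sclera patches and the pupil determine gaze direction from where the dark region sits between the light ones. The rule and luminance values below are illustrative assumptions, not a model of actual cortical cells.

```python
def classify_gaze(left: float, center: float, right: float) -> str:
    """left/center/right: mean luminances across the eye opening."""
    darkest = min(left, center, right)
    if darkest == center:
        return "direct"            # pupil centred between two bright sclerae
    return "averted_left" if darkest == left else "averted_right"

print(classify_gaze(130.0, 30.0, 130.0))  # direct
print(classify_gaze(30.0, 130.0, 130.0))  # averted_left
```

Crude as it is, the sketch captures why a shift of the pupil changes the spatial distribution of contrast, which is exactly the signal the scleral- and pupil-sensitive cells are described as tracking.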

Within the extrageniculate pathway, the physiological properties of cells in the superior colliculus, the pulvinar, and the amygdala may also contribute to generating elementary signals relevant to the direction of other people's gaze (Grosbras et al., 2005; Hoehl et al., 2009; Itier & Batty, 2009). The superior colliculus is involved in integrating spatial information and conveys topographic information on the face (Van Le et al., 2020). For instance, neurophysiological evidence in monkeys suggests that neurons in the superior colliculus respond more strongly to face-like stimuli (Van Le et al., 2019). Furthermore, these neurons are sensitive to the saliency of stimuli (Shen et al., 2011; Shen & Pare, 2014), making the involvement of the superior colliculus in both the detection of faces and the selection of salient facial features for saccadic eye movements a plausible theory (Horwitz & Newsome, 2001).

In turn, the pulvinar incorporates important aspects of visual salience (Robinson & Petersen, 1992), probably by detecting regions or patterns of interest (Michael et al., 2016; Michael & Desmedt, 2004) and prioritizing them for processing (Michael & Gálvez-García, 2011). Importantly, primate pulvinar cells respond clearly and consistently to stimuli moving in all directions, providing the visual system with information about the location of a movement, although there are signs of orientation selectivity (Petersen et al., 1985). Recent neuroimaging evidence has shown that the pulvinar conveys information on faces that is nonselective in terms of spatial frequency or emotional expressions (McFadyen et al., 2017). Finally, there is mounting evidence that the pulvinar plays an important role in attention (Petersen et al., 1987) and that damage to this nucleus can contribute to attention deficits (Danziger et al., 2002; Michael et al., 2001; Michael & Buron, 2005). So far, we can assume that the superior colliculus and the pulvinar contribute to the detection of gaze direction at least through the coarse processing of face-like stimuli, the analysis of salient regions of contrast (e.g., the regions of the eyes), and the efficient detection of movement (i.e., an approaching face or gaze that changes direction). On this last point, it is also interesting to note that neurophysiological investigations have uncovered a disynaptic pathway that transfers motion signals through the superior colliculus and the pulvinar from the retina toward the extrastriate cortical areas specialized in motion processing (Lyon et al., 2010). Interestingly, recent psychophysical evidence suggests that this pathway guides attention to moving stimuli without awareness (Mizzi & Michael, 2018), which might be the means by which the extrageniculate pathway prioritizes the processing of changes in gaze direction.

It has been known for several decades that the amygdala participates in emotional processes (Aggleton & Young, 1999) and responds to emotional facial expressions (Sato et al., 2004). Its role in this type of social information processing is well established. Whether and how it may be involved in gaze direction processing, however, is not easy to ascertain. The amygdala contains neurons that respond both selectively to stimuli from unique sensory modalities (visual, tactile, auditory, etc.) and to integrated multisensory stimuli (Morrow et al., 2019), suggesting that the amygdala may bind together elementary signals carried by facial expressions, voices, etc. Recent investigations in humans have shown that neural responses in the amygdala increase in response to facial stimuli, independently of spatial frequency or emotional expressions (McFadyen et al., 2017). Furthermore, there is evidence that the right amygdala exhibits an increased response to direct gaze (Kawashima et al., 1999; Senju & Johnson, 2009b). Thus, the amygdala is thought to be involved in the bottom-up processing of faces and the eye region through connections with the pulvinar (Itier & Batty, 2009; Vuilleumier et al., 2003; Vuilleumier & Schwartz, 2001), forming a rapid pathway for the transfer of generalized information about faces (McFadyen et al., 2017). One interesting recent hypothesis is that more precise information on face structure giving rise to the perception of facial expressions is transferred to the amygdala via a slower route that involves the primary visual cortex and conveys high spatial frequencies (McFadyen et al., 2017). This tallies well with the fact that amygdala responses to facial expressions are predominantly driven by judgments about the eye region (Wang et al., 2014). 
These findings suggest that the amygdala constitutes one of the convergence points of information on faces and gaze, obtained from forward projections from the superior colliculus and the pulvinar, and from backward projections from the primary visual cortex after direct input from the LGN.

Overall, the properties of the structures that form both the geniculostriate and the extrageniculate pathways suggest that the low-level processing within these pathways provides sufficient information to detect static and approaching faces, localize the contrasts that designate the eye region, detect a direct gaze, spot changes in the direction of gaze, and confer attentional priority on the relevant aspects of the stimuli without conscious awareness. Combined with the fact that at birth these pathways are sufficiently developed to take on these processes, these observations concur with theories on the elementary and automatic mechanisms that convey information about, and contribute to, the detection of the direction of gaze (Baron-Cohen, 1994; Morton & Johnson, 1991; Simion & Giorgio, 2015), and which form a basis for social interaction through the processing of emotional information conveyed through facial expressions (Vuilleumier et al., 2003; Vuilleumier & Schwartz, 2001). Furthermore, the amygdala is involved in priming approach/avoidance responses (Davis & Whalen, 2001), which can induce autonomic physiological reactions when a direct gaze is detected (Hietanen et al., 2008).

Several authors have suggested that the geniculostriate and the extrageniculate pathways interact to detect gaze and faces. These pathways may provide a representation of contrast distribution, information which would then be transmitted to the striate cortex (Acerra et al., 2002; Farroni et al., 2004, 2013; Nakano & Nakatani, 2014). Senju and Johnson (2009a, 2009b) suggested the possibility of a fast-track modulator associated with the effect of eye contact. The effect of eye contact would appear to be mediated by the extrageniculate pathway (de Gelder et al., 2003; Johnson, 2005; Morton & Johnson, 1991) because of its assumed higher information transmission speed and the nature and complexity of the information it conveys. This could modulate activity of the geniculostriate pathway (McFadyen et al., 2017) and beyond (Johnson, 2005). The organization in a dual parallel extrageniculate and geniculostriate system led Senju and Johnson (2009a, 2009b) to suggest that a direct-gaze detector system might exist. The extrageniculate pathway would detect social interaction intent, through detection of direct gaze for instance. This information would then be transmitted and combined with that gathered by the geniculostriate pathway, eventually leading to the conscious perception of being stared at. Higher-level processing in other cortical areas would provide a subsequent, more detailed analysis of the situation to promote social interaction and situational behavior.

Higher-level visual information processing

The story therefore does not end with the mere convergence of information from two separate pathways in the amygdala. Cortical areas involved in higher-level visual information processing seem to contribute to detecting the direction of gaze. Integrated information from the geniculostriate and extrageniculate pathways converges towards extrastriate cortical areas (Prasad & Galetta, 2011) and then reaches certain portions of the fusiform gyrus (FG) and the superior temporal sulcus (STS) (Seltzer & Pandya, 1978). This suggests that the input into these cortical areas contains all aspects of the information already handled during the low-level stages of processing and that, consequently, some involvement in gaze detection and perception is to be expected (Grosbras et al., 2005). This idea is supported by the observation that the detection of direct gaze increases functional connectivity between the amygdala and the FG (George et al., 2001). Furthermore, the FG exhibits an increased response to direct gaze, mostly in the right cerebral hemisphere (Senju & Johnson, 2009b). Interestingly, it is the medial part of the posterior FG (FGm) that receives input from the striate and extrastriate cortices (Zhang et al., 2016). Like the majority of neurons in the amygdala (Morrow et al., 2019), neurons in the FGm are multisensory, suggesting that the FGm may bind together signals carried by facial expressions and voices for the purposes of social communication (Zhang et al., 2016). The perception and identification of faces (Iidaka, 2014) as well as the analysis of gaze provide visual clues during social communication.

As for the STS, it would seem to be a structure of higher cognitive integration, and the wealth of afferences to and efferences from the STS lends credence to this hypothesis (Desimone & Ungerleider, 1986; Seltzer & Pandya, 1989a, 1989b, 1994; Specht & Wigglesworth, 2018). Some authors have suggested that the STS can be divided into five areas along an antero-posterior axis, each supporting a function: theory of mind, biological motion, faces, voices, and language (Beauchamp, 2015). The posterior segment of the STS (pSTS) is also thought to be a key area for dynamic face processing, coding changes in expression and gaze (Cheng et al., 2018; Iidaka, 2014). Furthermore, this tallies well with earlier work showing that different head views are associated with distinct responses in the pSTS (Fang et al., 2007; Natu et al., 2010). The anterior part of the STS (aSTS) seems to feature a fine-grained representation of specific gaze directions (Calder et al., 2007; Carlin et al., 2011; Carlin et al., 2012), since it exhibits graduated activations in response to the direction of gaze. This specificity of the aSTS is supported by the fact that its responses are independent of other factors such as head orientation or head view (Calder et al., 2007; Carlin et al., 2011). These studies therefore suggest the existence of view-invariant processing of gaze direction subtended by the aSTS (Carlin & Calder, 2013). Of particular interest in the present paper, in order to better understand the complete path of neuronal information concerning gaze detection and perception, is the fact that the STS receives afferents from the striate (Montero, 1980) and extrastriate cortices, as well as from the FG (Seltzer & Pandya, 1978). A further characteristic of the STS is that it shows anatomical asymmetry between the left and right cerebral hemispheres: it is longer on the left, but deeper on the right (Leroy et al., 2015).
This asymmetry appears to be genetically determined, since it would seem to develop in utero (Kasprian et al., 2011). A neural population responding specifically to the direction of gaze is hosted in the right STS (von dem Hagen et al., 2014). This has been confirmed by studies using transcranial magnetic stimulation (TMS), which show that transient inhibition of the right posterior STS disturbs orientation towards the eyes and, therefore, indirectly the perception of gaze (Saitovitch et al., 2016). In turn, the left STS exhibits increased activity during the passive viewing of averted gaze (Hoffman & Haxby, 2000). Therefore, both cerebral hemispheres appear to participate in the detection of gaze direction, but in a different way depending on where the gaze is directed. In conclusion, the STS seems to play a role in representing a person’s gaze direction (Carlin & Calder, 2013; Hoffman & Haxby, 2000; Perrett et al., 1985; Perrett et al., 1990; Senju & Johnson, 2009b), with a progressive antero-posterior representation. Neurons in the pSTS may represent the detection of gaze direction changes depending on head movement, whilst neurons in the aSTS might be specifically dedicated to the more view-invariant aspects of the direction of gaze (Calder et al., 2007; Carlin & Calder, 2013). This information would then be transmitted to the anterior cortical regions to enable joint attention.

Several other areas appear to be involved in gaze direction detection and perception, such as the intraparietal sulcus (IPS) and the temporoparietal junction (TPJ) (Senju & Johnson, 2009b; von dem Hagen et al., 2014). It is less clear, however, whether these structures are directly involved in determining gaze direction or whether they are recruited subsequently to reorient attention toward the other person’s gaze (i.e., joint attention; Carlin & Calder, 2013; Hoffman & Haxby, 2000; Itier & Batty, 2009). The literature notably suggests that the IPS plays a role in the endogenous orientation of spatial attention and that the TPJ could act as a circuit breaker by interrupting ongoing activities to reallocate available attentional resources to salient stimuli (Corbetta et al., 2008; Corbetta & Shulman, 2002). As far as gaze direction detection and perception are concerned, direct extrageniculate input may activate the IPS in order to reorient attention to socially relevant stimuli (e.g., when someone is staring at us). This might be facilitated by the activation of the TPJ, which, in this case, would act as a circuit breaker (Grosbras et al., 2005). Finally, activity in the frontal cortical areas has also been reported to occur in response to direct gaze (von dem Hagen et al., 2014), especially in the right hemisphere (Senju & Johnson, 2009b). Again, these results are inconsistent and these structures might be indirectly involved in coding emotional information conveyed through gaze, or in promoting joint attention (Carlin & Calder, 2013; George et al., 2001; Itier & Batty, 2009).

Overall, these findings make it possible to outline a dynamic cortical network dedicated to higher-level information processing. The core role of this network is to detect gaze direction on a static or moving head and to decode the dynamic microstructural changes on the face that form emotional facial expressions. This network also encompasses structures devoted to controlling and reorienting attention toward relevant social cues and promoting joint attention (Mosconi et al., 2005). Finally, the right cerebral hemisphere seems to be more specifically involved in the detection of direct gaze (Grosbras et al., 2005).

Awareness of direct gaze

All the aforementioned structures seem to be engaged differently, or in qualitatively different processing, depending on whether or not there is conscious awareness of direct gaze (Madipakkam et al., 2015). Indeed, when faces are masked, such that participants are unaware of perceiving them, the FG, the STS, and the IPS still respond to the direction of gaze, but their responses are weaker than for visible faces. Furthermore, direct gaze elicited greater responses than averted gaze when participants were aware of the faces, but smaller responses when they were unaware (Madipakkam et al., 2015; for review: Sterzer et al., 2014). These data suggest that even when there is no awareness of being stared at, certain structures in the gaze detection network still generate a neural response. Social information about being the focus of someone’s attention can thus be processed nonconsciously, a kind of blindsight (Kim & Blake, 2005; Leopold et al., 2002; Sterzer et al., 2014; Tong et al., 2006). One interesting finding is that the extrageniculate pathway is also involved in the nonconscious processing of emotional stimuli (Liddell et al., 2005; Morris et al., 1999; Troiani & Schultz, 2013). The study by Madipakkam et al. (2015) did not report any activity in the superior colliculus or the pulvinar. This may be because the neural signals in these structures were too weak to show up in the results. It is therefore worth asking whether the amygdala responses observed when participants were unaware of the stimuli were due to input from these extrageniculate structures or to back-projections from cortical areas. The involvement of the extrageniculate pathway in the nonconscious feeling of being stared at is therefore an avenue for further investigation.

Overall, the neural processing of gaze seems to involve different networks. In the light of all the data reviewed, this paper proposes to describe how information processing builds through specialized networks, to provide the most complete explanation possible of the neural processing of gaze. Although further research is needed, it would appear that detecting gaze and its direction is supported by a specialized system (Fig. 2) combining (i) complete low-level processing pathways, namely the extrageniculate pathway (i.e., superior colliculus, pulvinar, and amygdala; Kawashima et al., 1999; Senju & Johnson, 2009a, 2009b) and sections of the geniculostriate pathway (i.e., the primary visual cortex), and (ii) temporal structures, including the FG, which ensures multisensory processing (George et al., 2001; Grosbras et al., 2005), and the STS, which enables the representation of a person’s gaze direction (George et al., 2001; Grosbras et al., 2005; Hoffman & Haxby, 2000; Itier & Batty, 2009; Perrett et al., 1990). It would appear that luminance contrast derived from spatial and topographical aspects of the face triggers activity within the striate cortex related to the direction of gaze. This activity is thought to contribute toward building a representation of the direction of gaze within the pSTS containing retinotopic information and, thus, to enable the subsequent construction of a dynamic representation of gaze direction and movements. This information may also enable the construction of a view-invariant representation within the aSTS (Carlin & Calder, 2013; Cheng et al., 2018). Other cortical structures seem to be involved, but their role is less clear.
These are (iii) parietal areas, such as the IPS and the TPJ, that may help to allocate attentional resources (Carlin & Calder, 2013; George et al., 2001; Grosbras et al., 2005; von dem Hagen et al., 2014) and (iv) frontal areas that might be involved in coding emotional information conveyed by a person’s gaze, or promoting joint attention (Carlin & Calder, 2013; Itier & Batty, 2009). Finally, the right cerebral hemisphere seems to be more involved in the processing of faces with direct gaze (George et al., 2001; Senju & Johnson, 2009b), while the left hemisphere may signal the presence of a face looking somewhere else (von dem Hagen et al., 2014).

Fig. 2 Gaze detection model

While the literature contains some information about the potential role of each hemisphere in gaze perception, very few studies have directly investigated putative hemispheric asymmetries. A visual field effect on the perception of gaze direction has been found (Ricciardelli et al., 2002). In that study, participants with typical development viewed eye stimuli displayed on a computer monitor and were required to judge the direction of gaze. A left visual field advantage (more correct responses) was found, indicating right hemispheric dominance. This hemispheric dominance seems specific to the perception of gaze direction and is already present at the age of 4-6 months (de Heering & Rossion, 2015). However, despite this right hemispheric dominance, both hemispheres seem to be involved in the detection and perception of gaze direction, as previously mentioned (George et al., 2001; Senju & Johnson, 2009b; von dem Hagen et al., 2014), meaning that they have to interact in some way at some time. This last point is not clear, and further investigation is needed. Determining the exact direction of gaze and who is the focus of someone's attention may result from coordination between the two hemispheres through interhemispheric connections, in the posterior sections of the corpus callosum, for example. According to Hellige (1993) and Kinsbourne (1970, 1982, 2003), the two hemispheres mutually inhibit each other through callosal connections, resulting in a functional equilibrium. The activation of one hemisphere during a specific process would inhibit the other hemisphere in order to reduce interference (Kinsbourne, 1975), but in a way that also allows the level of asymmetric activations to be coordinated and regulated (Banich & Karol, 1992). It is possible that right hemisphere processes relating to direct gaze detection are rendered more efficient through the inhibition of left hemisphere averted gaze detection processes.
To assess the hypothesis of hemispheric interaction and coordination in gaze detection and perception, a cueing paradigm could be used in which the cues would be chimeric facial stimuli. For example, participants could be required to fixate a central fixation sign and quickly detect a dot target presented on a computer monitor. The target could be preceded by a tachistoscopically presented chimera composed of a half-face with direct gaze and another half-face with averted gaze. The midline of the chimeric face would coincide with the fixation sign so that each half-face would be preferentially projected to one hemisphere. The target dot would be placed either at the location of the direct gaze or at that of the averted gaze. Overall, responses would be expected to be faster for targets appearing at the location of direct gaze (compared with averted gaze). If the hemispheric asymmetry hypothesis is correct, i.e., direct gaze is preferentially processed by the right hemisphere and averted gaze by the left hemisphere, then faster responses are to be expected for targets appearing in the left visual hemifield where a direct gaze is presented, and for targets appearing in the right visual hemifield where an averted gaze is presented, compared with the opposite hemifield-gaze direction combinations. This paradigm would also provide information on whether the detection of direct gaze is achieved through inhibition exerted by the right hemisphere on the left hemisphere. Indeed, if the right hemisphere is stimulated by a direct gaze (appearing in the left hemifield) but the dot target is projected to the left hemisphere (i.e., presented in the right hemifield), then responses would be much slower than in the opposite combination, even if slower responses are expected in both scenarios due to the attraction of attention toward the direct gaze and away from the dot target. Similar paradigms would provide interesting information on hemispheric processes.
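The core ordinal predictions of this proposed paradigm can be sketched in a few lines of code. The condition labels and response-time values below are purely hypothetical illustrations of the hemispheric-specialization predictions described above, not empirical data:

```python
# Illustrative response-time predictions (ms) for the proposed chimeric-cue
# paradigm. All values are hypothetical, chosen only to encode the ordinal
# predictions stated in the text.
# Cells are keyed by (target_hemifield, gaze_type_at_target_location).
predicted_rt = {
    ("left",  "direct"):  320,  # direct-gaze half in LVF, target at that location
    ("right", "direct"):  340,  # direct-gaze half in RVF, target at that location
    ("right", "averted"): 355,  # averted-gaze half in RVF, target at that location
    ("left",  "averted"): 365,  # averted-gaze half in LVF, target at that location
}

# Prediction 1: overall, targets at the direct-gaze location are detected
# faster than targets at the averted-gaze location.
mean_direct = (predicted_rt[("left", "direct")] + predicted_rt[("right", "direct")]) / 2
mean_averted = (predicted_rt[("left", "averted")] + predicted_rt[("right", "averted")]) / 2
assert mean_direct < mean_averted

# Prediction 2 (hemispheric asymmetry): the fastest cell is a direct-gaze
# target in the left visual hemifield, i.e., projected to the right hemisphere.
assert min(predicted_rt, key=predicted_rt.get) == ("left", "direct")
```

The further inhibition prediction (a right-hemifield target appearing while a direct gaze stimulates the right hemisphere) would modify the two averted-location cells and is deliberately not encoded here, since the paradigm is designed to test it.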

Disturbance of gaze perception and its impact on social cognition and socially adaptive behavior

The detection of the direction of gaze is a useful signal, since it directly provides information on a person’s interests and the presence of possible dangers nearby (Emery, 2000). What one looks at also provides clues to the person’s inner state, such as what they know and what they want (Emery, 2000; Kendon, 1967). Given the importance of gaze direction detection and perception for social interaction, any disturbance of the ability to detect or even follow another person's gaze is likely to impair social cognition and behavior, and could also lead to the misinterpretation of intentions and mental states (Cañadas & Lupiáñez, 2012). Several disorders are associated with abnormal gaze perception and eye contact. Here, we discuss autism spectrum disorder, schizophrenia, social anxiety disorder, and finally 22q11.2 deletion syndrome, all of which involve impaired social interaction linked to gaze detection.

Gaze direction perception in Autism Spectrum Disorder

Autism Spectrum Disorder (ASD) is probably the best-known disorder in which abnormal eye contact is one of the main features (American Psychiatric Association, 2013). It is a neurodevelopmental condition that causes severe impairments in communication and reciprocal social interaction. Research has shown that, when looking at their environment, children with ASD tend to look downward and explore the environment with a wider lateral field of view than children with typical development (Noris et al., 2012). The same pattern of visual exploration for objects is observed in ASD and in controls (Mottron et al., 2007), and people with ASD do not differ from controls in their ability to recognize objects such as cars and homes (López et al., 2004; Wallace et al., 2008; Wolf et al., 2008). Yet they exhibit different behavioral responses when they detect a direct gaze (Elsabbagh et al., 2012; Forgeot d’Arc et al., 2017; Jones & Klin, 2013; Pantelis & Kennedy, 2017; for a review, see Senju & Johnson, 2009b). More precisely, they direct their gaze towards that of others less spontaneously (Leekam et al., 1997) and tend to look at the mouth region more than the eye region (Klin et al., 2002; Pelphrey et al., 2002). Various interpretations have been put forward to explain this difference. The hypothesis of passive insensitivity to direct gaze (Helminen et al., 2017; Lauttia et al., 2019; Moriuchi et al., 2017) was proposed after analyzing exploration patterns using eye-tracking techniques with young children with ASD. When a task explicitly requires looking at someone else’s eyes, children with ASD do not look away from the eyes and behave like typically developing children (Moriuchi et al., 2017). It is likely that children with ASD are not automatically attracted to the eyes as a salient stimulus and thus may subsequently show less interest in the gaze.
On this view, orienting to an eye stimulus would be a controlled rather than an automatic process in ASD. A second hypothesis, in contrast, assumes that individuals with ASD exhibit active eye avoidance (Klin et al., 2003; Klin et al., 2009; Tanaka & Sung, 2016). This hypothesis suggests that the eyes are an emotionally charged region of the face that triggers an immediate visceral response and an increase in amygdala activity in individuals with ASD. They therefore avoid eye contact and focus on external features (clothing, hair, hands) or other features and regions of the face (mouth, chin) as an adaptive and compensatory strategy. According to Tanaka and Sung (2016), this approach protects individuals with ASD from the discomfort and threat of the eyes. There is thus a fundamental difference between the two theories: the latter considers abnormal gaze detection and eye contact to be a top-down strategy, while the former sees them as a deficit in the (visual or social) salience of the eyes resulting in poor attraction of attention. Baron-Cohen (2000) considers that the ability to decode the direction of gaze may be intact in ASD, but that difficulties arise from incorrect inference of the meaning of the eyes as a stimulus. This reasoning differs slightly from the other two, as it places the locus of dysfunction beyond the perceptual processing of the eyes, just before the adoption of an active strategy to avoid eye contact. Recent research indicates that the key processes involved in coding gaze direction, namely the adaptation of neuronal responses during low-level perceptual processing of sensory information (e.g., the eyes), are intact in adults with ASD (Palmer, Lawson, et al., 2018b). However, in a study of divisive normalization in early visual processing, Rosenberg et al. (2015) demonstrated a reduction in sensory responses in the primary visual cortex.
In our opinion, this attenuation of sensory responses could, for example, reduce the representation of the luminance distribution and consequently alter the coding of gaze direction in people with ASD.

An adaptation of the cueing paradigm (Posner, 1980) could help to disentangle the two hypotheses (salience hypothesis vs. active avoidance hypothesis). Participants could be required to quickly detect a target dot appearing close to one corner of an imaginary square centered at the point of fixation. This target could be preceded by the brief appearance, at the same location, of a face with direct gaze (i.e., the cue). The critical manipulation would concern the location of the target dot in relation to the eyes: very close to the eyes, somewhere else on the face, or outside the face. Another cue stimulus (e.g., a house) could replace the face in the control condition in order to ensure that the observed effects are face-specific. Response times in participants from the general population would be expected to be shorter for targets near the eyes than elsewhere, with no variation in performance in the case of a nonfacial cue. This pattern would demonstrate preferential orienting to the eyes. If the salience hypothesis is correct, then responses in people with ASD would not be expected to vary with target location or cue type; in particular, they would exhibit slower response times than typically developing people for targets close to the eyes. Conversely, if the active avoidance hypothesis is correct, then the response pattern in ASD would be inverted compared with controls: response times would be faster the farther the target is moved from the eye region. Again, no variation in performance would be expected in the case of a nonfacial cue.
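The competing predictions of the two hypotheses can be summarized in a small sketch. The group labels, conditions, and response-time values are hypothetical illustrations of the predicted patterns, not data from any of the studies cited:

```python
# Hypothetical predicted mean RTs (ms) for the proposed cueing paradigm;
# all values are illustrative assumptions, not empirical data.
# Conditions cross cue type (face with direct gaze vs. house) with
# target location (near the eyes, elsewhere on the face, outside the face).
predictions = {
    # Typically developing controls: preferential orienting to the eyes.
    "control": {
        ("face", "eyes"): 310, ("face", "face_other"): 350, ("face", "outside"): 360,
        ("house", "eyes"): 350, ("house", "face_other"): 350, ("house", "outside"): 350,
    },
    # Salience hypothesis: the eyes do not capture attention in ASD,
    # so RTs are flat across target locations and cue types.
    "asd_salience": {
        ("face", "eyes"): 350, ("face", "face_other"): 350, ("face", "outside"): 350,
        ("house", "eyes"): 350, ("house", "face_other"): 350, ("house", "outside"): 350,
    },
    # Active-avoidance hypothesis: the pattern inverts, with faster
    # detection the farther the target is from the eye region.
    "asd_avoidance": {
        ("face", "eyes"): 390, ("face", "face_other"): 350, ("face", "outside"): 320,
        ("house", "eyes"): 350, ("house", "face_other"): 350, ("house", "outside"): 350,
    },
}

def eye_advantage(rts):
    """Positive values indicate faster detection near the eyes (face cue)."""
    return rts[("face", "face_other")] - rts[("face", "eyes")]

assert eye_advantage(predictions["control"]) > 0        # orienting to the eyes
assert eye_advantage(predictions["asd_salience"]) == 0  # no capture by the eyes
assert eye_advantage(predictions["asd_avoidance"]) < 0  # avoidance of the eyes
```

Framing the two hypotheses as opposite signs of a single "eye advantage" contrast is what makes the proposed paradigm diagnostic: a flat profile supports reduced salience, an inverted one supports active avoidance.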

From a neural point of view, based on fMRI evidence, Senju and Johnson (2009b) hypothesized that the detection and perception of direct gaze modulates the activation of the brain's social network. In fact, they suggest that atypical eye contact processing in ASD originates in the lack of influence of a subcortical face and eye contact detection route, which is hypothesized to modulate eye contact processing and guide its emergent specialization during development. Infants at high risk of ASD would exhibit atypical brain responses, which would suggest atypical modulation and/or synchronization of neural activities in response to perceived direct gaze. Anatomo-functional anomalies in the STS could constitute the first step in a cascade of neural dysfunctions underlying abnormal detection of gaze in ASD (Saitovitch et al., 2012). The use of the aforementioned cueing paradigm under fMRI could help validate this hypothesis.

Gaze direction perception in schizophrenia

Another condition in which abnormal eye contact is studied is schizophrenia. The literature is inconsistent as far as the perception of the direction of gaze in schizophrenia is concerned. Some authors report no deficit (Franck et al., 1998, 2002; Kohler et al., 2008), while others do (Hooker & Park, 2005; Rosse et al., 1994; Tso et al., 2012; Tso et al., 2014). These discrepancies seem to be due to the variety of methodologies used. Some research suggests a self-referential bias in gaze attribution in schizophrenia. For instance, Tso et al. (2012) showed that people with schizophrenia are more likely to say that other people are looking at them when they are not. They perceive ambiguous gazes as being directed toward them, which may feed feelings of persecution. This bias increases with the severity of negative symptoms (Tso et al., 2012). Patients with paranoid symptomatology also seem to assess the trustworthiness of the person they are looking at differently depending on the direction of that person's gaze (Abbott et al., 2018). Other studies using gaze cueing paradigms have shown no impairment of gaze perception in schizophrenia (Franck et al., 1998; Seymour et al., 2017). Seymour et al. (2017) suggested that the tendency to misjudge the direction of gaze was an effect of the task instructions. In the studies by Franck et al. (1998, 2002), participants were asked to decide whether the gaze was directed to the right or to the left; patients were slower only when asked to judge whether the displayed face was looking at them or not. An instruction such as "Is this person looking at you?" encourages a self-referential judgment of others' gaze (Franck et al., 1998, 2002; Seymour et al., 2016) and may introduce response biases. Recently, it has been shown that gaze direction processing is intact in schizophrenia despite reduced performance in a theory of mind task (Palmer, Caruana, et al., 2018a).
These results support the idea of Seymour et al. (2017) that people with schizophrenia have intact gaze direction perception and that the contradictory results are actually due to response biases produced by the tasks and instructions. In other words, these studies provide evidence in favor of a specific self-referential bias when the gaze is ambiguous, but no deficit when participants are requested to determine the direction of the gaze (for a review, see Bortolon et al., 2015). Observing a specific response bias when the gaze is ambiguous may suggest that this bias is the result of atypical functioning at the early processing levels, such as those involved in the processing of the contrast intrinsic to the eye (sclera vs. pupil). On the other hand, according to signal detection theory (Macmillan & Creelman, 2004), response biases may result from a different decision threshold relating to the direction of gaze, rather than from a change in lower-level sensory and perceptual processes. The discrepant results on gaze direction detection in schizophrenia found in the literature may therefore reflect differences in decision thresholds in people with schizophrenia caused by task instructions and/or their symptoms. For instance, patients with more severe negative symptoms (Tso et al., 2012) and signs of paranoia (Abbott et al., 2018) may be more likely to respond that someone is looking at them as a result of their symptomatology.
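The distinction drawn here between an early perceptual deficit and a shifted decision threshold can be made concrete with the standard signal detection indices of sensitivity (d′) and criterion (c) (Macmillan & Creelman, 2004). The hit and false-alarm rates below are hypothetical values for an "is this person looking at me?" task; none of the numbers come from the studies cited:

```python
from statistics import NormalDist

def sdt_indices(hit_rate, fa_rate):
    """Compute sensitivity (d') and criterion (c) from hit and false-alarm rates."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion

# Hypothetical task: hits = correct "yes" responses to direct gaze;
# false alarms = "yes" responses to averted gaze.
d_ctrl, c_ctrl = sdt_indices(hit_rate=0.90, fa_rate=0.10)  # illustrative controls
d_scz,  c_scz  = sdt_indices(hit_rate=0.95, fa_rate=0.30)  # illustrative patients

print(f"controls:      d' = {d_ctrl:.2f}, c = {c_ctrl:.2f}")
print(f"schizophrenia: d' = {d_scz:.2f}, c = {c_scz:.2f}")
```

In this sketch, the patient group says "yes" more often to both gaze types, which appears as a more liberal (negative) criterion c, while d′ remains broadly comparable. A self-referential bias would thus show up in c rather than in d′, which is the signal-detection formulation of the decision-threshold account described above.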

From a neural perspective, one study that investigated the neural correlates of gaze perception in schizophrenia reported decreased activation in frontal (e.g., bilateral inferior frontal), temporal (e.g., right amygdala, parahippocampal gyrus, bilateral fusiform gyrus), occipito-parietal (e.g., bilateral occipital gyri), and subcortical areas in the patients studied (Kohler et al., 2008). Interestingly, the controls exhibited increased activation in frontal and temporal regions with increased task difficulty, while the patients exhibited increased activation of temporal regions, including the superior temporal gyrus. The superior temporal gyrus is part of the ventral frontoparietal network that is believed to be responsible for interrupting ongoing processing and redirecting attention to a new focus (Corbetta et al., 2008). This activity has been linked to gaze processing, which may indicate heightened awareness of the gaze of others (Nummenmaa & Calder, 2009). Considering that patients have more difficulty determining the self-referential nature of gaze (Tso et al., 2012), the increased activation of the superior temporal gyrus for direct gaze in patients with schizophrenia may indicate increased awareness that another person is looking at them. Furthermore, increased activity in the pSTS was found when patients with schizophrenia looked at faces with neutral facial expressions. Yan et al. (2020) assumed that hyperfunctioning of the right pSTS would result in an increased tendency to perceive neutral social stimuli as emotionally salient, or as carrying intentions. We therefore suggest that increased activation of the pSTS may lead to an altered perception of other people’s gaze direction and an altered interpretation of their intentions.

Gaze direction perception in Social Anxiety Disorder

Relatively little attention has been paid to biases in gaze detection and perception in people with social anxiety disorder (SAD). This is particularly surprising, because SAD is related to intense feelings of being stared at by others, as well as to avoidance and fear of eye contact during social interactions (Schneier et al., 2011). The feeling of being stared at is even greater in SAD when faces are neutral or express a negative emotion (Horley et al., 2003, 2004). In addition, increased physiological arousal in SAD has been found for direct gaze versus averted gaze, suggesting that mutual gaze may be perceived as threatening (Baker & Edelmann, 2002). Studies using eye-tracking technology have supported these findings by demonstrating a smaller number of fixations and shorter fixation times on the eye region of faces with emotional expressions in people suffering from social phobia (Horley et al., 2003, 2004; Moukheiber et al., 2010). Other studies have shown that people with high levels of social anxiety fixate the eye region for a longer period of time, regardless of the direction of gaze, when the face has a neutral expression (Wieser et al., 2009). The visual exploration of the eye region of faces thus differs according to the expression of emotion on the face. Moreover, high social anxiety was associated with a wider cone of direct gaze (i.e., the range of gaze deviations still judged to be directed at oneself, used as a measure of mutual gaze perception) across emotions in males (Jun et al., 2013; Schulze et al., 2013). Furthermore, Wieser et al. (2009) found that socially anxious and nonanxious women considered the averted gaze of neutral faces to be more unpleasant than a direct gaze, and averted gazes therefore may produce a reaction of motivational avoidance in people with SAD (Hietanen et al., 2008). This may be linked to the idea that looking away signals disinterest (Itier & Batty, 2009). These different results can be interpreted in light of cognitive models.
The "eyes" stimulus is probably processed simultaneously via (i) an analysis that allows people suffering from SAD to extract this stimulus from the face, and (ii) another analysis that extracts perceptual information specific to the eyes (dark pupil vs. white sclera). In view of the aforementioned studies (Horley et al., 2003; Schneier et al., 2011; Wieser et al., 2009), it would seem that these information processing steps are not impaired in people suffering from social anxiety, even if no direct evidence of this has been found to date. On the other hand, the physiological reactions produced by the perception of gaze appear to be altered in SAD, since averted gaze produces an avoidance reaction significantly different from that of individuals without SAD (Hietanen et al., 2008; Wieser et al., 2009). Future research should determine the locus of disturbance in the processing of gaze direction in SAD and related disorders.

From a neural point of view, Schneier et al. (2009) were the first to report differences in neural activity associated with gaze behaviors in SAD. Their findings support the hypothesis of preferential activation, in response to direct gaze, of fear circuitry structures, such as the amygdala and insula, of associated frontal regions (rostral anterior cingulate and medial prefrontal cortex), and of core areas of visual face processing (e.g., the FG) in SAD. Eye-tracking findings did not differ significantly between groups in this study, but the direction of the nonsignificant differences was consistent with the hypothesis that SAD patients show greater gaze aversion in response to direct vs. averted gaze. This is consistent with prior findings that people with SAD avoid viewing the eye region (Horley et al., 2003). It suggests that, in SAD, the direction of gaze is directly related to fear responses and may simply constitute a sign of danger.

Gaze direction perception in 22q11.2 Deletion Syndrome

Features of the autism spectrum are found in several pathogenic copy number variations, such as 22q11.2 deletion syndrome (22q11.2DS) (Vorstman et al., 2013). 22q11.2DS, one of the most common genetic syndromes (1 in 2000 to 1 in 4000 births), is also one of the most robust genetic risk factors for schizophrenia (accounting for 1-2% of cases). It has been suggested that the high prevalence of autistic behaviors in children with 22q11.2 deletions should not be viewed as ASD, but rather as prodromal symptoms preceding the onset of schizophrenia (Eliez, 2007; Karayiorgou et al., 2010; Van et al., 2017; Vorstman et al., 2006). Approximately 30% of patients with 22q11.2DS develop psychotic symptoms in adolescence or early adulthood (Monks et al., 2014). Anxiety disorders are also frequent (35% of children and 27% of adults), often expressed in the form of simple phobias (fear of the dark, fear of animals, fear of thunderstorms, etc.) or a social phobia (Philip & Bassett, 2011). Difficulties in social relationships with peers are a common complaint among school-aged children with 22q11.2DS. It is well established that children and adults with 22q11.2DS have poorer social skills compared with typically developing young people, particularly in recognizing emotions on faces and in voices, as well as in understanding the emotions involved in scenes depicting social interactions (Campbell et al., 2015; Leleu et al., 2016; Peyroux et al., 2020). Underlying impairments in social cognitive processes might be partially responsible for these social dysfunctions (for a review, see Norkett et al., 2017) and could be linked to a diagnosis of psychosis (Jalbrzikowski et al., 2012; Morel et al., 2018). Interestingly, although some authors suggest that the overall pattern of face exploration is abnormal in 22q11.2DS, an analysis of error patterns has shown that patients with 22q11.2DS more frequently mistake sadness for fear, surprise for happiness, and disgust for surprise (Peyroux et al., 2020).
The authors suggest that this confusion is related to visual details of the faces. For instance, the configuration of the eyes and eyebrows may explain why sadness is mistaken for fear and disgust for surprise. This analysis provides some indirect clues as to how patients with 22q11.2DS process the eye region. To our knowledge, however, there are no data in the literature on gaze direction detection and perception in 22q11.2DS. From a clinical point of view, it seems that very early in their development, certain children with 22q11.2DS exhibit atypical behavioral responses to the detection of the gaze of others, such as eye contact avoidance. They also sometimes have the impression that they are being looked at when this is not the case, much as people with paranoid traits do. There is, therefore, a pressing need to better investigate and understand the specific pattern of gaze direction detection in this pathology and how it influences social cognition and behavior. One possibility is that the paranoid symptomatology (Schneider et al., 2017) and social phobia (Philip & Bassett, 2011) observed in 22q11.2DS are at least partly due to misperception or misinterpretation of gaze direction. From a neural point of view, to our knowledge, there is very little specific literature.

Conclusion and future directions

In this narrative review, we first focused on the detection and perception of gaze direction, combining the associated behavioral and neural responses. We then reviewed clinical conditions in which difficulties in gaze detection are found, in order to better understand how particularities of information processing might negatively impact social relations.

Several behavioral models have been proposed to explain the perception of direct or averted gaze (Baron-Cohen, 1995; Morton & Johnson, 1991; Simion & Giorgio, 2015). Although at first sight these models appear to be contradictory, combining them provides a better understanding of the various processes underlying the detection of the direction of gaze. However, there are still a number of grey areas. As far as behavior is concerned, these uncertainties concern: (a) the time course of visual information processing (eyes first or face first), and (b) the hypothesis of salience priority, i.e., whether the visual contrast between facial features drives preferential orienting. As far as pathological conditions are concerned, questions remain regarding: (a) the hypothesis of passive insensitivity to gaze in people with ASD, (b) the hypothesis of active gaze avoidance in people with ASD, and (c) the level at which gaze processing is impaired in people with schizophrenia. These issues should be the focus of future research, and we have made some suggestions about how to investigate the hypotheses and theoretical models. Other paradigms are, of course, possible.

At the neural level, different networks seem to allow both the detection of a gaze and, more precisely, the perception of direct versus averted gaze. Although the data are not conclusive, a specialized system seems to exist, including several hubs dedicated to different aspects of information processing. The pathway through which the amygdala is involved in the nonconscious perception of gaze remains to be explored. Unraveling whether its activation is due to feedback from the striate cortex or to input from the superior colliculus and the pulvinar could provide insights into the role of the subcortical route in the feeling of being stared at. In the same way, the role of the frontal areas should be looked at more closely. We might find that these areas do not play a prominent role in gaze detection and perception themselves, but that they are recruited because of their role in joint attention, emotional decoding, and theory of mind (Ingvar & Franzén, 1974; Lee et al., 2004). Indeed, a link seems to have been established between the perceived direction of gaze and the neural processing of social decision-making (Sun et al., 2018). This could lead to a novel explanation for the mistaken gaze direction sometimes found in schizophrenia, which is known to be a neurodevelopmental hypofrontal condition. We could hypothesize that impaired theory of mind due to frontal dysfunction leads to the misattribution of gaze direction, rather than the misattribution of gaze perception leading to paranoid symptoms. One last grey area at the neural level concerns coordination between the two cerebral hemispheres. We have suggested a number of ways of testing the different hypotheses, but other paradigms could, of course, be used.

Overall, we believe we have clearly demonstrated the importance of better understanding the different stages in the processing of information related to gaze detection and perception. Future research should focus on the aforementioned areas of ongoing debate to better establish how the social brain develops and works, taking the detection of another person’s gaze as the starting point. An improved understanding of these processes and their development would make it possible to offer early, adapted care for the disturbances observed. Improved brain activation in face-processing networks has already been demonstrated after specific cognitive remediation for the recognition of facial affect (Bölte et al., 2015), specifically in the eye region (Karagöz-Üzel et al., 2018). It is therefore possible that reeducation of gaze direction processing would have an impact on the neurodevelopmental cascade underlying the construction of the social brain.