Enhanced detection of gaze toward an object: Sociocognitive influences on visual search

Another person’s gaze direction is a rich source of social information, especially eyes gazing toward prominent or relevant objects. To guide attention to these important stimuli, visual search mechanisms may incorporate sophisticated coding of eye-gaze and its spatial relationship to other objects. Alternatively, any guidance might reflect the action of simple perceptual ‘templates’ tuned to visual features of socially relevant objects, or intrinsic salience of direct-gazing eyes for human vision. Previous findings that direct gaze (toward oneself) is prioritised over averted gaze do not distinguish between these accounts. To resolve this issue, we compared search for eyes gazing toward a prominent object versus gazing away, finding more efficient search for eyes ‘gazing toward’ the object. This effect was most clearly seen in target-present trials when gaze was task-relevant. Visual search mechanisms appear to specify gazer-object relations, a computational building-block of theory of mind.


Introduction
The human eye's marked dark-iris, light-sclera morphology (Kobayashi & Kohshima, 1997) offers a salient and important social signal (e.g., Cañigueral & Hamilton, 2019; Senju & Johnson, 2009). Perception of another's eye gaze activates a large-scale social-cognition network in the human brain (e.g., Carlin & Calder, 2013; McCrackin & Itier, 2019), and is considered a foundation for social development (e.g., Baron-Cohen, 1994; Baron-Cohen, 2005; Charman et al., 2000; Tomasello, Carpenter, Call, Behne, & Moll, 2005). Given their crucial importance, one would expect visual attention to be drawn toward eyes when they are present. Not all eyes need be of equal priority, however. Those gazing toward us ('direct gaze') or at other relevant objects and events may be particularly informative and so prioritised over 'other gaze' during search.
As these two alternative possibilities attest, human vision could prioritise direct gaze without coding gaze as gaze. Because direct gaze eyes form a very particular stimulus set with perceptual features that are distinct from averted gaze, it is difficult to disentangle which of the above mechanisms is operating in multiple-face arrays. Here, to sidestep these difficulties of interpretation, we instead examined search for eyes gazing toward a prominent object versus gazing away. Crucially, the eyes and faces in these two conditions were, across trials, identical (though of course different in each display), differing only in terms of the spatial relationship between the eyes' gaze-direction and the object's location. Accordingly, any enhanced search for eyes gazing toward an object (relative to eyes gazing away) could not reflect local physical characteristics or perceptual templates. These new conditions provide a previously unexploited opportunity to study sociocognitive processes involved in attentional guidance during search, helping to circumvent the difficulties of interpretation for other social stimuli (highlighted by Vestner, Gray, & Cook, 2020).
Experiment 1: Influence of gazing at versus away when task-irrelevant

Methods
The basic task in Experiment 1 was to detect the presence of target eyes in search displays, on the basis that they gazed in the opposite direction to the other eyes. Our manipulation of interest, whether the target eyes gazed toward the prominent object (hereafter, 'Congruent Gaze') or away ('Incongruent Gaze'), was irrelevant for the observer. One concern with this design was that observers might opt to use a 'singleton detection mode' (e.g., Bacon & Egeth, 1994), attempting to detect physical discontinuities in the display rather than processing gaze. Accordingly, prior to 75% of search displays, a cue signalled to the observer which gaze direction (left or right) the target would have, to encourage them to search on the basis of gaze. The remaining 25% of trials comprised no cue. The Cued trials were designed to be more frequent to maximise the likelihood that observers would use the cue when it arose.

Observers and sample size
For Experiment 1, we estimated that 24 observers should suffice to detect medium-large effects (Cohen's f = 0.33; Cohen's d = 0.65) of interest in a repeated-measures ANOVA with one group and two measurements (G*Power 3.0 software; Faul, Erdfelder, Lang, & Buchner, 2007). Twenty-four university students (m = 18, f = 6, ages 18-35 years) were recruited from posters and an online volunteer recruitment system and paid £7 for participating. The study was approved by the University of Cambridge Psychology Research Ethics Committee.

Apparatus and stimuli
Observers sat 70 cm from a 24-in. Dell monitor (model number SE2416H, screen resolution 1,920 × 1,080), and made responses using a standard USB keyboard. Stimuli were presented using E-Prime 2.0 software (Psychology Software Tools Inc., 2013). Search displays were arrays of oval-cropped, forward-facing faces looking either to the left or to the right of the observer. To create the images, eyes from left-/right-gazing images were pasted onto the same closed-eyes image, so that only the eyes differed between the two (Fig. 1b). A Statue of Liberty image (hereafter 'SoL', downloaded from an open-source image database) was converted to grayscale and formed the prominent object toward which, or away from which, eyes might gaze in a display.
Each individual face (279 × 370 pixels) subtended a visual angle of approximately 6° × 8°. Faces were arranged in seven fixed positions spaced evenly on an imaginary circle (of 258-mm diameter) centred on fixation. In Set Size 7 displays, there was one face at each position. Set Size 3 displays were equally and unpredictably assigned to three configurations to prevent clustering in any one region of the display (see Fig. 2d and legend). Within each Set Size 3 configuration, and at Set Size 7, the target appeared equiprobably and unpredictably at any of the item locations. The Cue, on Cued Trials (see leftmost panel, Fig. 1a), was an enlarged snapshot of the eye region from the left- or right-gazing faces (105 × 96 pixels), subtending an angle of approximately 2° square. An SoL image (329 × 839 pixels), subtending a visual angle of approximately 7° × 18°, was placed either to the left or to the right of each search display (Fig. 1a, second and third panels).
Figure 1a schematises a typical display sequence in trials from Experiment 1. Each trial presented a fixation cross (1,500 ms), then a 'placeholder' display of three or seven faces with eyes closed (250 ms), followed by a search display of the same faces with eyes opened, lending a naturalistic impression of eyes opening. The presence of placeholder stimuli served to minimise distracting effects of non-eye facial features by cueing observers to the eye regions of the faces. At the same time that the placeholder stimuli appeared, the SoL image also appeared on screen, either to the right or left of the faces. A tall object was chosen such that faces at any position on the screen might conceivably be looking at the object. Observers were instructed to find the odd-one-out, unique target gaze in the display (present, unpredictably, on half of trials) when the eyes opened and to make one of two keyboard responses as quickly as possible to indicate whether a target was present or not.
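The visual angles quoted for the stimuli can be checked approximately from the pixel sizes, the monitor specification and the 70-cm viewing distance. The following is a minimal sketch, not the authors' procedure: it assumes square pixels and a nominal 24-in. diagonal at 1,920 × 1,080, and the function names are hypothetical.

```python
import math

# Assumed display geometry (nominal values taken from the text; the true
# viewable diagonal of the panel may differ slightly).
DIAGONAL_MM = 24 * 25.4          # 24-in. diagonal in millimetres
RES_X, RES_Y = 1920, 1080        # screen resolution in pixels
VIEWING_DISTANCE_MM = 700.0      # observers sat 70 cm from the monitor


def pixel_pitch_mm(diagonal_mm=DIAGONAL_MM, res_x=RES_X, res_y=RES_Y):
    """Physical size of one pixel in mm, assuming square pixels."""
    return diagonal_mm / math.hypot(res_x, res_y)


def visual_angle_deg(size_mm, distance_mm=VIEWING_DISTANCE_MM):
    """Visual angle (degrees) subtended by an extent centred on the line of sight."""
    return math.degrees(2 * math.atan(size_mm / (2 * distance_mm)))


def pixels_to_deg(n_pixels):
    """Convert an on-screen extent in pixels to degrees of visual angle."""
    return visual_angle_deg(n_pixels * pixel_pitch_mm())


# Face width and height, cue width and height, SoL width and height (pixels)
for label, px in [("face w", 279), ("face h", 370), ("cue w", 105),
                  ("cue h", 96), ("SoL w", 329), ("SoL h", 839)]:
    print(f"{label}: {pixels_to_deg(px):.1f} deg")
```

Under these assumptions the computed values fall close to the approximate angles reported in the text; small discrepancies are expected because the exact panel dimensions are not given.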
Within each block, every combination of Set Size (Three, Seven), Target Presence (Present, Absent), and Target Gaze (Congruent, Incongruent) was equally represented, with 75% of trials comprising a cue. Targets were either left-gazing eyes among right-gazing distracters or vice versa.

Procedure
As shown in Fig. 1a, on 75% of trials, a pictorial target cue informed observers which direction (left or right) the unique target eyes in the subsequent search display would gaze (no cue on the remaining 25%). Orthogonally to this factor, we manipulated Gaze Congruence: either the target eyes, if present, gazed toward the SoL and the distracters gazed away (Congruent Target Gaze), or vice versa (Incongruent Target Gaze). This congruency in Experiment 1 was entirely task-irrelevant: observers were only instructed to detect the unique gaze in each display. The search task began with a practice block (12 cued trials, three trials of each unique combination, and four uncued trials, one trial of each unique combination).
Observers received feedback on their responses ("Correct!" = correct response, "----" = incorrect response). The main experimental trials followed, with no feedback, and were presented as five blocks of 64 trials, with each block having the same ratio of Cued to Uncued trials. Each block was followed by a 10-s break.
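The block structure described above (64 trials per block, 75% cued, every combination of Set Size, Target Presence and Target Gaze equally represented) can be sketched as a simple trial-list generator. This is an illustrative sketch only: it assumes that the 75:25 cue ratio holds within every combination, and the function and field names are hypothetical rather than taken from the authors' E-Prime script.

```python
import itertools
import random

SET_SIZES = (3, 7)
TARGET_PRESENCE = ("present", "absent")
TARGET_GAZE = ("congruent", "incongruent")


def build_block(n_trials=64, cued_proportion=0.75, rng=None):
    """Return a shuffled list of trial dicts for one block.

    Assumption: each design cell appears equally often among cued
    trials and equally often among uncued trials.
    """
    rng = rng or random.Random()
    cells = list(itertools.product(SET_SIZES, TARGET_PRESENCE, TARGET_GAZE))
    n_cued = round(n_trials * cued_proportion)
    n_uncued = n_trials - n_cued
    if n_cued % len(cells) or n_uncued % len(cells):
        raise ValueError("trial counts must divide evenly across design cells")

    trials = []
    for cued, n in ((True, n_cued), (False, n_uncued)):
        reps = n // len(cells)  # repetitions of each cell at this cue level
        for set_size, target, gaze in cells:
            trials.extend(
                {"cued": cued, "set_size": set_size, "target": target, "gaze": gaze}
                for _ in range(reps)
            )
    rng.shuffle(trials)  # unpredictable trial order within the block
    return trials


block = build_block(rng=random.Random(0))
print(len(block), sum(t["cued"] for t in block))
```

With 8 design cells, a 64-trial block under these assumptions contains 6 cued and 2 uncued trials per cell.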
We predicted that search would be more efficient for congruent gaze eyes (looking toward the prominent SoL image) than incongruent gaze ones (looking away from the SoL). This prediction should be expressed as smaller search 'slopes' (RT increments from Set Size 3 to 7) for Target-Present trials only, as in Target-Absent trials all faces would have the same gaze so would compete equally for attention. Target-Absent trials were also analysed to confirm that any effect in Target-Present trials was not negated by an opposing effect in Target-Absent trials, which would point to a difference of response bias, rather than search efficiency. We made no strong predictions with respect to whether the effects would be stronger in Cued than Uncued Trials; this manipulation served only to enhance the potential to observe a gaze-congruence effect.

Results and discussion
For each observer, response-time (RT) data for accurate responses were trimmed to exclude any RTs beyond ± 3 standard deviations (SDs) for Cued and Uncued trials separately, for each combination of Target Presence, Congruence, and Set Size (our standard data treatment, as described in Ramamoorthy et al., 2019). One observer's data were excluded as their RTs exceeded ± 3 SD from the sample mean; another's data were lost due to a technical error during acquisition. Figure 2 plots mean RT slopes for Congruent and Incongruent trials, separately for Target-Present (left two panels) and Target-Absent (right two panels) trials. While search slopes (increases in RT from Set Size 3 to 7) appeared similar for Congruent and Incongruent trials in the Target-Absent condition, the same could not be said for the Target-Present condition. Instead, there appeared to be no effect of Congruence for Cued trials, but the predicted effect of Congruence for Uncued trials (shallower search slopes for Congruent than Incongruent trials).
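The trimming rule described above can be illustrated with a short sketch. It assumes a single (non-iterative) pass using the sample SD within one design cell for one observer; the text does not specify these details, and the function name is hypothetical.

```python
from statistics import mean, stdev


def trim_rts(rts, n_sd=3.0):
    """Drop RTs lying more than n_sd sample SDs from the cell mean.

    `rts` holds the correct-response RTs (ms) of one observer in one
    design cell. Assumption: one pass, sample (n - 1) standard deviation.
    """
    if len(rts) < 2:
        return list(rts)          # SD undefined; nothing to trim
    centre = mean(rts)
    spread = stdev(rts)
    if spread == 0:
        return list(rts)          # all RTs identical
    return [rt for rt in rts if abs(rt - centre) <= n_sd * spread]


# Hypothetical cell: 19 ordinary RTs plus one extreme value
cell = [500] * 19 + [5000]
print(len(trim_rts(cell)))
```

Note that with very few trials per cell a single extreme RT cannot exceed 3 SDs of the cell's own distribution, so a rule of this kind only removes values when cells contain enough trials.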
For brevity, as our terms-of-interest all involved search efficiency, we simplified the analyses described here by calculating search slopes for each condition and observer (RT for Set Size 7 minus RT for Set Size 3), using this as the dependent variable for ANOVA. 1

1 Note, this choice did not impact any of our conclusions; mean RTs and accuracy for each condition are detailed in ].

Fig. 2 Response Time Slopes in Experiment 1. Response-time slopes for Congruent ('Con') versus Incongruent ('Inc') Gaze target trials in Cued conditions (dark grey circles) and Uncued conditions (light grey squares) of Experiment 1. Each point is an individual observer's slope for each condition. (Annotation in figure: p = 0.02.)

This interaction reflected shallower search slopes (more efficient search) for Congruent than Incongruent Gaze targets in Uncued trials (t(21) = -2.518, p = .020) that were reduced or absent for Cued trials (t(21) = 1.051, p = .305). That is, the effect of Congruence was as predicted in Target-Present trials overall, but not with respect to its small size or absence in Cued, Target-Present Trials. Corresponding analyses for Target-Absent trials revealed no main effect or interaction (max F = 0.4, n.s.). This was consistent with the effect in Target-Present trials reflecting biased attentional guidance toward Congruent Gaze targets when they were present, rather than faster serial rejection of (congruent or incongruent) distracters; the latter would typically be associated with larger absolute effects of Congruence on RTs in Target-Absent trials than Target-Present ones.
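The slope measure (RT at Set Size 7 minus RT at Set Size 3) and the paired comparison of Congruent versus Incongruent slopes can be sketched as follows. This is a minimal illustration with made-up numbers, not the authors' analysis code; the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def search_slope(rt_set_size_7, rt_set_size_3):
    """Search slope as defined in the text: mean RT at 7 minus mean RT at 3 (ms)."""
    return rt_set_size_7 - rt_set_size_3


def paired_t(a, b):
    """Paired-samples t statistic and degrees of freedom for a minus b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 observers")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("t is undefined when all differences are identical")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical per-observer mean RTs (ms) as (Set Size 7, Set Size 3) pairs
congruent = [search_slope(900, 800), search_slope(950, 830),
             search_slope(880, 790), search_slope(940, 830)]
incongruent = [search_slope(980, 830), search_slope(1000, 840),
               search_slope(930, 800), search_slope(1010, 840)]

t, df = paired_t(congruent, incongruent)
print(f"t({df}) = {t:.3f}")
```

A negative t here corresponds to shallower slopes for Congruent than Incongruent targets. Dividing each slope by 4 (the difference between the two set sizes) would express it in ms per item instead.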
We also investigated the three-way interaction by splitting the analysis into Cued and Uncued trials. This alternative analysis was perhaps less powerful, given the greater variability of Target-Absent search slopes, but perhaps more sensitive to effects common to both Target
Experiment 2: Influence of task-relevant gaze congruence

Method
The aim here was to examine more clearly the congruence effects observed in Experiment 1. Accordingly, Experiment 2 largely replicated the conditions of Experiment 1, but with two important exceptions. First, each observer now only experienced Cued or Uncued trials, yielding a completely independent assessment of Congruence effects for each Cue condition individually (see below for details). Second, to maximise the opportunity to see Target Gaze Congruence effects (target gazing toward the SoL vs. gazing away), this feature was now task-relevant: that is, the side on which the SoL was presented was 100% informative as to which side the target would gaze in any block of trials (half of the blocks of trials comprised only Congruent Gaze targets, the remainder only Incongruent Gaze ones). Based on results from Experiment 1, our single important prediction was an effect of congruence for the Uncued Condition. We now made no strong predictions for the Cued Condition, which served only to clarify whether congruence effects could also be observed with a prior cue.

Observers and sample size
For the two conditions of Experiment 2, the same calculations as for Experiment 1 were applied (24 observers per study). Note these conditions were run consecutively (not randomly assigned in advance). Accordingly, each condition is technically a separate experiment. Here, for brevity and clarity, they are described together to parallel the description of Experiment 1. The total sample (48 observers, m = 29, f = 19) was estimated to be sufficient for cross-study within-between interactions and between-observers main effects (Cohen's f = 0.33, two groups, four measures, power = 0.8).

Apparatus, stimuli and procedure
Cued and Uncued conditions replicated those of Experiment 1, with the following exceptions. First, trials were now divided into two blocks of 120 trials each, with each block of only one Congruence type (run order for Congruent and Incongruent blocks counterbalanced across observers). The side on which the SoL appeared (only 250 ms prior to the search display, along with placeholder faces) provided 100% valid cueing as to where the unique target gaze would be directed, thus making Target Gaze Congruence task-relevant. Second, while the Cued Condition supplemented this information with a 100% valid prior cue as to the direction of target gaze (left or right, as in Cued trials of Experiment 1), the Uncued Condition did not (as in Experiment 1).

Results and discussion
One observer had to be removed from each of the Conditions, as even after trimming their RTs exceeded ± 3 SD from the sample mean. Figure 3 plots RTs for Cued and Uncued conditions in the same format as Fig. 2. As for Experiment 1, analyses were conducted on RT search slopes. 2 An omnibus three-way ANOVA (factors of Congruence and Target Presence as for Experiment 1, Cue now between-observers) revealed standard effects of Target Presence [F(1, 44) = 111.669, p < .001, ηp² = .717], Congruence [F(1, 44) = 12.087, p < .001, ηp² = .215] and a marginal Congruence by Cue interaction [F(1, 44) = 2.914, p = .095, ηp² = .062] but no other main effects or interactions (max F = 1.2, n.s.). These findings provided strong evidence that search for Congruent targets is more efficient than for Incongruent ones. Evidence for any modulation of this effect by the presence of a Cue was relatively weak.
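As a consistency check on the effect sizes reported above, partial eta squared can be recovered from an F ratio and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the three reported F values; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios reported for the omnibus ANOVA of Experiment 2, all with df = (1, 44)
reported_f = {
    "Target Presence": 111.669,
    "Congruence": 12.087,
    "Congruence x Cue": 2.914,
}

for effect, f_value in reported_f.items():
    print(f"{effect}: {partial_eta_squared(f_value, 1, 44):.3f}")
```

The values obtained this way agree with the reported effect sizes to within about .001, the small differences being attributable to rounding of the published F ratios.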
To parallel our analyses for Experiment 1, we conducted a two-way mixed ANOVA on RT slopes from Target
Experiment 2 therefore provided strong confirmation of our claim here: that search for congruent gaze (eyes looking at a prominent object) is more efficient than for incongruent gaze (eyes looking elsewhere). Again, contrary to our concern prior to Experiment 1 that Congruence effects might only arise in Cued trials, this effect was most clearly evident in Uncued trials. However, while Congruence effects were unclear in Cued Trials, overall they showed the same numerical trend as for Uncued trials (p = 0.158). It therefore seems likely that the effect of adding a cue to Cued trials is not to eliminate the congruence effect, but to render it more difficult to detect: the cue likely biases attention toward the target's gaze direction, thus obscuring any subtler effects of gaze-congruence.

General discussion
The current experiments were motivated by previous work on the 'Stare in the Crowd' effect (SITCE): more efficient search for direct gaze (towards the observer) than for averted gaze. That effect does not provide clear evidence of social-cognition influences on search as it might, instead, reflect intrinsic salience of direct-gazing eyes or a specific perceptual-template tuned to direct gaze. Here, we found that search for congruent gaze (toward an object) is more efficient than for incongruent gaze (away from it). In this novel comparison, face and eye stimuli in these two conditions were identical (across trials), precluding explanation in terms of local templates or intrinsic salience. Congruent and Incongruent gaze differed only in their spatial relationship to a prominent object, so the observed advantage for Congruent Gaze must have reflected this.
[Figure panel title: Response Time Slopes in Experiment 2; annotation in figure: p = 0.01]
One possibility is that the congruent gaze advantage reflects processes in search that explicitly code gaze as gaze, perhaps the same processes that determine conscious perception of gaze when observers give unhurried attentional scrutiny to one face. In such a case, the congruent gaze advantage might be shaped by the high-level, theory-of-mind-related processes identified in non-search tasks (Hamilton, 2016; Teufel, Fletcher, & Davis, 2010). However, such rich processing is not typical of the types of representations found to guide visual search (Wolfe & Horowitz, 2017). Neither is it required to explain our results. At a minimum, the process identified here need only code eye-gaze images as 'pointing' left or right, plus the spatial relationship of eye-gaze and object. Accordingly, we speculate that the social or 'protosocial' processes underpinning the gaze-congruence effect will be specific to social stimuli yet lack the theory-of-mind-related influences established for individually attended faces.
Another high-profile recent claim that social processes can influence search has been based on detection of social dyads: silhouette figures facing each other or away (Vestner et al., 2020). That study suggested that more efficient search for human figures facing toward than away from each other in those displays may be ascribed to alternative explanations in terms of stimulus confounds arising because the two interacting figures in those experiments are side by side. Here, by contrast, the gaze-congruence effect reflects a relationship between target eyes and an object that typically were distant from one another; those same concerns cannot apply.
Finally, a challenging question facing any study of attention-guidance in search is whether shallower search slopes are better explained by faster serial selection and rejection of items in a display or by noisy, inefficient, yet parallel guidance. As Wolfe and Horowitz (2017) note, previous findings with regard to potential higher-level influences in search (in that case, for facial expressions) are often amenable to explanation in terms of self-terminating serial search models with no parallel guidance component. These latter models would predict larger absolute effects on target-absent trials than target-present trials. Here, in contrast, our gaze-congruence effect was only clearly evident in target-present trials: inconsistent with that model and exactly as predicted by attention-guidance views. While parallel and serial models can, in principle, both be extended and contorted to accommodate any new finding, our results are more readily explained as gaze-congruence influences on attention guidance. On this basis, the current results provide much clearer evidence for social or proto-social guidance of attention.
One unresolved issue in the current experiments is whether a prior cue (in Cued trials) suppresses the gaze congruence effect or merely obscures it. In Experiment 2, there was a strong overall effect of congruence, but still not clearly evident for Cued trials considered in isolation. We speculate that the cue in Cued trials likely induces an attentional bias toward the target's eye gaze and that this obscures expression of gaze congruence effects. Future research should unpick the particular contributions of these two types of effects on search. However, this does not impact our central claim here, regarding the existence of congruence effects.
In summary, these findings provide the clearest evidence to date that social processes influence visual search for gaze. Our findings cannot be accounted for by local salience, perceptual templates or other stimulus confounds. Whether the gaze coding that drives search efficiency is as sophisticated as that which attends our perception and cognition of single, attended faces remains to be seen. However, gaze-congruence effects demand explanation in terms of spatial relationships between gazing eyes and objects in search: a much richer process than has previously been demonstrated in search for gaze.
Data Availability The data and materials for the experiments reported here will be made available upon request.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.