Social interactions are an important part of everyday life. Specific interactions may or may not be meaningful in and of themselves, but cumulatively they can lead to interpretations that influence how we think about others and ourselves (Sedikides & Strube, 1997; Taylor & Brown, 1988). What does it mean when you are talking to a group of people and someone looks away? In social situations, there is an abundance of relevant information (e.g., eye movements, body posture) that can lead to specific interpretations about what a behavior during an interaction means: whether a person likes you or dislikes you, is interested or not interested. A large body of research has investigated the links between social information and visual attention, highlighting the many ways in which social cues may be related to the orienting of visual attention (Atkinson et al., 2018). Moreover, recent research has demonstrated that positive social cues are quickly learned and may be associated with the rewards such cues carry in real life (Vernetti et al., 2017). There is a broad theoretical basis underpinning the link between visual attention and specific social cues (Atkinson et al., 2018), and a range of recent experimental research has sought to provide empirical support in increasingly nuanced ways to capture the complexity of attention and social interactions in everyday life.

In the context of a conversation, a large body of work has focused on the role of gaze location and attention during conversation (Frischen et al., 2007). While valuable, prior work has focused on 2D computer-based displays of behaviors (including attentional allocation during public speaking—e.g., Chen et al., 2016). More recent work has shifted towards strategies for assessing attention using gaze in the real world. Hadley et al. (2019) tracked gaze with wearable eye trackers during dyadic conversations in order to examine the influence of other individuals' gaze behavior on the participant's attention. They found that, depending on ambient noise, participant gaze shifted from eyes (less noise) to mouth (more noise). Another study suggests that turn-taking in conversation may be a more potent determinant of gaze than cues like eye movements (Hessels et al., 2019). Taken together, these studies conducted in naturalistic contexts suggest that where someone looks during a conversation is quite complicated and highly determined by context, although the other person's eyes do provide some useful information.

While gaze behavior during dyadic conversation is one salient feature of social information, there are many potential interactions to consider. For instance, Hessels et al. (2020) investigated this question in the context of walk-by interactions. Findings from this study indicate that the action of the person being passed is an important determinant of where participants looked (e.g., if passing by someone handing out a leaflet, most time is spent fixating on the hand with the leaflet). While intuitive, these results highlight the importance of considering the influence of social cues on gaze behavior in a variety of contexts. Another study examined the influence of group size (dyads vs. groups of five) on gaze behaviors, finding that there was less gaze to others when the group size was larger (Maran et al., 2020). Together, these results show the nuanced contextual factors that can be examined using real-world eye-tracking methods. However, the question remains how more fine-grained behaviors (e.g., facial expressions, nodding, looking away) might reflect social engagement, and in turn influence attentional processes. Analyses of interactions have typically relied on aggregated measures of gaze (Hessels, 2020), whereas it seems likely that a particular gaze change is contingent on temporally specific events that aggregated approaches do not capture. The question is complex, as it will depend not only on the social relevance and concomitant subjective value of the information but also on the detectability of the event as well as its ability to capture attention if it falls in the peripheral retina (Tatler et al., 2011).

One context in which this issue has been addressed is public speaking, which is, for many, a stressful or anxiety-provoking experience. There is a large body of work indicating that social anxiety influences attention (Bar-Haim et al., 2007), although precisely how this plays out remains an area of active investigation. Specifically, there is the question of whether individuals are attracted by (vigilant to) certain audience behaviors (like looking away) or whether they are repelled by (avoidant of) other behaviors (like frowning). Models suggest competing (or perhaps complementary) claims: individuals experiencing greater anxiety may tend to look for confirmation that they are doing poorly, which would suggest that an audience member looking away would be particularly salient. At the same time, individuals may avoid behaviors suggesting lack of interest, because the perceived cost of doing poorly is so high for the individual (so it is preferable to engage attention elsewhere). It is also possible that there is some combination of these behaviors at play, such that individuals find some types of audience behavior (looking away) particularly salient, yet other types of behavior (frowning) particularly aversive (Wong & Rapee, 2016). The intuitions for these competing hypotheses come from decades of clinical observation. However, research into this question has been shaped largely by the technologies available, primarily 2D stimuli (e.g., words or faces) presented on a screen, which makes it difficult to disentangle the complex relationship between attention and naturalistic social cues.

Work by Rösler and colleagues (Rösler et al., 2021) used real-world eye tracking to suggest that social interactions may elicit physiological responsivity (increased heart rate), but did not provide evidence for an effect on gaze behavior. The authors point out that when examining the link between gaze and social behaviors in the real world, it is extremely difficult to control event occurrence. Indeed, in the study, the authors did not provide any specific constraints on the form of the interaction (aside from it being with a single confederate in a specific location). Thus, there is room to use more standardized methods to gain greater insight into the nuanced influence of specific behaviors during social interactions. Earlier work on public speaking used 2D monitors to understand the influence of anxiety and audience member actions on gaze behavior (Chen et al., 2015; Lin et al., 2016). Research has indicated that the use of virtual reality (VR) is superior to 2D computer-based contexts for understanding the influence of social cues on gaze behaviors (Rubo & Gamer, 2020). One reason for this may be that on a computer screen, all the actions are directly within the field of view, and the position of each person within the frame of the camera (and thus the amount of socially relevant information available) is not influenced by the viewer, which is very different from how attention unfolds in real-world social situations. In day-to-day life, attention is contingent not just on movements of the eyes but also on head movements and body position. In VR, head movements and body position influence the field of view and therefore how gaze is distributed. While there are constraints in VR not present in real-world interactions, the benefit is that the VR environment can be fully controlled and replicated to reduce variability in the stimuli participants encounter.
We recently investigated the link between anxiety and audience member actions using a 360°-video virtual reality environment and found that social anxiety was associated with avoidance of the uninterested audience members (Rubin et al., 2020). In our paradigm, audience members behaved in characteristic ways (interested/uninterested/neutral). Our previous finding was drawn from a simple grouping of audience members based on their overall behavior rather than on the specific actions they took. We counted fixations on audience members regardless of whether the audience member was actually engaging in some action.

In the present study, we were interested in whether actions by audience members actually acted to attract gaze. In the natural world, it is clear that much gaze behavior is driven by behavioral goals, but it is also necessary to direct gaze towards unexpected or unpredictable information in the scene, and both the instances when this occurs and the way it is controlled is an unresolved question (Hayhoe, 2017). In a walking task in a virtual environment, walkers’ gaze did not appear to be attracted to unexpected actions by virtual walkers who occasionally veered towards the walker on a collision course (Jovancevic et al., 2006), indicating that automatic “visual capture” by salient events does not always lead to gaze changes in natural environments (also see review by Luck et al., 2021). Work by Jovancevic-Misic and Hayhoe (2009) in a real walking environment revealed that subjects allocated gaze to pedestrians based on context-dependent policies that reflected the statistical regularities of the environments and the intrinsic reward value of the information, in addition to the task goals. Actions by audience members are not initially known or easily predictable, so it is of interest whether audience actions attract gaze, either by virtue of bottom-up attentional capture or as a result of a policy where subjects covertly monitor peripheral events to make attributions about audience members and/or as a way to determine how well a presentation is going.

Our new analysis also addressed a somewhat different issue—that is, do speakers actively avoid or look away from audience members if they display a lack of interest in the speaker's presentation? The analyses in our previous study did not allow us to draw any direct conclusions about whether avoidant gaze behaviors were motivated by a policy of avoidance based on relatively few instances of a behavior (similar to past work on walking behavior), or whether the avoidance was due to ongoing audience actions that continually elicited a strong avoidance of each action. Thus, the present paper will provide a better sense of both (1) what information in peripheral vision attracts gaze and (2) whether specific actions elicit a negative response that motivates avoidance.

Methods

Participants

We enrolled 96 participants in the study, of whom we report on data for 84. Participants were excluded because (a) they had their eyes closed throughout the public speaking task (n = 3); (b) the hardware used to capture the gaze location encountered technical difficulties which resulted in a loss of the data (n = 6); (c) they did not finish the prespeech protocol before beginning the public speaking task (n = 2); (d) they chose to withdraw from the study (n = 1). The study was approved by the University of Texas Institutional Review Board (IRB). All participants provided informed consent. Of the participants included, 46% were female (n = 39), 24% were Hispanic/Latinx (n = 20), 8% were Black or African American (n = 7), 24% were Asian (n = 20), 55% were White (n = 46), and 13% reported another race (n = 11). The average age of participants was 19.79 years (SD = 4.02; range: 18–45). Participants were students at the University of Texas who enrolled for course credit and individuals from the Austin community who enrolled on a voluntary (unpaid) basis. The study was available to undergraduate participants through SONA (a web-based study recruitment platform). Community participants were able to sign up through email. Qualtrics was used to collect self-report data (Qualtrics, Provo, UT).

360°-video virtual reality speech environment

We used a 360°-video virtual reality (VR) stimulus with five individuals seated in chairs behind a table (see Fig. 1a). Individuals in the video were graduate students in psychology at the University or volunteers from the Austin community. Each person in the video was coached to make actions that would indicate interest (e.g., leaning forward, smiling) or lack of interest (yawning, looking at their phone), with the exception of the neutral audience member who was told to act naturally (which included movements, mainly shifting in the chair) without indicating overt interest (e.g., nodding/smiling) or lack of interest (e.g., looking away). We grouped the behaviors into three categories: small movements (e.g., head movements: looking away–uninterested; nodding–interested), large movements (e.g., the whole body shifting: leaning back–uninterested; leaning forward–interested), and taking out a phone (uninterested audience members only).

Fig. 1
figure 1

a The full 360° video (unrolled). We counted audience members as 1–5, from left to right. Audience members 1 and 4 completed actions reflecting a lack of interest, such as looking away, leaning back, and looking at their phone. Audience members 2 and 5 completed actions reflecting interest, such as nodding, smiling, and leaning forward. The central audience member 3 completed actions that were ambiguous such as shifting in the chair or scratching an arm. b The 360° video as viewed from the perspective within the Oculus headset. The green dot reflects where the participant was looking (this was not visible to participants actually viewing the video). c The OpenPose mapping to each of the audience members. d The inner color represents the ROI for each audience member based on keypoints from OpenPose (the ROI is dynamic for each frame). The outer color represents the 5°–20° area for fixations towards audience members

The frequency of movements varied between audience members, whose actions reflected interest (nodding, smiling, and leaning forward), a lack of interest (looking away, leaning back, and looking at their phone), or neutrality (shifting in the chair, or scratching an arm). Audience members were labelled 1 through 5, from left to right (see Fig. 1). Audience members 1 and 4 acted in an uninterested manner, whereas members 2 and 5 acted interested. Thus, there was a positive and negative actor in the most extreme positions (1 and 5) and also in the less eccentric positions (2 and 4), while the central audience member was neutral. Table 1 shows the number of actions of different types (small and large) for each audience member. We list phone actions in a separate category. For easier interpretation of the analyses, we label the audience Pos 2 and Pos 5, Neg 1 and Neg 4, and Neutral 3. Note that there is some partial confounding between audience position and type of actions, which was unavoidable in such a small study and will limit the generality of our findings to some extent. Audience members also differed in the total number of actions, and this is another limitation of the study. Assistance with development of the stimulus was provided by Moody College of Communications. The film was created with two SP360° 4K VR Cameras, mounted on a tripod, placed in front of a podium. For the public speaking tasks, participants were asked to stand behind that same podium while wearing the VR headset and giving their speech. Participants rated the realism and immersion of the public-speaking challenge, with the majority reporting being at least somewhat immersed (83%) and forgetting that the audience was pre-recorded at least some of the time (56%). The video can be made available to researchers on reasonable request.

Table 1 Audience member action counts

Virtual reality headset and eye tracker

The 360° environment was displayed to participants with the Oculus Rift DKII virtual reality headset. Eye movements were monitored using an SMI eye-tracker upgrade, which captured eye movements at 75 Hz. In addition, a HiBall motion-tracking system (3rdTech) complemented the Oculus position tracking to ensure smooth display updating during vertical and horizontal head movements. The field of view of the helmet was 93.6 deg binocularly (see Fig. 1b). Prior to beginning the speech, all participants completed the SMI calibration procedure. The real-time correspondence of gaze position overlaid on the 360° video was captured for each video frame. These image-based sequences were later used to verify the raw eye-position data and subsequent eye-tracking pipeline.

Procedure

Participants completed an orientation to the virtual reality environment, which included viewing the 360°-video environment for 90 s without the audience present. Next, participants were asked if they experienced any dizziness/nausea. Following this evaluation, participants were informed that they had 2 min to prepare a 5-min speech on the topic "something you are proud of." Before beginning the speech, the experimenter conducted a brief (less than 60 s) calibration check, after which participants were instructed to deliver their speech to the virtual audience members. Participants who paused for longer than 30 s before completing the 5-min speech were instructed to continue their speech for the full 5-min period. Participants were told during the consent process that what they viewed in the VR headset was a prerecorded video. Nonetheless, some participants expressed curiosity about whether the audience was "live." During debriefing, the researcher clarified that the audience was prerecorded.

Gaze analysis

Eye movement data preprocessing

Eye tracking data was saved from Vizard 4 (WorldViz). The pipeline for preprocessing the data involved OpenPose (Cao, Hidalgo, Simon, Wei, & Sheikh, 2019) and custom MATLAB code. OpenPose was used to detect hand, body, and face keypoint locations as they changed across the duration of the 360° video. MATLAB was used to create dynamic ROI masks using these tracked keypoints, in order to determine fixation targets (fixations were identified in the following preprocessing step; see below). The ROIs were defined using coordinates of OpenPose-detected keypoints in image space (see Fig. 1c). Different possible regions for each audience member were defined based on each audience member's face, hand, arm, and torso keypoints (see Fig. 1d). The face and hand ROIs were defined in image space using convex hulls of the face or hand keypoints, respectively. For torso and arm ROIs, radii of ~6° and ~3° were manually determined to best match all audience members. For the arm ROIs, regions of the image that were within ~3° of arm segments (in a direction perpendicular to the arm segment) were marked as an arm ROI. This was done for each arm. For the torso ROIs, pixels within ~6° of the torso (in a direction perpendicular to the torso orientation) were included in the torso ROI. Gaze data was labeled for each frame by computing which ROI the 2D gaze location resided in for that frame. When gaze was not within any ROI, it was assigned as background.
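The limb-band ROI test described above reduces to a point-to-segment distance check. The actual pipeline was implemented in MATLAB on OpenPose keypoints; the following is a minimal Python sketch of that geometric rule, assuming image coordinates have already been mapped to degrees of visual angle (function names are ours, not from the original code).

```python
import math

def point_segment_distance(px, py, ax, ay, bx, by):
    """Distance from point (px, py) to the segment (ax, ay)-(bx, by),
    all in degrees of visual angle."""
    abx, aby = bx - ax, by - ay
    denom = abx * abx + aby * aby
    # Project the point onto the segment, clamping to the endpoints
    # (a degenerate zero-length segment falls back to endpoint distance).
    t = 0.0 if denom == 0 else max(0.0, min(1.0, ((px - ax) * abx + (py - ay) * aby) / denom))
    cx, cy = ax + t * abx, ay + t * aby
    return math.hypot(px - cx, py - cy)

def in_limb_roi(px, py, ax, ay, bx, by, radius_deg):
    """True if gaze lies within radius_deg of the limb segment, mirroring
    the ~3 deg (arm) and ~6 deg (torso) bands described in the text."""
    return point_segment_distance(px, py, ax, ay, bx, by) <= radius_deg
```

A gaze sample would be tested against each audience member's arm segments at a 3° radius and torso segment at a 6° radius, with unmatched samples labeled as background.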

To identify fixations, we used an in-house program previously employed in a range of studies (Kit, Katz, Sullivan, Snyder, Ballard, & Hayhoe, 2014; Li, Aivar, Kit, Tong, & Hayhoe, 2016; Li, Aivar, Tong, & Hayhoe, 2018; Rubin et al., 2020; Tong, Zohar, & Hayhoe, 2017). This program first filters gaze points using a median filter across a three-frame moving window. Next, the program partitions the data into fixations or saccades based on eye-movement velocity and temporal stability (50°/s and 85 ms). Lastly, the program combines fixations sequentially based on distance and temporal difference (within 1° and less than 80 ms apart). Track losses were handled by omitting data, or were ignored when the fixation was on the same ROI before and after the track loss, per the conditions for combining fixations sequentially (i.e., a track loss of more than a few frames would be omitted).
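The steps the program performs—median filtering, velocity thresholding, and sequential merging—can be sketched as follows. This is an illustrative Python reconstruction under the parameters stated above (75-Hz sampling, 50°/s, 85 ms, 1°, 80 ms), not the in-house program itself, and it omits the track-loss handling.

```python
import math

SAMPLE_HZ = 75.0        # SMI tracker rate reported in the Methods
VEL_THRESH = 50.0       # deg/s velocity cut-off
MIN_FIX_MS = 85.0       # minimum fixation duration
MERGE_DIST_DEG = 1.0    # merge fixations closer than this...
MERGE_GAP_MS = 80.0     # ...and separated by less than this

def _median3(v):
    # Three-sample moving median; edge samples pass through unchanged.
    out = list(v)
    for i in range(1, len(v) - 1):
        out[i] = sorted(v[i - 1:i + 2])[1]
    return out

def detect_fixations(xs, ys):
    """Return fixations as dicts with centroid (x, y) and sample span (s, e)."""
    xs, ys = _median3(xs), _median3(ys)
    n = len(xs)
    dt_ms = 1000.0 / SAMPLE_HZ
    # Sample-to-sample velocity in deg/s; the first sample counts as slow.
    slow = [True] + [
        math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1]) * SAMPLE_HZ < VEL_THRESH
        for i in range(1, n)
    ]
    # Group contiguous slow samples into candidate fixations.
    fixations, start = [], None
    for i in range(n + 1):
        if i < n and slow[i]:
            start = i if start is None else start
        elif start is not None:
            if (i - start) * dt_ms >= MIN_FIX_MS:
                fixations.append({"x": sum(xs[start:i]) / (i - start),
                                  "y": sum(ys[start:i]) / (i - start),
                                  "s": start, "e": i})
            start = None
    # Merge neighbouring fixations close in both space and time
    # (centroid recomputation is skipped in this sketch).
    merged = []
    for f in fixations:
        if merged:
            prev = merged[-1]
            gap_ms = (f["s"] - prev["e"]) * dt_ms
            dist = math.hypot(f["x"] - prev["x"], f["y"] - prev["y"])
            if gap_ms < MERGE_GAP_MS and dist <= MERGE_DIST_DEG:
                prev["e"] = f["e"]
                continue
        merged.append(dict(f))
    return merged
```

For example, one second of stable gaze at one location followed by one second at a location 10° away yields two fixations, since the 10° separation exceeds the 1° merge criterion.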

Eye-movement data manual checks

In order to validate the preprocessing pipeline, two trained raters evaluated 30-second segments of each participant's data. They compared whether the fixation location output from the pipeline matched the gaze location in the screen capture of the trial. Agreement with the pipeline was high: the locations of 94.7% of fixations were in agreement between the raters and the pipeline output.

Gaze attraction towards audience actions

We first calculated the proportion of fixations that reflected gaze towards audience members. We counted a fixation as gaze towards an audience member action if that action occurred within 20° of the previous fixation. We found that 20° of visual angle away from a specific audience member reflected limited peripheral visibility of that specific audience member. A fixation was also only counted as gaze towards an audience member action if the previous fixation was at least 5° away from that audience member's ROI. This was to prevent the inclusion of typically occurring small saccadic eye movements between sequential fixations within a similar region, given the size (in visual angle) of the audience members. This is illustrated as the outer region in Fig. 1d, where the inner region reflects the area of the audience member, and the outer diamond-shaped band reflects the 5°–20° area from which we evaluated gaze attraction towards audience members. Lastly, to be classified as gaze towards an audience member, the fixation toward the audience member had to occur either during the action or within 300 ms of the end of the action. (Note that the data were invariant across duration [100–600 ms] of this inclusion window at the end of the action.)
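The classification rule above combines a spatial criterion (the previous fixation must fall in the 5°–20° band) and a temporal criterion (the new fixation must begin during the action or within 300 ms of its end). A minimal sketch of that rule, with hypothetical argument names:

```python
def counts_as_gaze_towards(prev_dist_deg, fix_onset_ms,
                           action_start_ms, action_end_ms,
                           post_window_ms=300.0):
    """Sketch of the gaze-towards inclusion rule: prev_dist_deg is the
    distance from the previous fixation to the audience member's ROI, and
    fix_onset_ms is the onset time of the new fixation on that member."""
    in_band = 5.0 <= prev_dist_deg <= 20.0
    in_window = action_start_ms <= fix_onset_ms <= action_end_ms + post_window_ms
    return in_band and in_window
```

So a fixation landing 200 ms after an action ends, launched from 10° away, counts; one launched from 3° away (too close, likely a small corrective saccade) or arriving 500 ms late does not.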

To examine whether participants were more likely to look at audience members when they were engaged in any action compared with no action we also calculated fixations when no action was completed. To evaluate the likelihood of gaze towards an audience member in response to specific actions, we separately calculated the proportion of fixations between all actions and then for each action for each audience member.

Gaze away from audience actions

We counted a fixation as related to gaze away from an audience member behavior if the prior fixation had been on an audience member, the new fixation was at least 20° away, and the movement away from the audience member occurred during the action. In relation to Fig. 1d, this means that following a fixation on an audience member, the next fixation was only counted as a fixation away if it was beyond that audience member's outer band.

We calculated the same three proportions for gaze away. To examine whether participants were more likely to look away from audience members when they were engaged in any action compared with no action, we also calculated fixations away from audience members when no action occurred as a baseline value. To evaluate the likelihood of gaze away in response to specific actions, we separately calculated the proportion of fixations between all actions and then for each action for each audience member.
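The gaze-away rule is the mirror image of the gaze-towards rule: the prior fixation must be on the audience member, the next must land at least 20° away, and the shift must occur during the action. A sketch under the same hypothetical naming:

```python
def counts_as_gaze_away(prev_on_member, next_dist_deg,
                        fix_onset_ms, action_start_ms, action_end_ms):
    """Sketch of the gaze-away rule: prev_on_member flags whether the prior
    fixation fell in the member's ROI, and next_dist_deg is how far the new
    fixation lands from that member."""
    return (prev_on_member
            and next_dist_deg >= 20.0
            and action_start_ms <= fix_onset_ms <= action_end_ms)
```

Note that, unlike the gaze-towards rule, no 300-ms grace window after the action applies here; the shift must happen while the action is under way.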

Normalization of gaze towards and away

The periods when audience members actually engaged in some sort of behavior were variable and, in general, much shorter than the periods when they were inactive. Therefore, we normalized the fixation frequencies to fixations per unit time to take account of the different time periods when audience members were executing actions versus inactive. Additionally, for each participant we evaluated the proportion of time that audience members were actually visible—as the FOV only included three audience members maximum (see Fig. 1b). To scale the proportion of fixations, we first balanced the duration in frames between (a) when an audience member was active and in view ("active period") and (b) when he or she was not active and in view ("inactive period"). Then the proportion of fixations during each period was scaled with the same factors, respectively. For example, if an audience member was engaged in some (visible on-screen) behaviors 25% of the time, it was scaled to 50% by multiplying by 2. Similarly, the 75% inactive period was also scaled to 50% by multiplying by 2/3. This allowed us to evaluate the proportion of fixations in terms of the relative likelihood that fixations reflected gaze towards or away from audience members compared with baseline across the speech.
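The balancing step amounts to rescaling each period's fixation proportion by 0.5 divided by the fraction of time that period occupied. A sketch of the worked example in the text (25% active scaled by 2, 75% inactive scaled by 2/3); the function name is ours:

```python
def balance_periods(prop_fix_active, prop_fix_inactive, frac_time_active):
    """Rescale fixation proportions so the active and inactive periods each
    count as half the time, per the normalization described in the text."""
    frac_time_inactive = 1.0 - frac_time_active
    return (prop_fix_active * 0.5 / frac_time_active,
            prop_fix_inactive * 0.5 / frac_time_inactive)
```

With an audience member active 25% of the time, a 10% active-period fixation proportion and a 30% inactive-period proportion both scale to 20%, making the two periods directly comparable per unit time.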

Data analysis

Gaze towards audience actions

Using the processed gaze data, we evaluated whether the speaker's gaze was oriented towards audience behavior generally. We first tested whether audience member actions made gaze towards audience members more likely relative to baseline (when a given audience member was doing nothing). We then evaluated relative differences in gaze towards individual audience members, because audience members were different and behaved somewhat differently. This allowed for a general comparison of the effects of specific audience behaviors. Finally, to address the role of specific actions, we estimated the differences in likelihood that specific actions (small actions like looking away, compared with large actions like shifting back in the chair, compared with taking out the phone) played in orienting gaze towards audience members. Additionally, we explored whether there were any differences in gaze towards audience between small and large actions for each audience member.

Gaze away from audience actions

Using the processed gaze avoidance data, we evaluated the same comparisons to determine whether speakers displayed evidence of avoidant gaze (away) behavior.

Statistical approach

All analyses were conducted using the brms package (Version 2.13.5) in R (Bürkner, 2018). To evaluate the hypotheses, we fit multilevel models (with a random effect for participant) with default priors, using the zero-inflated/zero-one-inflated beta distribution to appropriately model the skewed distribution of the proportion dependent variable. To address the first hypothesis, we used the adjusted proportion of fixations (see above for a description of how we calculated the adjustment). To address the second hypothesis, we used the raw proportion of fixations. We calculated contrasts using the emmeans package with a highest posterior density (HPD) interval of 1, meaning that the reported interval around the median estimate spans the full posterior distribution (whereas a 95% interval excludes the most extreme 5% of posterior draws). Results were transformed from unstandardized percentile estimates to standardized odds ratios for reporting purposes. Syntax and data used for the analyses are available here (https://osf.io/zg9xh/).
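The reported odds ratios contrast two proportion estimates on the odds scale. The actual analysis derives these from posterior draws of the beta models via emmeans in R; a minimal Python sketch of the underlying transformation, for illustration only:

```python
def odds(p):
    """Convert a proportion (0 < p < 1) to odds."""
    return p / (1.0 - p)

def odds_ratio(p1, p2):
    """Odds ratio contrasting two proportion estimates; values above 1
    indicate the first outcome is relatively more likely."""
    return odds(p1) / odds(p2)
```

For instance, contrasting proportions of 0.50 and 0.25 gives odds of 1 versus 1/3, i.e., an OR of 3; in the paper this calculation is applied draw-by-draw to the posterior, yielding the HPD intervals reported with each OR.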

Results

On average, 1.226±0.041% of total participant fixations were towards audience members during an action (raw proportion = 4.621%, count of fixations = 1,877) and 0.433±0.020% of fixations were towards audience members when there was no action being completed (raw proportion = 1.649%, count of fixations = 653). During the times that audience members were completing an action and participants fixated towards an audience member, 5.858±0.291% of fixations were towards large actions, 5.984±0.294% of fixations were towards small actions, and 7.087±0.703% of fixations were towards phone actions.

On average, 0.721±0.028% of total fixations were away from audience members when they were completing an action (raw proportion = 2.439%, count of fixations = 955) and an average of 0.651±0.024% of fixations were away from audience members when they were not completing an action (raw proportion = 2.713%, count of fixations = 1,061). During the times that audience members were completing an action and participants fixated away from an audience member on a subsequent fixation, 3.008±0.158% of fixations were away from large actions, 3.035±0.161% of fixations were away from small actions, and 3.636±0.366% of fixations were away from phone actions.

In addition to fixations towards or away from audience members, there were fixations made to audience members when the previous fixation was less than 5° away (raw proportion = 43.500%, count of fixations = 17,223). The proximity of the previous fixation to the audience member makes it impossible to determine the relationship between the audience member action and the subsequent fixation, given that gaze was already so close to the audience member. The remaining fixations were made to the background (raw proportion = 45.079%, count of fixations = 17,035) and the raw proportions sum to 100% accounting for all fixations made.

Gaze oriented towards audience members

As shown in Fig. 2a, gaze was more likely to be oriented towards audience members when they were engaged in an action compared with when they were inactive (baseline), odds ratio (OR) = 4.509, 95% confidence interval [3.608, 5.481].

Fig. 2
figure 2

a The odds ratio that gaze was oriented towards audience members during an action compared with baseline (no action). b The odds ratio that gaze was oriented towards individual audience members. Neg 1 = audience member 1 (far left), who acted uninterested; Pos 2 = audience member 2 (near left), who acted interested; C 3 = audience member 3 (center), whose actions were neutral; Neg 4 = audience member 4 (near right), who acted uninterested; Pos 5 = audience member 5 (far right), who acted interested; Neg 4/Pos 5 indicates the likelihood of gaze oriented towards audience member 4 compared with audience member 5 (where higher values indicate greater likelihood). c The odds ratio that gaze was oriented towards different actions. Note. The red line at 1 denotes the point at which the highest posterior density (HPD) suggests that the direction of the probability is not reliable. We used an HPD of 1, meaning that the interval around the median estimate plotted represents the full distribution of posterior draws from the model (i.e., the estimate falls entirely within the distribution)

In general, speakers were more likely to look at the actions taken by centrally located audience members, as shown in Fig. 2b. For instance, the likelihood of gaze being oriented towards the action of audience member 3 (whose actions were neutral) sitting in the center, or audience member 4 (whose actions indicated lack of interest—e.g., looking away) next to him, was not meaningfully different, OR = 1.11, [HPD: 0.782, 1.520]. On the other hand, the likelihood of gaze towards the action of audience member 3 was reliably greater than the likelihood of being attracted to an action of audience member 1, whose actions also indicated lack of interest, but who was sitting on the periphery, OR = 2.69, [HPD: 1.807, 4.38]. In other words, there was an approximately equal probability that a participant looked at audience member 3 or 4 when they were doing some action, but it was nearly three times more likely that a participant looked at audience member 3 than audience member 1 when they were doing some action.

Figure 2c highlights the finding that gaze was more likely to be oriented towards phone actions compared with large actions, OR = 3.20, [HPD: 1.153, 7.54] and small actions, OR = 3.05, [HPD: 1.185, 7.14]. There was no meaningful difference between gaze towards small and large actions, OR = 1.05, [HPD: 0.821, 1.32].

Gaze oriented away from audience members

As shown in Fig. 3a, participants displayed evidence of gaze away from audience members during actions compared with when audience members were not completing any action (baseline), OR = 2.083, [HPD: 1.690, 2.594]. However, the probability of this avoidance action was somewhat smaller than the probability of attraction. As shown in Fig. 3b, participants were more likely to look away from audience member actions when those audience members were located in the center compared with the periphery.

Fig. 3
figure 3

a The odds ratio that gaze was oriented away from audience members during an action compared with baseline (no action). b The odds ratio that gaze was oriented away from individual audience members. Neg 1 = audience member 1 (far left), who acted uninterested; Pos 2 = audience member 2 (near left), who acted interested; C 3 = audience member 3 (center), whose actions were neutral; Neg 4 = audience member 4 (near right), who acted uninterested; Pos 5 = audience member 5 (far right), who acted interested; Neg 4/Pos 5 indicates the likelihood of gaze oriented away from audience member 4 compared with audience member 5 (where higher values indicate greater likelihood). c The odds ratio that gaze was oriented away from different actions. Note. The red line at 1 denotes the point at which the highest posterior density (HPD) suggests that the direction of the probability is not reliable. We used an HPD of 1, meaning that the interval around the median estimate plotted represents the full distribution of posterior draws from the model (i.e., the estimate falls entirely within the distribution)

Figure 3c shows that there were no reliable differences in gaze away from specific actions: phone compared with small actions, OR = 2.31, [HPD: 0.576, 8.19]; phone compared with large actions, OR = 2.49, [HPD: 0.605, 8.165]; or small compared with large actions, OR = 1.080, [HPD: 0.783, 1.44].

Exploratory effects of audience member action type

Our exploratory analysis, summarized in Table 2, aimed to determine whether there were differences in gaze by general audience member category (interested/uninterested/neutral) and by the behaviors that reflected that category. We did not identify any reliable differences between audience members in the relative potency of large versus small actions, regardless of who completed the action. That is, it did not appear to make a difference whether an audience member nodded or looked away (small actions indicating interest or lack of interest), whether an audience member leaned forward or back (large actions indicating interest or lack of interest), or whether an audience member leaned forward versus nodded, or leaned back versus looked away (large vs. small actions of the same valence). In other words, we found that peripheral actions attracted gaze whether or not they indicated interest. This shows that in naturalistic contexts, gaze can be attracted by unpredictable events in the peripheral retina. In addition, when the participant’s gaze was on an audience member who then executed an action, gaze away from the actor was observed. These avoidance eye movements did not appear to be linked to the nature of the action (interest vs. lack of interest).

Table 2 Audience Member × Action contrasts

Discussion

This paper examined the time-locked relationship between specific audience member actions during a public-speaking challenge and the speaker’s gaze behavior. We extended past findings on gaze during public speaking to identify gaze behaviors contingent on specific actions. Most importantly, the results make clear that unexpected peripheral events attract attention in a naturalistic context. It is difficult to know on the basis of these results what mechanisms underlie gaze attraction. While there is extensive evidence for attentional capture by bottom-up signals, there is also evidence that this capture can be modulated by task factors (Jovancevic-Misic & Hayhoe, 2009; Luck et al., 2021). There may have been an “information gathering” purpose to most fixations on audience members during actions. For example, a speaker might periodically check on the audience members to reduce their uncertainty about their level of interest. Yet taking out a phone led to a much higher probability of gaze towards that action compared with other actions, perhaps because of the peripheral detectability of this event. The phone action may also have provided a clearer basis for interpreting other, more ambiguous behaviors by the same audience member (e.g., looking away because of boredom rather than just a random movement). Moreover, our exploratory analyses showed that gaze was attracted towards audience member actions regardless of whether they indicated interest or lack of interest. This may be a consequence of the difficulty of classifying the social significance of an event from purely peripheral information. Part of processing information during public speaking may be managing uncertainty, and gaze towards specific actions may play a particularly important role in modulating that uncertainty. Future research explicitly modeling uncertainty (Sullivan et al., 2011) might provide insight into the potential causal influences on gaze behaviors in social contexts.

While we found that taking out a phone was a stronger attractor of gaze than the other audience actions, it is important to note that we cannot be certain that gaze towards the audience member taking out their phone was driven by what the action represented. Perhaps the combination of the short series of movements and the accompanying posture led to greater uncertainty about what the audience member was doing (e.g., taking out any object from their pocket may have elicited a similar increase in gaze towards that audience member). The lack of a control behavior (e.g., taking out a granola bar or a pad of paper) limits the conclusions we can draw about the phone action specifically.

Much more attention has been paid to gaze towards stimuli (attentional capture) than to gaze away from stimuli (gaze avoidance). However, in the literature on responses to socially aversive behaviors, much has been made of the role of hypervigilance versus avoidance (Bögels & Mansell, 2004), making it worthwhile to explicitly examine possible features motivating gaze away from audience members. The findings from this study suggest that gaze away from actions is not strongly related to any specific audience member behavior. Avoidance might not be conditioned on specific behaviors, but rather on the interpretation of what those behaviors imply (i.e., lack of interest or social threat). The lack of differences in the behaviors leading to gaze away from audience members might suggest that people simply do not react strongly to lack of interest in terms of gaze behavior. It could also mean that avoidance is linked to a general policy, rather than being a consequence of specific audience member actions. However, audience member actions did generally evoke looks away from the actor. One possible reason that gaze away was not linked to the type of action might be the cost of looking away. We based our hypotheses on the assumption that the social aversiveness of an individual signaling lack of interest would motivate a strong reaction. Yet there is a substantial body of literature on gaze behavior suggesting a tendency to choose movements that reduce motor cost (Hayhoe, 2017). The weight of contextual factors during a public-speaking challenge makes the mechanisms driving gaze in this context difficult to interpret (see discussion of attentional capture effects in Tatler et al., 2011). Our findings might then simply reflect the speed with which behaviors are processed.
It may be worthwhile for future experimental research to consider how long social stimuli need to be attended before determinations about social information can be made. It would also be interesting to consider whether there is synchrony between internal, emotional (perhaps physiological) responses to perceived lack of interest in the audience and subsequent gaze behaviors.

More broadly, our results also suggest that the use of 2D static representations (e.g., faces with different expressions) to reflect social threat could make it difficult to understand real-world attentional processes, because disentangling the reasons for vigilance versus avoidance is complicated by a lack of flexibility to engage in information gathering. The size of the computer display, and the static nature of the stimuli presented, constrain the scope of attentional processing (see Tatler et al., 2011). Gaze to different parts of a computer screen involves only eye movements, whereas in everyday life head and body position also play important roles in determining where gaze is allocated and how attention is distributed. Moreover, unlike in task-based paradigms, the frequency of events in naturalistic environments is lower and more difficult to predict, with competing demands on attention. Indeed, our results indicate the low base rate of gaze in relation to audience member actions. In the context of past research related to social anxiety, fixations on an audience member who is not performing any action might reflect a kind of “monitoring” behavior. Public speaking is a fairly constrained context, and it would be worth considering gaze behaviors to specific actions in other contexts as well. It is also important to consider whether the cognitive demands of the public-speaking task played a role in minimizing the influence of audience actions on gaze behaviors. Unfortunately, we were not able to examine the relationship between what a participant was doing (namely, whether they were speaking or not) and their gaze behavior. Furthermore, under conditions of lower cognitive load (for instance, if the speech had been prepared ahead of time), the audience member behaviors may have had a greater influence on attention.
We also did not evaluate the role of competing actions when multiple actions would have been visible to participants. While these occurrences were rare in the current data, it would be interesting to consider preferential gaze allocation when competing audience member behaviors are visible. Additionally, in the current paper, we did not consider the role of social anxiety, which may influence reactivity to specific behaviors. Other investigations have considered whether contextual factors might promote avoidance (Laidlaw et al., 2011) or whether there might be other gaze behaviors that reflect heightened vigilance, such as hyperscanning (Chen & Clarke, 2017).

Taken together, our findings indicate that audience actions reliably lead to gaze towards audience members, although with relatively low sensitivity to specific social cues, with the notable exception of taking out a phone. While audience actions generally also led to gaze away from audience members, this avoidance showed little relationship to specific audience member actions. In the context of examining attention related to specific behaviors or cues, it is possible that some of the cues were fairly subtle. Given the participants’ flexibility to incorporate head and body movements into their gaze behavior, as well as the limited view of the audience (in that participants could not see everyone at once), this behavioral context more closely approximates the way in which social behaviors are encountered in the real world. This suggests that future research should carefully consider how responses to social cues are interpreted in less dynamic and flexible experimental settings. Research in real-world contexts may also be needed to understand whether virtual environments are less effective, as well as to explore the potential role of task cognitive load.