Introduction

Background

Social anxiety disorder (SAD) is characterized by a marked fear and avoidance of social and performance situations (American Psychiatric Association, 2013). SAD is one of the most common psychiatric disorders, with an estimated lifetime prevalence of 10.7% (Kessler et al., 2012). Researchers on social anxiety have highlighted two critical attentional biases as central maintenance factors of SAD: 1) self-focused attention (SFA), in which attention is focused on negative inner cues such as thoughts, self-imagery, and bodily sensations, and 2) other-focused attention (OFA), in which attention is focused on environmental threats such as evaluations by others (Clark et al., 1995; Rapee & Heimberg, 1997).

A number of studies have shown that highly socially anxious individuals and patients with SAD have higher levels of self-reported SFA than do participants with low social anxiety (Daly et al., 1989; Hackmann et al., 1998; Mellings & Alden, 2000). Researchers have identified the observer perspective as an important component of SFA: when individuals mentally picture themselves, they perceive themselves from another person’s viewpoint (Clark et al., 1995). Experimental manipulations of SFA (e.g., with mirrors, video cameras, or instructions for self-focus and observer perspective) caused anticipated anxiety about a social situation, public self-awareness, state anxiety, frequent negative thoughts, safety behaviors, and low self-evaluation of performance (Bögels & Lamers, 2002; George & Stopa, 2008; Spurr & Stopa, 2003; Woody & Rodriguez, 2000).

Higher OFA was also found in highly socially anxious individuals and in SAD patients, as indicated in cognitive tasks using facial stimuli and in eye-tracking tasks (Amir et al., 2003; Gamble & Rapee, 2010; Lazarov et al., 2016). In contrast, other research findings suggested an external attention bias characterized not by OFA with selective attention to threat but by attentional avoidance of salient social stimuli such as eye contact and negative faces (Howell et al., 2016; Wermes et al., 2018; for a review, see Chen et al., 2020). Such mixed findings prompted the development of a vigilance-avoidance model in which anxious people are initially vigilant for threat and subsequently avoid it, and this model has been supported in several studies (Mogg et al., 2004; Wieser et al., 2009). A systematic review suggested that the pattern of external attention bias was influenced by the severity of social anxiety, the type of social situation, and developmental stage (age group) (Chen et al., 2020).

Because the experimental paradigms used for SFA and OFA were completely different, researchers have examined SFA and OFA in relative isolation, and few have investigated the relationship between the two and their relative importance in the attentional processes of social anxiety (Choi et al., 2016; Schultz & Heimberg, 2008). Researchers have developed probe-detection paradigms to investigate the balance between SFA and OFA in which the external probe was superimposed on pictures of emotion faces or household objects that were presented on a computer screen and the internal probe was a pulse to the finger (Mansell et al., 2003). In addition, researchers have investigated brain activity during the probe-detection paradigm with event-related brain potentials (Kanai et al., 2012) and functional magnetic resonance imaging (fMRI) (Choi et al., 2016).

Boehme et al. (2015) manipulated SFA and OFA during a simulated social situation displayed on a screen and used fMRI to compare the neural activity associated with SFA with that associated with OFA. Similarly, Pujol et al. (2013) used fMRI to compare brain activity under a self-condition (participants watching a video about themselves) and an other-condition (watching a video about an unknown person) between participants with SAD and control participants. In these studies, activation in the medial prefrontal cortex (mPFC), temporo-parietal junction, temporal pole, and primary visual cortex and deactivation in the dorsal frontoparietal cortex were related to SFA (Boehme et al., 2015; Pujol et al., 2013).

Some researchers proposed that direct measurement of SFA and OFA in social situations is necessary to clarify the process of attentional focus in social anxiety (Schultz & Heimberg, 2008). Although fMRI provides valuable information on neural activity, it is highly sensitive to motion artifacts and therefore requires participants to remain as still as possible throughout the measurement session. As a result, in the studies by Boehme et al. (2015) and Pujol et al. (2013), participants passively watched videos of social situations but did not participate in real-time conversations.

To measure SFA and OFA directly in a real-time social situation, Tomita et al. (2020) used near-infrared spectroscopy (NIRS) and eye-tracking to examine changes in brain activity following manipulation of SFA and OFA during speech tasks. NIRS is less restrictive than fMRI, making it easier to measure brain activity in real-time social situations (Tomita et al., 2020). In the study by Tomita et al., healthy participants performed speech tasks under SFA, OFA, and control conditions in front of a monitor that displayed four people acting as audience members, each of whom made positive, negative, or neutral gestures. Although the audience members appeared in prerecorded videos, participants were told that the audience was watching them from another room.

By measuring eye-tracking and brain activity simultaneously, Tomita et al. (2020) matched brain regions associated with SFA and OFA with behavioral features such as hypervigilance and avoidance in social situations. During speech tasks, participants were instructed to pay attention according to each condition, and immediately after each speech task, self-report questionnaires were administered. Tomita et al. found that in the SFA condition, the brain activity in the right frontopolar area (rFPA), which covers the mPFC (Xu et al., 2017), and that in the right dorsolateral prefrontal cortex (rdlPFC) were greater than the activity in the control condition.

In the OFA condition, the brain activity in the left superior temporal gyrus (lSTG) was greater than that in the control condition (Tomita et al., 2020). In addition, SFA instructions induced eye movements that indicated avoidance of an audience member who displayed negative gestures, with a positive relationship between these avoidant eye movements and brain activity in the rFPA; however, in the OFA condition, participants did not demonstrate any eye movements associated with hypervigilance or avoidance (Tomita et al., 2020). From the above results, Tomita et al. determined that “under the situation of giving a speech in a social setting, greater oxy-Hb responses in the rFPA with avoidant eye-movement and in the rdlPFC and greater oxy-Hb responses in the lSTG may be used as some of the objective measurements of SFA and OFA, respectively” (2020, p. 522). Although previous researchers (Vriends et al., 2017) proposed observing videos of participants such as in video phone calls to measure SFA in social situations, using videos and mirror images as in earlier studies (e.g., Davies, 1982; Vriends et al., 2017) enhanced participants’ self-perceptions (Hofmann & Heinrichs, 2002, 2003). Tomita et al. (2020) suggested that the avoidant eye-movement pattern associated with SFA manipulation might be useful for measuring the degree of SFA in social situations without affecting self-perceptions.

Tomita et al. (2020) analyzed the effect of attention manipulation after controlling for social anxiety because they focused on the effects of SFA and OFA manipulation irrespective of social anxiety. Although they set the regions of interest (ROIs) based on previous results for SFA and OFA in individuals with social anxiety (Boehme et al., 2015; Choi et al., 2016; Gentili et al., 2008; Pujol et al., 2013; Straube et al., 2004), it was not clear to what extent the brain regions revealed in their study are relevant to the underlying pathology of social anxiety because Tomita et al. controlled for the effect of social anxiety.

The Current Study

We aimed to investigate whether the activity in the rFPA, rdlPFC, and lSTG increases in proportion to social anxiety tendency without manipulating SFA and OFA. Using the experimental paradigm developed by Tomita et al. (2020), we had participants perform speech tasks. The only difference between the current study and Tomita et al.’s was that we did not perform attentional manipulation of SFA and OFA. In the current study, the speech tasks comprised two conditions: a no-instruction condition (natural setting) and a control condition. In the no-instruction condition, participants were told to speak freely. We set a control condition because brain activity related to motor outputs such as utterances and perceptions of visual stimuli should be subtracted from that related to the task (no-instruction) condition. Therefore, we used the control data only for the analysis of brain activity. The control condition was the same as in Tomita et al. (2020), in which participants were instructed to “look at various objects and places on the screen as well as the appearance and reactions of the audience members” (p. 514).

We compared brain activity in the two conditions using the same measurement instruments as Tomita et al. (2020). Furthermore, to clarify the extent to which the brain regions activated in proportion to trait social anxiety were relevant to attention pathology in social anxiety, we checked for correlations between the oxy-Hb responses of relevant brain regions, subjective responses to SFA and OFA questionnaires, and fixation duration of eye movements.

Hypotheses

First, we hypothesized that higher social anxiety would be associated with higher rFPA, rdlPFC, and lSTG activity in the no-instruction condition compared with the control condition.

Second, we hypothesized that participants with higher social anxiety would spend less time watching audience members who gestured negatively than watching audience members who gestured positively during the no-instruction condition, as demonstrated in Tomita et al.’s (2020) SFA condition.

Third, we hypothesized that, among participants with higher social anxiety, higher subjective SFA during the no-instruction condition would be associated with a greater increase in rFPA and rdlPFC activity from the control condition to the no-instruction condition, and higher subjective OFA during the no-instruction condition would be associated with a greater increase in lSTG activity from the control condition to the no-instruction condition. In addition, based on Tomita et al. (2020), we predicted that, among participants with higher social anxiety, spending less time watching audience members who gestured negatively than audience members who gestured positively would be associated with increased rFPA activity.

Method

Participants

We recruited 39 students (23 women, 16 men), 19–20 years old (M = 19.68, SD = 0.87) by handing out an application to students attending classes at a university in Japan. We used the same inclusion criteria as Tomita et al. (2020), and “all participants were Japanese and reported no psychological disorders including SAD, hearing problems, or neurological or cardiovascular illnesses. Further, no participants reported poor physical condition, lack of sleep, or any medication within 24 h or alcohol consumption within 12 h of the beginning of the experiment” (Tomita et al., 2020, p. 513). In addition, no participants had participated in Tomita et al.’s experiment. After the study, participants were compensated 1,500 yen for their time.

Self-Report Measures

All self-report measures were the same as those used by Tomita et al. (2020), whose study provides detailed information such as the validity and reliability of each scale.

Japanese Version of the Liebowitz Social Anxiety Scale (LSAS-J)

The LSAS-J (Asakura et al., 2002; originally developed by Liebowitz, 1987) assesses fear and avoidance in 24 typical social performance and interaction situations, each presented as one item. The responses to each situation are aggregated to produce a measure of the severity of trait social anxiety. For each situation, participants rated their levels of fear and avoidance on four-point Likert scales. The fear ratings ranged from 0 (no fear) to 3 (severe fear). The avoidance response options were 0 (never), 1 (avoid 33% or less), 2 (avoid 50%), and 3 (avoid 67%–100%), based on the percentage of time spent avoiding each situation. Total LSAS-J scores ranged from 0 to 144.
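The scoring arithmetic described above can be sketched as follows. This is an illustrative sketch only, not the scoring procedure used in the study: the function name `lsas_total` and the input layout (one list of fear ratings and one list of avoidance ratings) are assumptions made for the example.

```python
from typing import Sequence

N_ITEMS = 24     # number of situations rated in the LSAS-J
MAX_RATING = 3   # fear and avoidance are each rated from 0 to 3


def lsas_total(fear: Sequence[int], avoidance: Sequence[int]) -> int:
    """Sum the fear and avoidance ratings across all 24 situations.

    Hypothetical helper: the input layout is an assumption for illustration.
    """
    for name, ratings in (("fear", fear), ("avoidance", avoidance)):
        if len(ratings) != N_ITEMS:
            raise ValueError(f"{name}: expected {N_ITEMS} ratings, got {len(ratings)}")
        if any(r not in range(MAX_RATING + 1) for r in ratings):
            raise ValueError(f"{name}: ratings must be integers from 0 to {MAX_RATING}")
    return sum(fear) + sum(avoidance)
```

With every rating at its maximum, the total is 24 × 3 × 2 = 144, which matches the upper bound of the reported range.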

Focused Attention Scale (FAS)

The FAS (Yamada et al., 2002) was based on the Focused Attention Questionnaire (FAQ; Chambless & Glass, 1984) and comprises three items translated from the FAQ and nine original items. The FAS comprises two subscales: FAS-self (six items), which measures the degree to which participants attend to their physical sensations (e.g., heartbeat), and FAS-others (six items), which measures the degree to which participants attend to the behavior of others (e.g., others’ facial expressions). Participants responded to each statement using a five-point Likert scale ranging from 1 (not focusing at all) to 5 (focusing totally). Total scores on each subscale ranged from 6 to 30, where higher scores indicate greater SFA and OFA.

Mental Perspective Scale for SAD

When individuals with social anxiety focus their attention on the self, this self-focus takes the form of a mental visual image experienced from an observer perspective, wherein individuals perceive themselves from another person’s viewpoint (Clark et al., 1995). In contrast, in the field perspective, the image of the situation is perceived as if individuals are viewing the scene through their own eyes, observing the details around them (Spurr & Stopa, 2003). The Mental Perspective Scale (MPS) (Tomita et al., 2018) comprises three subscales: field perspective (MPS-F: five items), observer perspective (MPS-O: four items), and detached mindfulness perspective (MPS-DM: four items). Participants responded to each statement using a six-point Likert scale ranging from 1 (not at all) to 6 (totally). We used the MPS-F and MPS-O subscales in the current study; the MPS-F assesses the extent to which participants see all the audience members and various objects on the screen (e.g., “I look at various objects in the social situation”), which we regarded as measuring the opposite construct of OFA. The MPS-O assesses the extent to which participants see themselves from the perspective of others (e.g., “I imagine my behavior as though from another person's viewpoint”), which is an essential component of SFA (Clark et al., 1995). In the current study, we used FAS-self and MPS-O to measure the subjective degree of SFA, and FAS-others and the reverse score of MPS-F to measure the subjective degree of OFA, during the speech tasks.
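The text does not state how the reverse score of the MPS-F was computed. As a minimal sketch, assuming the conventional item-level reversal on a 1-to-6 scale (each rating replaced by 7 minus the rating), the calculation could look like this; the function names are hypothetical.

```python
from typing import Sequence

SCALE_MIN, SCALE_MAX = 1, 6  # six-point Likert anchors reported for the MPS


def reverse_item(rating: int) -> int:
    """Conventional reverse scoring (assumed here): 1<->6, 2<->5, 3<->4."""
    if not SCALE_MIN <= rating <= SCALE_MAX:
        raise ValueError(f"rating must be between {SCALE_MIN} and {SCALE_MAX}")
    return SCALE_MIN + SCALE_MAX - rating


def reversed_subscale_total(ratings: Sequence[int]) -> int:
    """Sum of reverse-scored items for one subscale (e.g., five MPS-F items)."""
    return sum(reverse_item(r) for r in ratings)
```

Under this assumption, a participant who rates all five field-perspective items at 1 receives the highest reversed total, which is consistent with treating the reversed MPS-F as an index of OFA.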

Visual Analog Scale Assessing State Anxiety and Manipulation in the Control Condition

Participants were asked to indicate the degree to which they were experiencing state anxiety before and during the speech tasks using a Visual Analog Scale (VAS) (questions: How anxious are you now? How anxious did you feel during your speech?), with 0 (not at all) and 100 (completely) at the extreme ends of the scale. Participants also used a VAS to indicate whether they were able to deliver the speech, as instructed, in the control condition (question: How well did you make a speech according to my instruction?), with 0 (not at all) and 100 (completely) at the extreme ends of the scale.

Participants' Impression of the Audiences

After the speech task, participants looked at a photograph of each audience member and rated their impression of that audience member on a seven-point Likert scale that ranged from 1 (extremely positive) to 7 (extremely negative). These ratings allowed us to identify participants who incorrectly evaluated an audience member's negative gestures as positive or vice versa.

Near-infrared Spectroscopy

We used an optical topography system (ETG-4000, Hitachi Medical Corporation, Japan), which measures “changes in cerebral oxy-Hb and deoxyhemoglobin (deoxy-Hb) concentrations at two wavelengths of near-infrared light (695 nm and 830 nm)” (Tomita et al., 2020, p. 516). We used the same type of probe holder, probe position, number of measurement channels, and measurement principles as those used by Tomita et al. (2020).

Eye-tracking System

We used an eye-tracking device (QG-Plus, DITECT, Japan) that measures binocular gaze using dark pupil–corneal reflection at a rate of 60 frames per second. QG-Plus automatically accommodates for head movements within a 28 cm × 16 cm × 35 cm (width × height × depth) space up to 25 cm/sec. Fixation was identified when the device detected the pupils of both eyes and the participant’s gaze stayed in an area smaller than 50 pixels for at least 100 ms.
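The fixation criterion above can be illustrated with a simple dispersion-based detector. This is a sketch under stated assumptions, not the QG-Plus algorithm, which is not described in the text: it assumes that "an area smaller than 50 pixels" means a bounding box narrower and shorter than 50 pixels, that 100 ms corresponds to six consecutive frames at 60 frames per second, and that frames in which both pupils were not detected are coded as `None`. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# One gaze sample per frame; None means the pupils of both eyes were not detected.
Sample = Optional[Tuple[float, float]]


def _is_compact(window: Sequence[Sample], max_extent_px: float) -> bool:
    """True if every sample is valid and the bounding box is under the limit."""
    if any(s is None for s in window):
        return False
    xs = [s[0] for s in window]
    ys = [s[1] for s in window]
    return (max(xs) - min(xs)) < max_extent_px and (max(ys) - min(ys)) < max_extent_px


def detect_fixations(
    samples: Sequence[Sample],
    fps: float = 60.0,
    max_extent_px: float = 50.0,
    min_duration_ms: float = 100.0,
) -> List[Tuple[int, int]]:
    """Return fixations as (start, end) frame indices, with end exclusive."""
    min_len = math.ceil(min_duration_ms * fps / 1000.0)  # 6 frames at 60 fps
    fixations: List[Tuple[int, int]] = []
    start, n = 0, len(samples)
    while start + min_len <= n:
        end = start + min_len
        if _is_compact(samples[start:end], max_extent_px):
            # Extend the fixation while the gaze stays inside the area.
            while end < n and _is_compact(samples[start:end + 1], max_extent_px):
                end += 1
            fixations.append((start, end))
            start = end
        else:
            start += 1
    return fixations
```

Fixation duration in milliseconds then follows as `(end - start) * 1000 / fps` for each detected interval.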

Procedure

Participants received an explanation of the nature and purpose of the study and provided written informed consent. Before the experiment commenced, participants completed a medical checklist, a VAS about state anxiety, and the LSAS-J. They sat in front of a computer and received an explanation of the speech tasks. After that, the NIRS probe holder and the eye-tracker were attached and calibrated. During each speech task, participants' eye movements and brain activity were measured. Immediately after each speech task, participants rated their subjective degree of SFA and OFA during the speech, using the relevant FAS and MPS subscales. After the control condition only, participants used a VAS to indicate whether they had delivered the speech as instructed, and they were asked to explain why they had given their VAS scores so that we could confirm whether they had correctly understood the instruction.

At the end of the experimental sessions, participants rated their impression of each audience member and rated their overall state anxiety during all speech tasks with a VAS. The next day, participants completed an online questionnaire that was unrelated to the current study, and they were debriefed that the audiences they had seen had actually been prerecorded. Participants were also asked to use an online VAS to indicate whether they had suspected that the audiences were actually prerecorded videos (0 = extremely believed and 100 = extremely noticed).

Speech Tasks

Again under the same protocol as Tomita et al.’s (2020, p. 514), “the participants performed speech tasks in front of a monitor that displayed four audience members: two acted out positive and negative gestures, respectively, while the other two acted out neutral gestures.” Although we had told the participants that the audience members were in the next room evaluating their speech in real time, the audiences were again actually prerecorded. For each speech task, participants were instructed to talk about “their school life from elementary school to the present” (Tomita et al., 2020). There were three speech task sets, each of which consisted of a set of rest periods and speech tasks: a 60-s rest period in which participants gazed at a fixation cross in the center of the screen, a 60-s speech task, a 40-s rest period, a 60-s speech task, and a final 30-s rest period.

As shown in Fig. 1, the no-instruction condition consisted of two speech task sets, and the control condition consisted of one set. Tomita et al. (2020) conducted three sets of speech tasks: one for the SFA condition, one for the OFA condition, and one for the control condition. The order of the SFA condition and the OFA condition was counterbalanced, and the control condition was always last (Tomita et al., 2020). In the current study, we conducted the two speech task sets in the no-instruction condition first, followed by the control condition, for each participant. The instructions given to participants in the control condition, the audience videos, and the topic that participants were instructed to speak about were the same as those of Tomita et al.

Fig. 1 The differences in the experimental design between Tomita et al. (2020) and the current study. Note. In this study, two sets of speech tasks in the no-instruction condition were conducted first, followed by one set of speech tasks in the control condition. Each speech task set consisted of two 60-s speech tasks. Except for the manipulation of SFA and OFA, the experimental design of the current study was the same as that used by Tomita et al. (2020).

In the no-instruction condition, we did not provide any instructional manipulation. We told participants that for the first two speech task sets, they were to speak freely. In the control condition, participants were instructed to “look at various objects and places on the screen as well as the appearance and reactions of the audience members” (Tomita et al., 2020, p. 514). The control condition was intended as an adaptive attention control condition asking the participants to attend to various stimuli evenly without producing SFA and OFA.

We used the same audience videos as those used by Tomita et al. (2020). One task set included two 60-s speech tasks (i.e., 60 s × 2); that is, two audience videos were used per set. Tomita et al. used six different videos for each participant: two per speech task set (i.e., one per 60-s speech) across three conditions (2 × 3) to prevent the participants from noticing that the audiences were on prerecorded videos. For the current study, we also used six different videos for each participant: four in the no-instruction condition (two speech task sets), and two in the control condition (one task set). Detailed information on the videos (e.g., the contents of gestures in each audience member) is available from Tomita et al. (2020).

Data Preparation and Analysis

NIRS Data

We used only oxy-Hb data for the analysis as in previous studies (Tomita et al., 2017, 2020; Yokoyama et al., 2015). We used the integral mode in the ETG-4000’s analysis program to calculate baseline data and calculated the average changes in oxy-Hb concentration (Δoxy-Hb) over the baseline for each channel during each speech task (60 s). We obtained the average Δoxy-Hb from the four no-instruction speech tasks and the two control speech tasks. Finally, we divided the two averaged sets of 60-s data into three intervals: 0–20 s, 20–40 s, and 40–60 s. We also followed Tomita et al. (2020) and calculated the average Δoxy-Hb for each of these intervals for both conditions.
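The interval averaging described above can be sketched as follows. This is a minimal illustration, not the ETG-4000 analysis program: it assumes that a baseline-corrected Δoxy-Hb time series for one channel and one 60-s speech is already available as a list, and the sampling rate is passed in as a parameter because it is not reported in this section. The function name is hypothetical.

```python
from statistics import fmean
from typing import List, Sequence

# Interval boundaries in seconds, as described in the text.
INTERVALS_S = ((0, 20), (20, 40), (40, 60))


def interval_means(delta_oxy: Sequence[float], sample_rate_hz: float) -> List[float]:
    """Average a 60-s baseline-corrected series within each 20-s interval."""
    means = []
    for start_s, end_s in INTERVALS_S:
        lo = int(round(start_s * sample_rate_hz))
        hi = int(round(end_s * sample_rate_hz))
        segment = delta_oxy[lo:hi]
        if not segment:
            raise ValueError(f"no samples in interval {start_s}-{end_s} s")
        means.append(fmean(segment))
    return means
```

Averaging these interval means across the four no-instruction speeches, and separately across the two control speeches, would give one value per interval and condition for each channel.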

We used virtual registration (Tsuzuki et al., 2007), which enables the probabilistic registration of NIRS data into the Montreal Neurological Institute coordinate space. We set the ROIs based on the regions associated with SFA and OFA in Tomita et al.’s study (2020). The three ROIs were placed over one left and two right brain areas, including the lSTG (CH41), rFPA covering the mPFC (CH25, CH26, CH35, CH36, CH46, CH47; Xu et al., 2017), and rdlPFC (CH5, CH13, CH15). We calculated Δoxy-Hb in each ROI by averaging the corresponding channels (Tomita et al., 2020).
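The ROI averaging step can be illustrated with the channel assignments listed above. The dictionary layout and the function name are assumptions made for the example; only the channel numbers come from the text.

```python
from statistics import fmean
from typing import Dict, Mapping, Tuple

# Channel numbers per ROI, taken from the text.
ROI_CHANNELS: Dict[str, Tuple[int, ...]] = {
    "lSTG": (41,),
    "rFPA": (25, 26, 35, 36, 46, 47),
    "rdlPFC": (5, 13, 15),
}


def roi_means(channel_values: Mapping[int, float]) -> Dict[str, float]:
    """Average Δoxy-Hb over the channels belonging to each ROI.

    channel_values maps a channel number to its Δoxy-Hb value for one
    participant, condition, and time interval (hypothetical input layout).
    """
    return {
        roi: fmean(channel_values[ch] for ch in channels)
        for roi, channels in ROI_CHANNELS.items()
    }
```

Because the lSTG is represented by a single channel, its ROI value equals the value of CH41, which is why a movement artifact on that channel alone can remove a participant from the lSTG analysis.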

Eye-tracker Data

The eye movements were analyzed using only the data from the no-instruction condition because we thought that the eye-movement pattern of SFA and OFA in high social anxiety individuals would be evident under the no-instruction condition.

The circular areas of interest (AOIs) were set around the audience members’ faces: positive, neutral, and negative (see Tomita et al., 2020). Then, we calculated the average fixation time for each AOI across the four 60-s speeches in the no-instruction condition. We divided the averaged 60-s data into three intervals: 0–20 s, 20–40 s, and 40–60 s. The fixation times on the faces were normalized to each participant’s valid fixation time on the whole scene to remove the effects of fatigue or movement artifacts. In total, four speech tasks were used for the analysis (no-instruction condition: 60 s × 4 times), and we measured the fixation times every 20 s (0–20 s, 20–40 s, 40–60 s). Therefore, we calculated the average percentage of invalid data across the 12 sections (20 s × 3 × 4 times) for each participant. Then, we calculated the median and the 25th and 75th percentiles of this percentage across all participants.

Following previous studies (Tomita et al., 2020; White et al., 2019), for each participant we used the difference in fixation duration on positive faces minus that on negative faces during the no-instruction condition as a behavioral indicator of social anxiety in social situations (i.e., hypervigilance for threats or threat avoidance).
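The normalization and the positive-minus-negative index can be sketched together. This is an illustration under assumptions: fixation times are taken to be in milliseconds, the normalized value is expressed as a proportion of valid whole-scene fixation time (the scaling used in the study is not stated here), and the function names and AOI labels are hypothetical.

```python
from typing import Dict, Mapping


def normalized_fixation(
    aoi_fixation_ms: Mapping[str, float], valid_scene_fixation_ms: float
) -> Dict[str, float]:
    """Express fixation time on each AOI as a share of valid scene fixation time."""
    if valid_scene_fixation_ms <= 0:
        raise ValueError("valid whole-scene fixation time must be positive")
    return {aoi: ms / valid_scene_fixation_ms for aoi, ms in aoi_fixation_ms.items()}


def positive_minus_negative(normalized: Mapping[str, float]) -> float:
    """Positive values mean less time on the negative face than on the positive face."""
    return normalized["positive"] - normalized["negative"]
```

Under this convention, a larger positive difference corresponds to stronger avoidance of the negative face relative to the positive face, and a negative difference would correspond to relatively more time spent on the negative face.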

Statistical Analysis

To investigate the first hypothesis, we performed multilevel multiple regression analyses using a hierarchical linear model. Predictor variables were LSAS-J score, condition (no-instruction, control), and the interaction between LSAS-J and condition; the outcome variable was the Δoxy-Hb in each ROI for each time interval. When the interaction between LSAS-J and condition was observed in an analysis, we used simple main effect analysis at ±1 SD of LSAS-J to plot the pattern of interaction.
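One plausible way to write the model for the first hypothesis is given below. The random-effects structure, the coding of condition, and any centering of LSAS-J are not reported in this section, so the random intercept per participant and the notation are assumptions made for illustration. Here $i$ indexes the condition within participant $j$, and the model is fitted separately for each ROI and time interval.

```latex
\begin{align}
\Delta\mathrm{oxyHb}_{ij} &= \beta_{0} + \beta_{1}\,\mathrm{LSAS}_{j}
  + \beta_{2}\,\mathrm{Cond}_{ij}
  + \beta_{3}\,\bigl(\mathrm{LSAS}_{j}\times\mathrm{Cond}_{ij}\bigr)
  + u_{0j} + e_{ij}, \\
u_{0j} &\sim \mathcal{N}(0,\tau^{2}), \qquad e_{ij} \sim \mathcal{N}(0,\sigma^{2})
\end{align}
```

In this notation, the first hypothesis concerns $\beta_{3}$: a positive coefficient would indicate that the difference between the no-instruction and control conditions grows with LSAS-J score.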

To investigate the second hypothesis, we performed multilevel multiple regression analyses using a hierarchical linear model. The predictor variables were LSAS-J, face (positive or negative), and the interaction between LSAS-J and face, and the outcome variable was the fixation duration for each time interval in the no-instruction condition. When the interaction between LSAS-J and face was observed in an analysis, we used simple main effect analysis to plot the pattern of interaction.

To investigate the third hypothesis, we performed multiple regression analyses with LSAS-J, each relevant MPS and FAS subscale answered after the no-instruction condition, and the interaction between LSAS-J and each of the MPS and FAS subscales as predictor variables; the outcome variables were the differences between Δoxy-Hb in the no-instruction condition and that in the control condition (the diff of Δoxy-Hb) at the ROIs identified in the test of the first hypothesis. We used the diff of Δoxy-Hb because brain activity related to motor outputs such as utterances and perceptions of visual stimuli should be subtracted from that related to the task (no-instruction) condition. Then, we performed a multiple regression analysis with LSAS-J, the difference in fixation duration on a positive face minus a negative face in the no-instruction condition, and the interaction between LSAS-J and this difference as predictor variables and the diff of Δoxy-Hb in the rFPA as the outcome variable. When the interaction between LSAS-J and a relevant MPS or FAS subscale, or between LSAS-J and the difference in fixation duration, was observed in an analysis, we used simple slope analysis to plot the pattern of interaction. We used HAD17_105 (Shimizu, 2016) software for these analyses.
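The logic of a simple slope analysis can be sketched as follows. This is a generic illustration, not the HAD17_105 implementation: it assumes a moderated regression in which the moderator (here, LSAS-J) is mean-centered, so that the slope of the focal predictor at ±1 SD of the moderator is the predictor's coefficient plus or minus the interaction coefficient times the moderator's SD. The function name and the example values are hypothetical.

```python
from typing import Tuple


def simple_slopes(
    b_predictor: float, b_interaction: float, moderator_sd: float
) -> Tuple[float, float]:
    """Slopes of the focal predictor at -1 SD and +1 SD of a centered moderator.

    For the model y = b0 + b1*x + b2*m + b3*x*m with m mean-centered,
    the slope of x at moderator value m is b1 + b3*m.
    """
    low = b_predictor + b_interaction * (-moderator_sd)
    high = b_predictor + b_interaction * moderator_sd
    return low, high


# Hypothetical coefficients chosen only to show the arithmetic.
low_slope, high_slope = simple_slopes(b_predictor=1.0, b_interaction=0.5, moderator_sd=2.0)
```

A positive interaction coefficient therefore produces a steeper slope at +1 SD than at −1 SD, which is the pattern the third hypothesis predicts for subjective SFA and rFPA activity.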

Exclusion Criteria for Data Analysis

We excluded participants’ data for the following reasons: (1) misunderstanding of the instructions for the control condition; (2) incorrect evaluation of an audience member's negative gestures as positive or vice versa; (3) noticing that the audiences were actually prerecorded videos, in combination with eye-tracking and NIRS data identified as outliers, defined as values below the 25th percentile − 1.5 × IQR (75th percentile − 25th percentile) or above the 75th percentile + 1.5 × IQR in the box and whisker plot; (4) eye-gaze duration data accounting for more than half of the outliers shown by all participants in the box and whisker plot; or (5) NIRS data accounting for more than half of the outliers shown by all participants in the box and whisker plot. In addition, we excluded eye-tracking data if (6) the calibration of the eye-tracking device gave an error, and we excluded NIRS ROI data if (7) the integral mode could not be used for calculating the average waveform because of body movement.
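The box-plot outlier rule in criterion (3) can be sketched as follows. The quartile method is not reported in the text; this example uses the "inclusive" method of Python's `statistics.quantiles` as an assumption, and other quartile conventions can place the fences slightly differently. The function names and the example data are hypothetical.

```python
from statistics import quantiles
from typing import List, Sequence, Tuple


def tukey_fences(values: Sequence[float]) -> Tuple[float, float]:
    """Lower and upper fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values: Sequence[float]) -> List[float]:
    """Values lying strictly outside the fences."""
    lower, upper = tukey_fences(values)
    return [v for v in values if v < lower or v > upper]


# Made-up data for illustration only.
example = [1, 2, 3, 4, 5, 6, 7, 8, 100]
```

For the made-up data, the quartiles are 3 and 7, so the fences fall at −3 and 13 and only the value 100 is flagged.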

Results

Confirmation of Exclusion Criteria

We excluded one participant according to the second criterion and one participant who accounted for four of the seven outliers shown by all participants according to the fourth criterion, and we used the data from the remaining 37 participants for the analyses. We excluded eye-tracking data from another two participants according to the sixth criterion and used the data from 35 participants for the eye-tracking analysis. We excluded one participant from the analysis using oxy-Hb data in the lSTG according to the seventh criterion and used the data from 36 participants.

Experimental Manipulation Check

We performed multilevel multiple regression analysis with LSAS-J, time (baseline, speech tasks), and interaction between LSAS-J and time as predictor variables and VAS state anxiety scores as an outcome variable. LSAS-J significantly predicted an increase in VAS state anxiety scores (B = 0.245, SE = 0.078, p = 0.003). Time also significantly predicted higher VAS state anxiety scores (B = 43.811, SE = 5.138, p = 4.6E–10), which means that the participants reported higher state anxiety during the speech tasks (M = 66.459, SD = 23.583) than at baseline (M = 22.649, SD = 18.891). The interaction between LSAS-J and time was not significant (B = –0.168, SE = 0.200, p = 0.405).

Next, we examined participants' impressions of the positive, negative, and neutral audience members; the lower the impression score, the more positively the participants evaluated the audience member. For this analysis, we divided the participants into high and low social anxiety groups (HSA and LSA, respectively) based on LSAS-J because we could not easily perform multilevel regression analysis on repeated measures with more than two levels using HAD17_105 (Shimizu, 2016) software. Participants in the HSA group had scored 55 or higher on the LSAS-J, an above-average score, and participants in the LSA group had scored 54 or lower. We submitted the impression scores to a 2 (group: HSA, LSA) × 3 (audience: positive, negative, neutral) mixed-design ANOVA with repeated measures on the second variable. Although neither the main effect of group nor the group × audience interaction was significant, the main effect of audience was significant, F (1.41, 49.42) = 363.423, p = 0.001, η² = 0.886. Participants reported lower scores for audience members who acted in a positive manner (M = 1.387, SD = 0.686) than for those who acted in a negative (M = 6.196, SD = 0.811) (t (35) = 21.278, p = 0.001) or neutral (M = 4.365, SD = 0.633) (t (32) = 16.835, p = 0.001) manner. Additionally, participants gave lower scores for audience members who acted in a neutral manner than for those who acted negatively, t (35) = 15.004, p = 0.001. Thus, each audience member impressed the participants as intended.

Hypothesis 1: The Changes in Brain Activities

Table 1 shows the results for the multilevel multiple regression analysis. The interaction of LSAS-J and condition (no-instruction, control) significantly predicted the Δoxy-Hb in the rFPA at 0–20 s and 20–40 s, and the interaction of LSAS-J and condition tended to predict Δoxy-Hb in the lSTG at 0–20 s. The interaction of LSAS-J and condition did not predict the Δoxy-Hb in the rdlPFC. Figure 2 shows the results of simple main effect analysis. When LSAS-J was at +1 SD, Δoxy-Hb in the rFPA at 0–20 s was significantly greater in the no-instruction condition than in the control condition, whereas this difference was not significant at −1 SD (LSAS-J +1 SD: B = 0.092, SE = 0.022, p = 1.6E–4; LSAS-J −1 SD: B = 0.027, SE = 0.023, p = 0.254). At 20–40 s, the simple main effects at both +1 SD and −1 SD of LSAS-J were significant (LSAS-J +1 SD: B = 0.169, SE = 0.029, p = 1.7E–6; LSAS-J −1 SD: B = 0.082, SE = 0.028, p = 0.006). Regarding Δoxy-Hb in the lSTG at 0–20 s, the simple main effects were not significant (LSAS-J +1 SD: B = 0.037, SE = 0.037, p = 0.325; LSAS-J −1 SD: B = −0.053, SE = 0.034, p = 0.126).

Table 1 Multilevel Multiple Regression Analysis Results for Brain Activity
Fig. 2 The results of simple main effect analysis in each brain region, showing the interaction of LSAS-J and condition. Note. (a) Δoxy-Hb in the rFPA at 0–20 s (b) Δoxy-Hb in the rFPA at 20–40 s (c) Δoxy-Hb in the lSTG at 0–20 s. LSAS-J: Japanese version of the Liebowitz Social Anxiety Scale, rFPA: right frontopolar area, lSTG: left superior temporal gyrus. ** p < 0.01, *** p < 0.001
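
For readers unfamiliar with probing an interaction at ±1 SD, the following minimal sketch shows the usual arithmetic: with a mean-centered moderator, the effect of condition at a moderator value z is b_condition + b_interaction × z, and its standard error follows from the variances and covariance of the two coefficients. The function name and all numbers are hypothetical. They are not the coefficients in Table 1, and the sketch is not the HAD implementation.

```python
import math

def simple_effect(b_cond, b_inter, moderator_value,
                  var_cond, var_inter, cov_cond_inter):
    """Effect of condition at one value of a centered moderator, with its SE."""
    estimate = b_cond + b_inter * moderator_value
    variance = (var_cond
                + moderator_value ** 2 * var_inter
                + 2 * moderator_value * cov_cond_inter)
    return estimate, math.sqrt(variance)

# Hypothetical fixed-effect estimates and (co)variances, for illustration only.
moderator_sd = 15.0
est_high, se_high = simple_effect(0.06, 0.002, +moderator_sd, 0.0004, 0.000001, 0.0)
est_low, se_low = simple_effect(0.06, 0.002, -moderator_sd, 0.0004, 0.000001, 0.0)
```

In a multilevel model the same formula applies to the fixed effects, using the fixed-effect covariance matrix reported by the fitting software.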

Hypothesis 2: The Changes in Eye Movements

The median and the 25th and 75th percentiles of the percentage of invalid eye-tracking data across all participants were as follows: median = 1.411%, 25th percentile = 0.503%, 75th percentile = 3.469%. The interaction of LSAS-J and face (positive, negative) did not predict the fixation duration for any time interval in the no-instruction condition (0–20 s: B = –0.003, SE = 0.043, p = 0.952; 20–40 s: B = –0.057, SE = 0.063, p = 0.370; 40–60 s: B = –0.069, SE = 0.066, p = 0.302). The main effects of face were all significant; fixation duration on the negative face was significantly shorter than that on the positive face for each time interval irrespective of the degree of social anxiety (0–20 s: B = 2.768, SE = 0.912, p = 0.005; 20–40 s: B = 3.993, SE = 1.335, p = 0.005; 40–60 s: B = 3.319, SE = 1.316, p = 0.017). None of the main effects of LSAS-J was significant (0–20 s: B = 0.025, SE = 0.053, p = 0.645; 20–40 s: B = –0.022, SE = 0.051, p = 0.664; 40–60 s: B = –0.011, SE = 0.052, p = 0.827).

Hypothesis 3: The Relationship Between the Subjective and Objective Measurements of SFA and OFA

Based on the results of testing the first hypothesis on the activities in the rFPA, we performed multiple regression analyses with LSAS-J, subjective measurements of SFA (either FAS-self or MPS-O), and the interaction of LSAS-J and the subjective SFA measurement as predictor variables; diff of Δoxy-Hb in the rFPA at 0–20 s and at 20–40 s were the outcome variables. The results showed that the interaction of LSAS-J and FAS-self significantly predicted the diff of Δoxy-Hb in the rFPA at 0–20 s (β = 0.538, p = 0.001). Figure 3 shows the results of simple slope analysis. When LSAS-J was at +1SD, FAS-self during the speech task predicted diff of Δoxy-Hb in the rFPA at 0–20 s (LSAS-J +1SD: B = 0.015, SE = 0.005, p = 0.003; LSAS-J –1SD: B = –0.006, SE = –0.289, p = 0.097). The interaction of LSAS-J and MPS-O marginally predicted diff of Δoxy-Hb in the rFPA at 0–20 s (β = 0.311, p = 0.063), but the simple slope analysis results were not significant (LSAS-J +1SD: B = 0.008, SE = 0.008, p = 0.270; LSAS-J –1SD: B = –0.008, SE = 0.006, p = 0.181). Neither the interaction of LSAS-J and FAS-self nor that of LSAS-J and MPS-O significantly predicted diff of Δoxy-Hb in the rFPA at 20–40 s (FAS-self: β = 0.023, p = 0.893; MPS-O: β = –0.002, p = 0.989).

Fig. 3 The results of simple slope analysis for the increase in rFPA activity at 0–20 s and subjective SFA. Note. (a) The interaction of LSAS-J and FAS-self (b) The interaction of LSAS-J and MPS-O. rFPA: right frontopolar area, SFA: self-focused attention, LSAS-J: Japanese version of the Liebowitz Social Anxiety Scale, FAS-self: self-focused attention, which is a subscale of the Focused Attention Scale, MPS-O: observer perspective, which is a subscale of the Mental Perspective Scale for Social Anxiety. Error bars represent standard errors
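
As an illustration of the kind of moderated regression and simple slope analysis reported here, the sketch below fits an ordinary least-squares model with a product term and evaluates the slope of the subjective SFA score at ±1 SD of the anxiety score. It assumes mean-centered predictors and uses synthetic, noise-free data so that the known coefficients are recovered. The variable names and values are hypothetical, and this is not the software or data used in the study.

```python
import numpy as np

def fit_moderated_regression(anxiety, sfa, outcome):
    """OLS with centered predictors; returns [intercept, b_anxiety, b_sfa, b_interaction]."""
    anxiety_c = anxiety - anxiety.mean()
    sfa_c = sfa - sfa.mean()
    design = np.column_stack(
        [np.ones(len(anxiety_c)), anxiety_c, sfa_c, anxiety_c * sfa_c]
    )
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs

def simple_slopes(coefs, anxiety):
    """Slope of SFA on the outcome at +1 SD and -1 SD of the anxiety score."""
    sd = anxiety.std(ddof=1)
    b_sfa, b_inter = coefs[2], coefs[3]
    return {"+1SD": b_sfa + b_inter * sd, "-1SD": b_sfa - b_inter * sd}

# Synthetic data with known coefficients (illustration only, no noise added).
rng = np.random.default_rng(0)
anxiety = rng.normal(50.0, 15.0, 40)
sfa = rng.normal(3.0, 1.0, 40)
a_c, s_c = anxiety - anxiety.mean(), sfa - sfa.mean()
outcome = 0.01 + 0.001 * a_c + 0.005 * s_c + 0.0007 * a_c * s_c

coefs = fit_moderated_regression(anxiety, sfa, outcome)
slopes = simple_slopes(coefs, anxiety)
```

With a positive interaction coefficient, the slope of SFA is steeper at +1 SD of anxiety than at –1 SD, which is the pattern panel (a) depicts.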

Based on the results of testing the first hypothesis on the activities in the lSTG, we performed multiple regression analyses with LSAS-J, the subjective measurements of OFA (either FAS-others or MPS-F), and the interaction of LSAS-J and the subjective OFA measurement as predictor variables and diff of Δoxy-Hb in the lSTG at 0–20 s as the outcome variable. The results showed that neither the interaction of LSAS-J and FAS-others nor that of LSAS-J and MPS-F predicted diff of Δoxy-Hb in the lSTG at 0–20 s (FAS-others: β = 2.9E-5, p = 0.874; MPS-F: β = 1.1E-4, p = 0.545).

We did not examine whether LSAS-J moderated the effects of the differences in the fixation duration of positive faces minus negative faces on diff of Δoxy-Hb in rFPA because the second hypothesis was not supported.

Discussion

With this study, we aimed to investigate whether rFPA, rdlPFC, and lSTG activity during speech tasks increased in proportion to social anxiety tendency without manipulating SFA and OFA. We found that the higher the participants’ social anxiety, the more rFPA activity they showed in the no-instruction condition compared to the control condition. Higher social anxiety was also associated with more lSTG activity in the no-instruction condition than in the control condition, although the interaction of LSAS-J and condition was only marginally significant. Therefore, the first hypothesis was supported, with the exception of the results for the rdlPFC.

The dlPFC is associated with voluntary control of attention (Comte et al., 2016). Previous studies demonstrated that networks within the right frontotemporal region play roles in self-evaluation, autobiographical memory, and self-recognition (Keenan et al., 2000). Based on Tomita et al. (2020), we had hypothesized that dlPFC activity would be greater in the no-instruction condition than in the control condition. The main difference between Tomita et al.’s (2020) study and the current work is that Tomita et al. experimentally manipulated SFA, whereas we did not. It is possible that participants in Tomita et al.’s study tried to intentionally control the focus of their attention more than the participants in the current study did. Therefore, activation of the rdlPFC during the SFA condition in Tomita et al. might have reflected the cognitive process of trying to intentionally control attention.

Figure 2 shows that the participants with higher social anxiety showed less rFPA activity than did those with lower social anxiety in the control condition at 0–20 s, and they showed a slight decrease in the control condition and a marked increase in the no-instruction condition at 20–40 s. These results might be relevant to the previous research demonstrating that the response pattern of the prefrontal cortex in socially anxious individuals differs depending on whether the experimental task requires special procedures such as attention control. For example, when performing the emotional control task and the verbal fluency task, the activity in the medial prefrontal cortex and the bilateral ventrolateral prefrontal cortex was lower in patients with SAD than in healthy individuals (Brühl et al., 2014; Yokoyama et al., 2015).

Other researchers, however, have reported higher prefrontal cortex activity in patients with SAD than in healthy individuals when they perform tasks that do not require any top-down control. Researchers have observed that prefrontal cortex activity in patients with SAD could reflect nonfunctional cognitive activity such as inhibition (Brühl et al., 2014). Based on these findings, in this study, the prefrontal cortex activity in the participants with higher social anxiety might have decreased more under the present control condition, which required top-down control to conform to the task instruction, whereas the activity might have increased more with the nonfunctional cognitive activity of SFA under the no-instruction condition. Therefore, we assumed that the difference in rFPA activity between the no-instruction and the control conditions became larger in proportion to social anxiety tendency.

The second hypothesis was not supported. Only the main effect of face was significant: The fixation duration in the no-instruction condition for negative faces was significantly shorter than that for positive faces for all time intervals irrespective of the degree of social anxiety. Therefore, we could not confirm Tomita et al.’s (2020) finding that avoidant eye-movement patterns might be useful for measuring the degree of SFA in social situations. Lin et al. (2016) administered a speech task with a prerecorded audience to healthy participants, as in the current study, and their HSA group showed longer total fixation on negative stimuli and shorter total fixation on positive stimuli than did the group with low trait social anxiety. The LSA group also looked less at negative feedback than at positive and neutral feedback, whereas the HSA group did not display this bias (Lin et al., 2016). Therefore, our result was not consistent with Lin et al.’s. Tomita et al.’s (2020) suggestion concerning avoidant eye-movement patterns as a useful measure of SFA is based on experimentally manipulating SFA. However, as Lin et al. (2016) discussed, previous researchers have used fixation duration on external stimuli to investigate external attention bias such as OFA. Based on the findings for the second hypothesis, it might be difficult to use fixation duration on external stimuli as an objective measurement of SFA. Alternatively, the second hypothesis might not have been supported because of the limited time resolution. We calculated the average fixation time on each AOI every 20 s. Several studies have suggested that atypical eye movements in social anxiety are much more subtle and can best be seen in the temporal and spatial distribution of the fixations (Chen et al., 2015; Kleberg et al., 2021).

To heighten the time resolution, researchers have used the visual scanpath, a tracing of the motion of the eye made while viewing a complex stimulus that consists of a sequence of fixations and saccades (Chen et al., 2015). Adults with SAD showed a longer scanpath than healthy controls during public speaking (Chen et al., 2015), whereas youth with SAD showed a shorter scanpath than healthy controls during emotional recognition (Kleberg et al., 2021). Other researchers have used pupil dilation, an index of arousal that is closely linked to attention (Keil et al., 2018; Kleberg et al., 2019, 2021). The youth SAD group had higher pupil dilation than controls (Kleberg et al., 2021), and larger pupil dilation to happy face stimuli before 12-week cognitive behavioral treatment for adolescent SAD was related to worse treatment response (Kleberg et al., 2019). Therefore, eye tracking analysis with better time resolution, such as visual scanpath and pupil dilation, may capture the degree of SFA as well as OFA related to social anxiety.
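
To make the contrast between the two eye-tracking summaries concrete, the sketch below computes (a) the mean fixation duration per AOI within fixed 20-s windows and (b) a scanpath length. Two assumptions are made here that the text does not specify: a fixation is assigned to the window containing its onset, and scanpath length is operationalized as the summed straight-line distance between consecutive fixation locations. All names and data are hypothetical.

```python
import math
from collections import defaultdict

def mean_fixation_per_window(fixations, window_s=20.0):
    """Mean fixation duration keyed by (window index, AOI).

    fixations: iterable of (onset_s, duration_s, aoi) tuples.
    Each fixation is assigned to the window that contains its onset (assumption).
    """
    durations = defaultdict(list)
    for onset_s, duration_s, aoi in fixations:
        window_index = int(onset_s // window_s)
        durations[(window_index, aoi)].append(duration_s)
    return {key: sum(values) / len(values) for key, values in durations.items()}

def scanpath_length(points):
    """Summed Euclidean distance between consecutive fixation locations."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

# Hypothetical fixations: (onset in s, duration in s, AOI label).
fixations = [
    (1.0, 0.4, "positive"),
    (5.0, 0.2, "positive"),
    (12.0, 0.3, "negative"),
    (25.0, 0.5, "negative"),
]
window_means = mean_fixation_per_window(fixations)
path = scanpath_length([(0, 0), (3, 4), (3, 0)])
```

The windowed mean collapses everything inside each 20-s bin into one number per AOI, whereas the scanpath measure retains the order and spatial spread of the individual fixations.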

As Fig. 3 shows, increased rFPA activity during 0–20 s was associated with higher social anxiety and higher subjective SFA. Therefore, the third hypothesis, like the first hypothesis, was supported for the rFPA. Although the increase in rFPA activity during 20–40 s was greater in the no-instruction condition than in the control condition and that difference was greater with higher social anxiety tendency, the interaction of LSAS-J and the subjective measurement of SFA was not significant for increased rFPA activity during 20–40 s. When the participants were asked about the subjective degree of SFA, they might have answered based on their memory of the period immediately after the speech commenced (0–20 s).

In contrast, our prediction of a relationship between increased lSTG activity and the interaction of LSAS-J and subjective OFA was not supported. Considering this result together with the finding for the first hypothesis that the interaction of LSAS-J and condition on lSTG activity was only marginally significant, it may be difficult to conclude that lSTG activity is a useful objective measurement of OFA. We did not analyze the relationship between rFPA activity and avoidant eye movements because the second hypothesis was not supported. In the present study, we demonstrated that objective measurement of brain activity, which Tomita et al. (2020) measured by manipulating SFA, can be adapted to naturally occurring SFA related to social anxiety. In contrast, it is difficult to adapt the relationship between brain activity and eye movement, as suggested by manipulating SFA, to a natural setting.

Following Tomita et al. (2020), for the present study, we created “an experimental paradigm to measure both SFA and OFA in a social setting” (Tomita et al., 2020, p. 522). In a similar paradigm, Glassman et al. (2017) used fNIRS to measure brain activity in the dlPFC while participants performed a speech task in front of a prerecorded video of a small audience. They compared participants with HSA and LSA and investigated the relationships between dlPFC activity, participants’ performance skills, and state anxiety during the speech. While Glassman et al. focused on participant performance skills that other experimenters had evaluated, we focused our study on the participants’ self-reported SFA and OFA.

In Glassman et al.’s (2017) study, the ROIs were placed over the dlPFC covering the medial prefrontal regions, thus overlapping the frontopolar channels we used in the current study. Glassman et al. found that the relationship between social anxiety and both blood volume and deoxygenated hemoglobin (Hb) varied significantly as a function of speech performance, such that participants with LSA who performed well showed higher dlPFC activity than did those who did not perform well. In contrast, participants with HSA who performed well showed lower dlPFC activity than did those with LSA who performed well. In poor performers, dlPFC activity was slightly higher in the participants with HSA than in those with LSA. Although Glassman et al. suggested that the dlPFC activity in poor performers with HSA could reflect heightened SFA, this was not based on experimental data because they did not measure the degree of SFA. Because we demonstrated activation in a similar brain region in proportion to social anxiety and to subjective SFA, our findings support the suggestion of Glassman et al.

Conclusions

In the present study, activity in the rFPA, which was associated with SFA in Tomita et al. (2020), increased in proportion to social anxiety in the absence of SFA manipulation. In addition to higher social anxiety, higher subjective SFA during the no-instruction condition was also associated with a larger difference in rFPA activity between the no-instruction and control conditions. Freitas-Ferrari et al. (2010) observed that the limbic structure and the mPFC were most consistently related to the pathology of social anxiety and proposed that research is needed to understand further the roles of these regions in the neural circuitry of social anxiety. In the present results, SFA was essential in understanding the role of the mPFC in the pathology of social anxiety.

The results of the current study suggested that when people speak publicly in social settings, greater oxy-Hb responses in the rFPA could be used as an objective measurement of SFA in people with higher social anxiety. By using real-time monitoring of brain activity in this region during social situations, we should be able to assess how changes in SFA occur in people with high social anxiety without assessing subjective SFA.

Limitations

This study has several limitations. First, we used NIRS to measure brain activity and eye movements simultaneously because NIRS is less susceptible to body movement than are other neuroimaging procedures, such as fMRI. Although using NIRS increased the ecological validity of the current study, we could not investigate the deep parts of the brain such as the amygdala and the insula that are thought to be active in the neural circuit of anxiety (Etkin & Wager, 2007).

A second limitation was that although participants reported that their state anxiety, measured by VAS, was significantly higher during the speech tasks than before the tasks, speaking in front of audiences displayed on the screen is different from social situations in everyday life. Therefore, we should attempt to replicate our results with participants speaking in front of live audiences.

The third limitation was that we did not include participants’ gender as a factor in the analyses because Tomita et al. (2020) also did not assess the effect of gender. Indeed, because there were few men, we had difficulty investigating differential brain and behavioral responses in men versus women. We used the same videos as Tomita et al. in which the gender of the negative audience members was opposite the participant’s gender to maximize the effect of the audience member who acted negatively. Future researchers should investigate the effects of participants’ and audience members’ gender by including more male participants and a balanced number of male and female negative audience members.

Fourth, to isolate brain activity related only to SFA and OFA, we set a control condition to subtract brain activity unrelated to SFA and OFA from that in the no-instruction condition. Because it was difficult to create a neutral control condition without inducing SFA and OFA while giving a speech, we instructed the participants to attend equally to various stimuli to avoid inducing SFA and OFA in the control condition. However, a neutral state of speaking without SFA or OFA and a state without SFA and OFA due to adaptive attention might not be equivalent. Part of the brain activity in the present control condition might represent adaptive attention, and we might have subtracted the associated brain activity from that in the no-instruction condition.