Introduction

Selective mutism (SM) is an anxiety disorder in which affected children are consistently unable to speak in certain social situations, while their speech production is not impaired in other situations, such as with close family and friends (American Psychiatric Association, 2013). Situations typically associated with the inability to speak include, for example, unfamiliar places or the presence of strangers (Schwenck et al., 2021). The disorder typically occurs between 2 and 5 years of age (Muris & Ollendick, 2015; Remschmidt et al., 2001; Steinhausen et al., 2006), severely interferes with everyday functioning (Milic et al., 2020; Schwartz et al., 2006), and is associated with mental and communicative problems in adulthood (Remschmidt et al., 2001; Steinhausen et al., 2006).

SM and (Social) Anxiety

SM was first classified as an anxiety disorder in the Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM–5, American Psychiatric Association, 2013) due to an overlap with other anxiety disorders, particularly social anxiety disorder (SAD) (Muris & Ollendick, 2015). SAD is characterized by a marked fear of being evaluated by others in social situations (American Psychiatric Association, 2013). A recent meta-analysis demonstrates that 69% of children with SM have SAD (Driessen et al., 2020) with even higher rates, up to 100%, in most studies on SM (Gensthaler et al., 2016b; Oerbeck et al., 2014; Yeganeh et al., 2006). Despite the central importance of social anxiety for both SAD and SM (Gensthaler et al., 2016b; Muris & Ollendick, 2015; Schwenck et al., 2019; Vogel et al., 2019), it remains largely unclear why children with SM are unable to speak in certain social situations, whereas children with SAD are able to. In this regard, evidence indicates that SM is associated with a more extreme fear in speech-demanding social situations than SAD (Schwenck et al., 2019) and that affected children are unable to speak because they are overwhelmed by their anxiety (Black & Uhde, 1995; Muris & Ollendick, 2015). This is also supported by findings that children with SM are evaluated by teachers and clinicians to be more anxious than children with SAD in speech-demanding social situations, but not in nonverbal social situations (Poole et al., 2020; Yeganeh et al., 2003; Young et al., 2012). Consistently, the level of social anxiety in general (Muris & Ollendick, 2015) and the level of fear in nonverbal or embarrassing social situations do not differ between the two clinical groups (Schwenck et al., 2019).
Further support for the assumption that the inability to speak in certain situations in SM is caused by an extreme fear during verbal situations comes from findings that children with SM show higher levels of the temperamental style Behavioral Inhibition (BI) than children with SAD. This is especially the case for the subscale shyness, which indicates BI with regard to social situations (Gensthaler et al., 2016a). In a longitudinal study, this early inhibition to social stimuli in particular was found to predict a later inhibition of language in social interactions (Kochanska & Radke-Yarrow, 1992). In addition, a proportion of children with SM actually report a paralyzing anxiety during speech-demanding situations (Vogel et al., 2019), and are described as “frozen with fear” in the clinical literature (Anstendig, 1999; Yeganeh et al., 2003). However, this theory is based only on questionnaire data and clinical observations, and experimental studies investigating fear-related mechanisms are almost non-existent. Given the severe impairment of affected children with SM, it is surprising that there is minimal experimental research in SM so far and no disorder-specific model based on psychophysiological mechanisms such as attention processing exists. In contrast, disorder-specific models of SAD contain mechanisms that have proven to be successful in explaining symptomatology of both affected adults (Clark & Wells, 1995; Wong & Rapee, 2016) and children (Schäfer et al., 2012) and are therefore key targets of evidence-based therapy (Clark et al., 2006; Heeren et al., 2012).

Attention Processing in Anxiety Disorders

Direction of Biased Attention

One such mechanism is biased attention processing, which is considered to play a role in the development and maintenance of anxiety disorders (Dudeney et al., 2015; Mogg & Bradley, 1998). Three components of biased attention processing in anxiety disorders have been identified, each of which makes assumptions about the direction of bias and the stage of processing (Cisler & Koster, 2010). First, facilitated attention to threat is an early and automatic bias towards threat and thus is associated with a faster detection of threatening stimuli. Second, delayed disengagement from threat describes a prolonged attentional focus on threatening stimuli after a threat has been detected. Third, attentional avoidance is a strategic and late-occurring focus of attention away from the threat.

While there is strong evidence of an early attentional bias towards threat in adults with different anxiety disorders compared to healthy individuals (Bar-Haim et al., 2007), there are considerably fewer studies and mixed findings regarding the direction of attentional bias in anxious children (Dudeney et al., 2015; Lisk et al., 2020). While one meta-analysis that included both eye-tracking studies and studies using reaction time-based paradigms (e.g. dot-probe-task) found an early bias toward threat (Dudeney et al., 2015), another meta-analysis that included only eye-tracking studies found no such bias (Lisk et al., 2020). It is noticeable that most of the studies investigating attentional bias in anxious children used a transdiagnostic approach and included samples of mixed anxiety disorders (Dudeney et al., 2015; Lisk et al., 2020). However, research indicates a significantly greater effect size of attentional bias towards disorder-congruent threatening stimuli (e.g. socially relevant stimuli in SAD) compared to incongruent stimuli (Pergamin-Hight et al., 2015), which is also in line with the idea that specific fears underlie each anxiety disorder (American Psychiatric Association, 2013) and which could explain the mixed findings in samples of children with mixed anxiety disorders. For the later-occurring components, there are also mixed findings in anxious children and adolescents. At a transdiagnostic level, studies based on reaction time measures combined with longer stimulus presentations tend to indicate delayed disengagement (Dudeney et al., 2015), whereas eye-tracking studies point to attentional avoidance (Lisk et al., 2020). Given that the exact time of the onset of the later components as well as the time course of attentional disengagement is largely unclear, Lisk et al. (2020) point out the importance of differentiated analysis in different time windows for future studies.

Visual Exploration

In addition to the components regarding the direction of attentional bias, the extent of visual exploration that reflects oculomotor activity in the presence of a threat has relevance to attention processing in the context of fear (Löw et al., 2015; Rösler & Gamer, 2019; Wendt et al., 2017). Here, a reduced visual exploration, which is also called attentive freezing, is considered part of a biologically-driven defense cascade that also includes other psychophysiological fear responses such as a reduction of body movement and vocal inhibition (Kozlowska et al., 2015; Löw et al., 2015; Roelofs, 2017; Rösler & Gamer, 2019; Wendt et al., 2017). While there are a few studies on the extent of visual exploration in adults, with mixed findings (Chen et al., 2015; Horley et al., 2004; Löw et al., 2015; Rösler & Gamer, 2019; Toh et al., 2011; Wendt et al., 2017; Wermes et al., 2018), there are no such studies on children yet. Given that children with SM are considered to be even more inhibited during verbal social situations than children with SAD (Poole et al., 2020; Young et al., 2012) and are described as frozen during social situations (Anstendig, 1999), (attentive) freezing might be a potential mechanism in SM.

Biased Attention Processing in Children with SAD

On a disorder-specific level, there are only three studies in children with homogenous samples of SAD based on eye-tracking (Keil et al., 2018; Schmidtendorf et al., 2018; Seefeldt et al., 2014) and one based on discrete reaction time measures (Waters et al., 2010) investigating the direction of attentional bias. While all four studies indicate the presence of an early attentional bias towards threat in children with SAD compared to healthy individuals, they are contradictory regarding the later occurring components of biased attention processing (Keil et al., 2018; Schmidtendorf et al., 2018; Seefeldt et al., 2014; Waters et al., 2010). While only children with a low symptom severity of SAD showed an avoidance of threatening faces in the study of Waters et al. (2010), the results of Seefeldt et al. (2014) indicate that children with SAD show difficulties in disengaging from threat. Schmidtendorf et al. (2018) could not replicate the finding of a relocation of attention to threat at later stages of attention processing in children with SAD and Keil et al. (2018) found a shorter fixation time to the eye-region in children with SAD compared to healthy children in an early phase. However, there were no differences at later stages of attention processing in the latter study.

Possible reasons for the mixed findings regarding the later-occurring components might be methodological flaws and variations, such as differences in the symptom severity of SAD across studies or differences in the threatening stimuli used. While only the study by Keil et al. (2018) used the eye area of a social counterpart as a threatening stimulus, the other studies defined the entire face as social threat. In adult SAD patients, however, there are several studies indicating the direct gaze of a social counterpart as especially fear-inducing in affected individuals, which suggests that it has great relevance when investigating attentional processing in SAD (Judah et al., 2019; Langer & Rodebaugh, 2013; Moukheiber et al., 2010; Rigato & Farroni, 2013; Weeks et al., 2013, 2019; Wieser et al., 2009). Another critical methodological aspect of these previous studies on attention processing in children with SAD is that all of them used static stimuli to induce fear. In these studies, angry faces were paired as threatening stimuli with other social or neutral stimuli, and attentional bias was measured as the contrast between the two stimuli. Even though it is an established and standardized procedure, the use of static and thus less naturalistic social stimuli is increasingly criticized (Lisk et al., 2020). In this regard, dynamic social situations that have a high social relevance seem to be of particular importance for the creation of an anxiety-driven attention bias in individuals with high social anxiety (Lisk et al., 2020; Risko et al., 2016; Rubo et al., 2020) and thus are better proxies for real social situations. Furthermore, no study has examined the extent of visual exploration in children with anxiety disorders, although theories suggest that this may also be an important part of the fear response.

Biased Attention Processing in Children with SM

It is striking that no studies to date have investigated attention processing in children with SM. Given that social anxiety also lies at the heart of SM, as reflected in the high rates of comorbid SAD, and that the inability to speak occurs when there is an expectation to speak (APA, 2013), it can be assumed that both social-evaluative and speech-demanding situations are disorder-congruent in SM. Empirical evidence also supports this assumption (Schwenck et al., 2019). Given that clinical case reports suggest that some children with SM avoid eye contact in social situations (Kovac & Furr, 2019; Muris & Ollendick, 2021; Wong, 2010), direct eye contact might also be a disorder-congruent threat in SM.

Current Study

This is the first study to investigate the attention processing of threat in children with SM. We also aim to investigate whether and how components of the attentional bias differ between children with SM and children with SAD in order to identify disorder-specific mechanisms of SM. According to the research outlined above (Pergamin-Hight et al., 2015; Schwenck et al., 2019; Weeks et al., 2013), we applied dynamic video-stimuli of high social relevance, which included the three conditions: questions, social-evaluative and neutral statements. As the dependent variable, we examined fixation time on the eye region of the social counterpart during different phases of attention processing as well as the amount of visual exploration. Both variables were measured using eye tracking. The following three research questions will be examined:

  1.

    Given that social anxiety lies at the heart of both SM and SAD and given that previous studies indicate an early attentional bias towards disorder-congruent threat in socially-anxious children, we predicted that children with SM and children with SAD would show an early attentional bias towards a social counterpart’s eye-region in disorder-congruent situations compared to typically developing (TD) children.

    1.1:

      Given that questions are disorder-congruent in children with SM, as they show a stronger fear-inducing effect in children with SM compared to children with SAD, who in turn experience higher levels of fear when asked a question than children with TD (Schwenck et al., 2019), we expect a stronger bias towards threat in children with SM compared to the other groups and a stronger bias in children with SAD compared to children with TD (SM > SAD > TD).

    1.2:

      We predict that both children with SM and children with SAD would demonstrate a bias towards social-evaluative threat (SM = SAD > TD).

  2.

    Both the investigation of delayed disengagement and avoidance will be exploratory as studies of later-occurring components of attentional bias in socially anxious children produced mixed results.

  3.

    Given that there has been no previous study of the extent of eye movements in anxious children, we investigate the length of scanpaths in the presence of threat in an exploratory manner. In this context, we aim to compare the scanpath length between the three groups (SM, SAD, TD). Because possible reduced visual exploration is related to the concept of attentive freezing and given that BI, which is a key risk factor for SM and SAD, is associated with the inhibition of motor function, we also want to investigate whether the level of BI predicts visual exploration.

Materials and Methods

Sample

Children aged 8 to 12 years with either SM or SAD or both were recruited in rural and urban areas throughout the state of Hesse (Germany) via in-patient and out-patient clinics, speech therapists, schools and via communications with households in Giessen (Germany). Participation was compensated with a voucher worth €20 (approx. $24). Children of the TD group were recruited from an existing database, via online advertisements and newsletters. Initially, a total of n = 159 caregivers of children took part in an online questionnaire on the platform UNIPARK to screen for symptoms of SM and SAD. According to the screening instruments, 44 children met criteria for both SM and SAD, 29 met criteria exclusively for SAD, 11 children met criteria exclusively for SM and 75 children did not meet criteria for either SM or SAD. A total of 95 caregivers gave written consent and took part in the experiment. The remaining 64 of the initial 159 caregivers decided not to continue with the study. We visited the families at home in order to conduct the Kinder-DIPS with caregivers and the experimental paradigm with the children. We conducted the experiment at the families’ homes because we hoped to reach families in a larger area and to be able to include individuals with more severe symptoms, as children with SM and SAD have higher levels of anxiety in unknown places and tend to avoid them. Due to missing data caused by technical problems, four children had to be excluded. Seven of the 91 remaining children had a tracking ratio of less than 50% during the experiment and were therefore also excluded, in line with previous research (Hartmann & Schwenck, 2020).
The children excluded based on the tracking ratio did not differ from the final sample in age (p = 0.247), gender (p = 0.399), SM symptomatology (p = 0.840) or SAD symptomatology (p = 0.714), so selective drop-out is unlikely. The final sample consisted of 84 children, of whom 28 received a primary diagnosis of SM and were assigned to the SM group. Children who met the criteria for SM were assigned to the SM group regardless of whether they had comorbid SAD. Of the 28 children with SM, 25 (89.3%) also met the criteria for SAD, which is in line with previous research. According to the DSM-5 criteria, a child with SM does not additionally meet the criteria for SAD if, for example, he or she does not show anxiety towards other children, as this is a prerequisite for the SAD diagnosis. Twenty-eight children met the criteria for SAD only and were assigned to the SAD group, and 28 showed no mental disorder and were assigned to the TD group. All diagnoses, and thus assignment to groups, were based on DSM-5 criteria using a structured clinical interview (Kinder-DIPS) with parents. Experimenters who conducted the Kinder-DIPS were adequately trained and were either psychologists or advanced students of psychology. A list of comorbidities is provided in the supplements. There was no significant difference regarding comorbidities between the SM and SAD groups. The mean age of our sample was M = 9.71 (SD = 1.25). Sample characteristics are displayed in Table 1.

Table 1 Sample characteristics

Materials

Diagnostic Interview for Mental Disorders in Children and Adolescents (Kinder-DIPS)

The Kinder-DIPS (Margraf et al., 2017; Schneider et al., 2017) is a structured clinical interview enabling the diagnosis of frequent mental disorders of childhood and adolescence according to DSM-5 and ICD-10. The interview has a high acceptance by both interviewers and interviewees (Neuschwander et al., 2017) as well as good to very good interrater-reliability (Neuschwander et al., 2013). In the present study, Kinder-DIPS was used to diagnose the mental disorder or to rule out the presence of such a disorder. Individuals who met the DSM-5 criteria for SM were assigned to the SM group independently of whether they also fulfilled the criteria for SAD. Individuals who only met the DSM-5 criteria for SAD but not for SM were assigned to the SAD group and individuals who did not meet any mental disorder criteria were assigned to the TD group.

Frankfurt Scale of Selective Mutism (FSSM)

The FSSM (Gensthaler et al., 2020b) is a parent-rated questionnaire assessing symptoms of SM in children and adolescents aged 3–18 years. The FSSM includes a diagnostic scale (DS) consisting of ten dichotomous items (yes–no) on the child's general speech pattern, with a cut-off value of 6 or 7 indicating the presence of SM, depending on the developmentally adapted version. Developmentally adapted versions are available for kindergarten age, school-age children between 6 and 11, and adolescents from 12 to 18. In addition, the FSSM provides a Severity Scale (SS) that can be used to dimensionally assess the symptom severity of SM for each version. Depending on the developmentally adapted version, this comprises either 41 items for kindergarten age or 42 items for the other two versions on speech behavior in different situations, taking into account the factors of verbal content, person and place. The questions are answered on a 5-point Likert scale and a total sum score can be calculated. In the present study, we formed z-scores of the SS in order to integrate sum scores of the different developmentally adapted versions to a joint total score. For this purpose, we z-standardized the mean SS scores of the different developmentally adapted versions for each child and created a new variable with the comparable z-score. Receiver operating characteristics analysis, which assesses the balance between sensitivity and specificity of a diagnostic instrument, indicated very good differentiation between children with SM, SAD and children without a mental disorder in the original sample of the FSSM (Gensthaler et al., 2020b). The authors of the questionnaire report high validity, as the SS of the FSSM correlates significantly with clinicians' symptom ratings (Gensthaler et al., 2020b). Previous reports also indicate excellent reliability (Cronbach’s α = 0.90–0.98) for the FSSM (Gensthaler et al., 2020b). The reliability was excellent in our sample as well (α = 0.951–0.959).
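The z-standardization step described above can be illustrated with a minimal sketch. It assumes that scores are standardized within each developmentally adapted version using that version's mean and standard deviation in the present sample; the text does not state which reference distribution was used, and the function name, version labels and data are hypothetical.

```python
from statistics import mean, stdev

def z_scores_by_version(records):
    """Standardize severity scores separately within each questionnaire version.

    records: list of (child_id, version, ss_score) tuples.
    Returns a dict mapping child_id to a z-score that is comparable
    across versions (assumption: sample-based mean and SD per version).
    """
    scores_by_version = {}
    for _, version, score in records:
        scores_by_version.setdefault(version, []).append(score)

    # Mean and sample SD per version (each version needs >= 2 children).
    stats = {v: (mean(s), stdev(s)) for v, s in scores_by_version.items()}

    return {
        child_id: (score - stats[version][0]) / stats[version][1]
        for child_id, version, score in records
    }

# Hypothetical data: three school-age children and two adolescents.
records = [
    ("a", "school", 1.0), ("b", "school", 2.0), ("c", "school", 3.0),
    ("d", "adolescent", 10.0), ("e", "adolescent", 20.0),
]
z = z_scores_by_version(records)
```

After this step, a child's z-score expresses severity relative to other children who completed the same version, which is what allows scores from versions with different item counts to be pooled into one variable.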

Social Phobia and Anxiety Inventory for Children (SPAI-C)

We used the German version of the SPAI-C (Beidel et al., 2000; Melfsen et al., 2011), which measures self-reported symptoms of social anxiety. The questionnaire consists of 26 items with a 3-point Likert scale concerning different social situations. Scores range from 0 to 52. Validity is considered high (Kley et al., 2012; Melfsen et al., 2011). Previous reports also indicate excellent reliability (Cronbach's α = 0.92) for the SPAI-C (Kley et al., 2012; Melfsen et al., 2011). The reliability was excellent in our sample as well (α = 0.959). Beidel et al. (2000) reported a cut-off score of 18, which differentiates well between children with SAD and non-socially anxious children.

Retrospective Infant Behavioral Inhibition Scale (RIBI)

The RIBI (Gensthaler et al., 2013; Gensthaler et al., 2020a) is a questionnaire assessing the child's BI regarding the first two years of life based on a retrospective parent report. The RIBI includes the subscales Distress to Novelty, Fear and Shyness and is summed up to a total score of BI. Items are answered on a 5-point scale (0 = Yes, 1 = more likely Yes, 2 = partly, 3 = more likely Not, 4 = Not). The test has excellent reliability (α > 0.90) and convergent validity indicated by positive correlations with questionnaires assessing BI as well as the behavioral observation of BI at 14 months of age. The reliability was excellent in our sample as well (α = 0.909).

Video Task

The self-constructed video task consists of a set of 39 trials (13 trials × 3 conditions) containing a fixation cross (randomly presented for 2–4 s) followed by a video-sequence (2–6 s) and a free viewing task (4 s). We created two identical sets of 36 videos with one female and one male adult amateur actor each (both were in their mid-20s). In each video, the actor formulates either (1) a question ("How are you feeling today?"), (2) a negative social-evaluative statement ("You don't look good today!") or (3) a neutral statement ("I feel pretty good today."). During the free-viewing task, we presented the actor’s face as a static image for 4 s after his/her question or statement was finished. The rationale here was that we could thus study gaze behavior both during dynamic interaction and in response to a question or a neutral or evaluative statement (after the question or statement was expressed). The actors looked into the camera throughout the sequences as well as the free-viewing task as if they were addressing the children directly. To standardize the actors’ position, the size of the face (55% in relation to the background), facial expression and clothes were kept constant throughout all videos. The length of the videos including the free-viewing task varied between 6 and 10 s (M = 7.90, SD = 0.94). On average, the videos did not differ in length between conditions. To check whether the different lengths (6, 7, 8, 9 and 10 s) of the videos influenced gaze behavior, we investigated this for scanpath and fixation time in a mixed ANOVA (length × group). There was neither a main effect for length nor an interaction. To test whether the measured variables fixation time (in ms) and scanpath length (in px) were reliable in the present paradigm, we calculated Cronbach's alpha for both variables across the 13 trials per condition (3 × 13 trials).
Fixation time exhibited excellent reliability scores for the whole video (α = 0.948–0.955) as well as for the 4 s free viewing task (α = 0.915–0.922) and good to very good reliability scores for the first 500 ms of the whole video (α = 0.756–0.808). The scanpath length had very good reliability scores for the whole video (α = 0.800–0.873) as well as for the 4 s free viewing task (α = 0.814–0.820).
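As an illustration of the reliability check described above, the following minimal sketch computes Cronbach's alpha with trials treated as items. The function name and the toy data are hypothetical; the study itself ran its analyses in SPSS.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a participants x trials table.

    rows: one list per participant, one value per trial (e.g. fixation
    time in ms). Trials play the role of questionnaire items.
    """
    k = len(rows[0])  # number of trials (items)
    # Sample variance of each trial across participants.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each participant's total across trials.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: three participants, three perfectly consistent trials.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha_consistent = cronbach_alpha(consistent)

# Hypothetical data with some disagreement between two trials.
noisy = [[1, 2], [2, 1], [3, 3]]
alpha_noisy = cronbach_alpha(noisy)
```

Alpha approaches 1 when participants keep their rank order across trials, which is the sense in which a gaze measure is called reliable here.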

Procedure

After parents had completed the FSSM as part of the online questionnaire, two experimenters visited the families at home to conduct the Kinder-DIPS with caregivers and the experimental paradigm with the children simultaneously but in separate rooms. While one experimenter conducted the interview with the parents, the other experimenter conducted various experiments with the child. Prior to the eye-tracking experiment reported here, a physiological measurement was performed, which will be published elsewhere. The child had a standardized 5-min break before the eye-tracking experiment began. To run the experimental paradigm, the experimenters brought a laptop with an eye-tracker and set it up on a table, in front of which the child was seated on a chair. Each child was placed in a standardized position at a distance of 60 cm from the screen (DELL Precision M4800, 17 inch), on which a remote eye tracker was mounted (SMI RED 250 mobile). The sound of the videos was played through headphones that the children wore during the experiment. Children were instructed by the experimenter to sit as calmly as possible in front of the screen to avoid motion artifacts. They were also informed that they were about to watch some videos in which a person would speak to them. They were instructed to watch the videos and not answer back. The stimuli were presented using the SMI Experiment Center and eye-tracking data were recorded continuously. A 5-point calibration was performed with a tolerated deviation of 0.5 degrees, followed by a validation step with an identical tolerated deviation. Trials were presented in a randomized order; whether a video was presented with the female or the male actor was also chosen at random. After finishing the video task, children had time to complete the SPAI-C. The whole study session at the families’ home lasted about two hours.
The Local Ethics Committee of the Department of Psychology of the University of Giessen approved the study.

Data Preparation and Statistical Analysis

Data Preparation and Pre-analysis on Whole Video Sequences

In line with previous studies using direct gaze as a threatening stimulus (Weeks et al., 2013), we created an area of interest (AOI) around the eye region using BeGaze SMI-Software. All statistical analyses regarding components of attentional bias and visual exploration were conducted in SPSS 26 using an alpha level of 0.05 and Bonferroni correction for multiple testing. We calculated correlations between the dependent variables and the score of BI as well as symptom scores of SAD and SM in the whole sample (N = 84). For both the investigation of possible attentional bias and the extent of visual exploration, we first (a) performed analyses for the complete video sequences independently of the conditions. We did this to investigate whether the eye contact of the depicted social counterpart, which represents a threatening stimulus for the clinical groups, is associated with an altered fixation time in ms or an altered extent of visual exploration across all videos. As a marker of the extent of visual exploration or eye movement, we analyzed the length of the scanpath in pixels, in line with previous studies (Horley et al., 2004; Toh et al., 2011).
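To make the scanpath measure concrete, the sketch below computes scanpath length as the summed Euclidean distance between consecutive gaze positions in pixels. This is a common definition and is an assumption here: the exact computation performed by the eye-tracking software is not described in the text, and the coordinates are hypothetical.

```python
from math import hypot

def scanpath_length(points):
    """Total distance in pixels travelled between consecutive gaze points.

    points: list of (x, y) screen coordinates in temporal order,
    e.g. fixation centers within one trial.
    """
    return sum(
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )

# Hypothetical trial with three fixations: 5 px, then 6 px of movement.
trial = [(0, 0), (3, 4), (3, 10)]
length_px = scanpath_length(trial)  # 11.0
```

A shorter scanpath over the same viewing time indicates less visual exploration, which is how the measure relates to attentive freezing.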

Early Attentional Bias Towards Threat

Regarding an early attentional bias towards the eye region directly when the social counterpart is displayed on the screen, we (b) analyzed, in line with previous research (Waters et al., 2010), the fixation time in ms on the eye-region during the first 500 ms of the whole video sequences. We further investigated (c) the first 500 ms of the 4 s free-viewing time window following the video sequences. Here, we wanted to examine an early attentional bias towards the eye-region in response to each condition (question, evaluation, neutral statement) as formulated in hypothesis 1.

Late-Occurring Attentional Biases Towards Threat: Avoidance and Delayed Disengagement

To assess the potential late-occurring attentional components delayed disengagement and avoidance in reaction to the videos and because analyzing the entire video might blur the effect, we examined (d) fixation time in ms for the second half of the free-viewing task (the last 2 s). For a higher resolution, we additionally divided the 4-s sequence into eight blocks of 500 ms each, and (e) we examined the course of attention over the time intervals according to previous studies (Lisk et al., 2020; Schmidtendorf et al., 2018).
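The time-window analyses (early 500 ms window, second half of the free-viewing task, and eight 500 ms blocks) can be sketched as follows. The fixation record format, the function name and the data are hypothetical; the sketch only assumes that each fixation has an onset, an offset and a flag for whether it landed on the eye-region AOI.

```python
def aoi_time_per_bin(fixations, n_bins=8, bin_ms=500):
    """Fixation time on the AOI (ms) falling into each time bin.

    fixations: list of (start_ms, end_ms, on_aoi) tuples, with times
    relative to the onset of the 4 s free-viewing task.
    A fixation spanning a bin boundary is split between the bins.
    """
    totals = [0.0] * n_bins
    for start, end, on_aoi in fixations:
        if not on_aoi:
            continue
        for b in range(n_bins):
            bin_start, bin_end = b * bin_ms, (b + 1) * bin_ms
            overlap = min(end, bin_end) - max(start, bin_start)
            if overlap > 0:
                totals[b] += overlap
    return totals

# Hypothetical trial: (start_ms, end_ms, on eye-region AOI?)
fixations = [
    (0, 300, True),
    (300, 700, False),
    (700, 1600, True),
    (3900, 4200, True),  # runs past the 4 s window; the excess is ignored
]
bins = aoi_time_per_bin(fixations)
early_ms = bins[0]            # first 500 ms, as in analysis (c)
late_half_ms = sum(bins[4:])  # last 2 s, as in analysis (d)
```

Summing bins reproduces the coarser windows, so the eight-block time course in (e) and the windows in (c) and (d) can be derived from the same per-bin values.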

Extent of Visual Exploration

For the extent of visual exploration or eye movement in response to the three conditions, we (f) investigated the scanpath length in pixels for the 4 s of the free-viewing task. Given that the extent of visual exploration is associated with the construct of attentive freezing (Rösler & Gamer, 2019), we calculated correlations between the length of the scanpath and freezing-related items (item 3: “Is your child incapable in certain situations of shaking his/her head, of nodding or of pointing to something when asked to?”, item 4: “Do his/her movements seem slow or frozen-like to you in certain situations?” and item 5: “Does your child’s facial expression appear less vivid or even expressionless and “frozen” in certain situations?”) of the DS of the FSSM (Gensthaler et al., 2020b). Due to the limited sample size, it was not possible to test whether these items load on a joint factor using an exploratory factor analysis. However, the three items showed substantial correlations among themselves (r = 0.347–0.640), in contrast to the other items of the DS.

Performed Analyses

a & b: In order to investigate the fixation time on the eye-region during the whole video, the fixation time on eye-region during the first 500 ms of the whole video sequence and length of the scanpath during the whole video sequence, we conducted a multivariate analysis of variance (MANOVA) comparing groups regarding the above-mentioned dependent variables.

c & d: In order to investigate fixation time on the eye-region during the first 500 ms of 4 s-free viewing task and the fixation time on the eye-region during the second half of the 4 s-free viewing task, we conducted 3 (group) × 3 (condition) repeated measures analyses of variance (ANOVA) for each of these dependent variables. Group (SM, SAD, TD) served as between-subject independent variable and condition as within-subjects variable.

e: Regarding the analysis of the 4 s-sequence in 500 ms time windows, we calculated a 3 (condition) × 3 (group) × 8 (time intervals of 500 ms each) repeated measures ANOVA for fixation time in ms.

f: In order to investigate the length of the scanpath during the 4 s free viewing task, we conducted a 3 (group) × 3 (condition) repeated measures analysis of variance (ANOVA). Group (SM, SAD, TD) served as between-subject independent variable and condition as within-subjects variable.

The MANOVA (analyses a & b) as well as the ANOVAs performed for the scanpath and non-exploratory fixation time variables (analyses c, d and f) met all assumptions, as the variables followed a normal distribution and Mauchly's tests for sphericity were not significant. Given that Mauchly's test indicated a violation of sphericity for the ANOVA performed for the exploratory analysis of the time course (e), a Greenhouse–Geisser adjustment was applied.
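For readers unfamiliar with the Greenhouse–Geisser adjustment, the sketch below estimates the correction factor epsilon for a single within-subject factor in a single group. This is a simplified illustration under stated assumptions: the study's design additionally includes a between-subject factor and a second within-subject factor, which statistical packages handle with pooled covariance matrices and effect-specific contrasts. The function name and data are hypothetical.

```python
from statistics import mean

def greenhouse_geisser_epsilon(rows):
    """Greenhouse-Geisser epsilon for one within-subject factor.

    rows: one list per participant with k repeated measures.
    Epsilon ranges from 1 / (k - 1) (maximal violation) to 1 (sphericity).
    """
    n, k = len(rows), len(rows[0])
    col_means = [mean([r[j] for r in rows]) for j in range(k)]

    # Sample covariance matrix of the k repeated measures.
    cov = [
        [sum((r[i] - col_means[i]) * (r[j] - col_means[j]) for r in rows) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]

    # Double-center the covariance matrix.
    row_means = [mean(cov[i]) for i in range(k)]
    grand_mean = mean(row_means)
    centered = [
        [cov[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(k)]
        for i in range(k)
    ]

    trace = sum(centered[i][i] for i in range(k))
    sum_sq = sum(v * v for row in centered for v in row)
    return trace ** 2 / ((k - 1) * sum_sq)

# Hypothetical data: five participants, three repeated measures.
data = [[1, 2, 3], [2, 4, 7], [3, 5, 5], [4, 4, 9], [5, 9, 8]]
eps = greenhouse_geisser_epsilon(data)
```

The adjustment multiplies both the effect and the error degrees of freedom of the F test by epsilon, which makes the test more conservative the more sphericity is violated.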

Results

Descriptive Statistics and Correlational Analyses

Demographics and mean scores on psychometric measures for all three groups are presented in Table 1. Groups did not differ significantly regarding age, gender or quality of eye-tracking data. In line with previous studies, we found group differences regarding the diagnostic scale as well as the severity scale of the FSSM (Gensthaler et al., 2020b), and elevated levels of trait social anxiety and BI in both children with SM and children with SAD, indicated by the SPAI-C score (Muris & Ollendick, 2015) and the RIBI score (Gensthaler et al., 2016a), respectively. Additionally, there were no significant correlations between age, gender and the dependent variables (Table 2). Regarding the exploratory sum score of the three freezing items of the FSSM, we found group differences (p < 0.001, SM > SAD > TD): the highest score was found in children with SM (range: 0–3, M = 1.96, SD = 1.10), followed by children with SAD (M = 0.96, SD = 1.07), and the lowest score in children with TD (M = 0.11, SD = 0.57).

Table 2 Correlations between dependent variables of gaze behavior analyzed for whole video sequences and age, gender, and questionnaire sum scores based on the whole sample (N = 84)

The correlation analysis revealed a negative relation between the symptom severity score of SM and fixation time on the eye-region across all video sequences in the whole sample (N = 84). We did not find any significant relation between SAD-symptoms or BI-scores and gaze behavior during complete video sequences (Table 2).

Attentional Bias

Regarding (a) the fixation time on the eye-region (F(2,81) = 1.030, p = 0.362, η2 = 0.025) for complete video sequences we did not find group differences.
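The effect sizes reported alongside the F tests can be related to the test statistics with a short worked example. The sketch below assumes that the reported η2 values are partial eta squared, which the text does not state explicitly; under that assumption the value follows directly from F and its degrees of freedom. The function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Group effect on fixation time for complete video sequences: F(2,81) = 1.030.
eta_fixation = round(partial_eta_squared(1.030, 2, 81), 3)
```

With F(2,81) = 1.030 this gives 0.025, which matches the value reported above and suggests that the reported effect sizes are consistent with this formula.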

Early Attentional Bias

Regarding (b) the fixation time on the eye-region during the first 500 ms of the whole video sequences (F(2,81) = 0.746, p = 0.477, η2 = 0.018), we did not find group differences. Further, we did not find any main effect for condition (F(2,162) = 0.018, p = 0.982, η2 = 0.001), group (F(2,81) = 0.379, p = 0.686, η2 = 0.009) or interaction (F(4,162) = 0.352, p = 0.842, η2 = 0.009) for (c) the fixation time on the eye-region during the first 500 ms of the 4 s free-viewing task in response to conditions.

Delayed Disengagement and Avoidance

Regarding (d) the analysis of a potential delayed disengagement or avoidance, we did not find any main effect for condition (F(2,162) = 0.340, p = 0.712, η2 = 0.004), group (F(2,81) = 1.013, p = 0.368, η2 = 0.024) or interaction (F(4,162) = 0.446, p = 0.775, η2 = 0.011) for the second half of the 4 s free-viewing task.

Time Course of Gaze Behavior During the 4 s Free Viewing Task

The time course of attention for each condition is displayed in Figs. 1, 2 and 3. There was a main effect for time interval (F(181.779, 4.477) = 6.139, p < 0.001, η2 = 0.071). Contrasts revealed a significant linear decline of fixation time on the eye-region over the course of attention (p < 0.001, η2 = 0.167); that is, individuals looked increasingly less into the eyes of the social counterpart as the duration of the stimulus presentation progressed. We found neither a main effect for group (F(626.602, 2) = 0.215, p = 0.807, η2 = 0.005) nor any interactions for condition × group (F(7263.199, 3.537) = 0.215, p = 0.807, η2 = 0.007), condition × time (F(2168.349, 9.226) = 0.495, p = 0.882, η2 = 0.006) or time × group (F(7984.392, 8.953) = 1.348, p = 0.174, η2 = 0.033). Given that the group × time interaction was not significant, a similar decline of fixation time on the eye-region for each group is suggested.

Fig. 1

Fixation time in ms on the AOI of eye-region during the 4 s free-viewing task averaged per group across all videos of condition question. SM Selective Mutism, SAD Social Anxiety Disorder, TD Typical Development

Fig. 2

Fixation time in ms on the AOI of eye-region during the 4 s free-viewing task averaged per group across all videos of condition evaluation. SM Selective Mutism, SAD Social Anxiety Disorder, TD Typical Development

Fig. 3

Fixation time in ms on the AOI of eye-region during the 4 s free-viewing task averaged per group across all videos of condition neutral. SM Selective Mutism, SAD Social Anxiety Disorder, TD Typical Development

Extent of Visual Exploration

Main Analysis

Regarding (a) the length of the scanpath (F(2,81) = 0.174, p = 0.840, η2 = 0.004) for complete video sequences, we did not find group differences. In contrast, groups differed regarding (f) the length of the scanpath during the 4 s free-viewing task (F(2,81) = 5.839, p = 0.004, η2 = 0.126). We found neither a main effect of condition (F(2,162) = 0.211, p = 0.810, η2 = 0.003) nor an interaction of condition and group (F(4,162) = 0.771, p = 0.535, η2 = 0.019), indicating that group differences did not depend on condition. Bonferroni-corrected post-hoc tests revealed a significantly shorter scanpath in the SM-group compared to the TD-group (p = 0.003; SM < TD). There were no differences between the SM and SAD groups (p = 0.542) or between the SAD and TD groups (p = 0.133). Because we found only a significant main effect for group, but no significant group × condition interaction, we did not perform group comparisons per condition. The extent of visual exploration of groups per condition is displayed in Fig. 4.

Fig. 4

Total length of the scanpath in pixels during the 4 s free-viewing task averaged per group across all videos of each condition. SM Selective Mutism, SAD Social Anxiety Disorder, TD Typical Development

Further Analysis

In order to additionally test whether group differences in the scanpath are due to a fundamentally lower level of visual exploration or a longer attentional focus towards another area, we performed two further analyses with the Bonferroni-corrected alpha level (α = 0.025). Groups did not differ regarding the length of the scanpath during the presentation of the fixation crosses (as a kind of baseline condition without a threatening stimulus) based on an ANOVA (F(161.632, 2) = 0.424, p = 0.656), indicating a similar level of visual exploration across groups. Furthermore, groups did not differ on the duration of all detected fixations on the screen (independent of AOI) during the 4 s free-viewing task based on an ANOVA (F(115.530, 2) = 1.405, p = 0.251), indicating on average a similar duration of fixation time on any area across groups. Fixation detection parameters were a minimum focus duration of 80 ms with a maximal dispersion of 2°.
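The two fixation detection parameters can be made concrete with a small sketch. The text reports only the thresholds (80 ms, 2°), not the detection algorithm, so the code below assumes a dispersion-threshold procedure (often called I-DT), assumes gaze samples already expressed in degrees of visual angle, and assumes dispersion is measured as the sum of the horizontal and vertical ranges. The function name and sample format are hypothetical.

```python
def detect_fixations(samples, min_duration_ms=80.0, max_dispersion_deg=2.0):
    """Return (start_ms, end_ms) pairs for fixations in time-ordered gaze samples.

    `samples` is a list of (t_ms, x_deg, y_deg) tuples. This is an assumed
    dispersion-threshold sketch, not the eye-tracker vendor's algorithm.
    """
    def dispersion(window):
        xs = [s[1] for s in window]
        ys = [s[2] for s in window]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    fixations = []
    i, n = 0, len(samples)
    while i < n:
        # Grow an initial window that spans at least the minimum duration.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration_ms:
            j += 1
        if j >= n:
            break  # too few samples remain to cover the minimum duration
        if dispersion(samples[i:j + 1]) <= max_dispersion_deg:
            # Extend the window while dispersion stays within the threshold.
            while j + 1 < n and dispersion(samples[i:j + 2]) <= max_dispersion_deg:
                j += 1
            fixations.append((samples[i][0], samples[j][0]))
            i = j + 1
        else:
            i += 1  # drop the first sample and try again
    return fixations


# Ten stable samples at (0, 0), then ten stable samples at (10, 10), 20 ms apart.
gaze = [(t, 0.0, 0.0) for t in range(0, 200, 20)]
gaze += [(t, 10.0, 10.0) for t in range(200, 400, 20)]
found = detect_fixations(gaze)
```

In this toy input the jump of 10° between the two gaze clusters exceeds the dispersion threshold, so the samples are split into two fixations, each lasting longer than 80 ms.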

The correlational analysis for the 4 s free-viewing task in response to conditions revealed a negative relation between the exploratory sum score of the freezing items of the FSSM-DS and the length of the scanpath for all conditions (Table 3) in the whole sample (N = 84): as the extent of freezing increases, the extent of visual exploration decreases. Because of the exploratory nature of these correlations, we adjusted for multiple testing using the conservative Bonferroni correction. The adjusted alpha level is α = 0.001.

Table 3 Correlations between dependent variables of gaze behavior during 4 s-free viewing task and age, gender, and questionnaire sum scores based on the whole sample (N = 84)

To further investigate which variable predicts the extent of visual exploration while statistically controlling for the remaining variables, we also performed a multiple regression with SPAI-C, FSSM-SS, and the FSSM freezing items as predictors. In the significant model (R2 = 0.142, F(3, 80) = 4.407, p = 0.006), only the freezing items (β = − 0.412, p = 0.004) predicted the length of the scanpath, while SPAI-C (β = − 0.001, p = 0.991) and FSSM-SS (β = 0.059, p = 0.401) did not have an influence on the extent of visual exploration. Due to the exploratory nature of the regression, we again applied a Bonferroni correction of the alpha level for the number of included predictors. The adjusted significance level is α = 0.016.
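The Bonferroni adjustments used in the further analyses and in the regression can be reproduced with simple arithmetic. The sketch below assumes a family-wise alpha of 0.05, which is implied by the reported adjusted levels but not stated explicitly; the function name is hypothetical.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test alpha level after a Bonferroni correction."""
    return family_alpha / n_tests


# Two further analyses (fixation-cross baseline, overall fixation duration).
alpha_two_tests = bonferroni_alpha(0.05, 2)
# Three predictors in the multiple regression.
alpha_three_predictors = bonferroni_alpha(0.05, 3)
# The freezing items (p = 0.004) remain below the adjusted level.
freezing_significant = 0.004 < alpha_three_predictors
```

Dividing 0.05 by two tests gives 0.025, as reported. Dividing by three predictors gives approximately 0.0167, so the reported level of 0.016 appears to be this value truncated to three decimals.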

Sensitivity Analysis

To check whether we had sufficient statistical power for the analyses we performed, we conducted a sensitivity analysis as recommended in the literature (Perugini et al., 2018). The mixed ANOVAs comparing the dependent variables between groups for the three conditions (question, evaluation, neutral), assuming a conservatively small effect size for the group × condition interaction and the observed average correlation of r = 0.80 between the three conditions, had a power of 0.81. The mixed ANOVAs based on the eight time intervals, with an average correlation of r = 0.65 between the eight time intervals within conditions, also had a power of 0.81. Dimensional analyses (correlations and regression) based on the entire sample of the current study also had sufficient power (0.84–0.88), assuming medium effect sizes as reported in the literature (Seefeldt et al., 2014). Group comparisons on the dependent variables and the MANOVA were slightly underpowered, with a power of 0.64 and 0.65, respectively, assuming medium effect sizes between socially anxious and healthy children of the same age group as reported in the literature (Seefeldt et al., 2014).

Discussion

The purpose of the present study was to examine different components of attentional bias as well as the extent of visual exploration in the presence of threat in children with SM compared to children with SAD and children with TD. We measured attentional processing using eye-tracking during videos displaying a social counterpart directly looking at the child as well as during a free-viewing task that followed these video sequences of threatening disorder-congruent situations.

Early Attentional Bias Towards Threat

Results contradict our assumption that children with SAD and children with SM show an early attentional bias towards threat, which was indicated by previous studies in socially anxious children (Keil et al., 2018; Schmidtendorf et al., 2018; Seefeldt et al., 2014; Waters et al., 2010). Contradictory findings might be explained by differences between paradigms, as our study is the first to investigate attentional bias in children with SAD during the course of a dynamic social situation. Most previous studies in children with SAD have measured attention bias by contrasting threatening and neutral static stimuli. Additionally, so far only Keil et al. (2018) used the eye region of the counterpart as threatening stimulus in children with SAD, whereas the other disorder-specific studies in children with SAD each used the entire face (Schmidtendorf et al., 2018; Seefeldt et al., 2014; Waters et al., 2010). In the study done by Keil et al. (2018), children with SAD showed an early attentional bias toward the eye area in only one of three conditions. This raises the question regarding the robustness of the early bias towards threatening stimuli in children with SAD when a differentiated analysis of gaze behavior with respect to the eye area is performed, which is, according to previous research, of high importance (Weeks et al., 2013). An additional explanation for our finding of no attentional bias, especially in children with SM, is that attentional focus may depend on the amount of visual exploration, masking possible effects. Given the finding that children with SM show reduced visual exploration, it could be that their eye movement is frozen, so that they show less eye-movement towards or away from the eye-region and thus demonstrate no attentional bias. 
Beyond these methodological and theoretical considerations, it is important to emphasize that our group comparisons were somewhat underpowered, so we might have failed to detect possible effects in children with SAD or SM.

Avoidance

Our results do not indicate a stronger avoidance or delayed disengagement in children with SM and children with SAD compared to children with TD. However, we found a negative correlation between fixation time on the eye-region across conditions and symptom severity of SM across the entire sample. Taking into account that our group comparisons were underpowered, it stands to reason that we could not detect this potential effect in the contrast between groups and thus captured it only based on the dimensional analyses. In addition to other possibilities that might explain why a child gazes at a certain stimulus for less time (for example, loss of interest in the stimulus or a generally reduced attention span), avoiding a threatening stimulus seems to be a reasonable explanation in the context of SM. This would be in line with the conceptualization of SM as an anxiety disorder and findings that avoidance of threatening stimuli is a component of attention processing in the context of anxiety (Cisler & Koster, 2010). Interestingly, several case reports already exist suggesting that some children with SM avoid eye contact due to fear (Kovac & Furr, 2019; Muris & Ollendick, 2015; Wong, 2010). Given that there are numerous studies showing that socially anxious adult individuals avoid eye contact in dynamic social situations and social anxiety is a core feature of SM (Muris & Ollendick, 2021), it would seem reasonable to assume that this relationship may be driven by social anxiety. However, in our study, we did not find a correlation between social anxiety and avoidance of eye contact. In previous research on children with SAD, the only study that examined gaze behavior with respect to the eye region in socially anxious children also did not find attentional avoidance in children with SAD. 
Although speculative, this may suggest that social anxiety does not yet play a role in relation to gaze avoidance in children, and gaze avoidance may rather be a mechanism associated with the symptomatology of SM in children.

We also found a gradual decrease of attention on, and thus probably an avoidance of, the social counterpart’s eye-region over time in response to the videos for all three groups. This finding could be in line with developmental research (Dudeney et al., 2015; Field & Lester, 2010), which suggests that all individuals, healthy ones included, display an attentional bias. While healthy individuals learn to regulate their attention during development, anxious individuals retain an attentional bias regarding threat. It is suggested that this process is associated with a lack of maturation of top-down regulatory cognitive functions in anxious individuals (Dudeney et al., 2015). Again, reasons other than avoidance could explain why the focus on the eye region decreased with the duration of stimulus presentation. For example, preliminary research suggests that boredom is also associated with a loss of attentional focus (Kim et al., 2018).

Extent of Visual Exploration

Interestingly, we found that only children with SM had a significantly lower visual exploration compared to children with TD. Although only children with SM differed from children with TD in terms of the scanpath length, there seems to be a gradient in the extent of visual exploration across groups, with children with SM showing the strongest inhibition of visual exploration (see Fig. 4). Because our group comparisons were underpowered, we may not have detected a possible difference between children with SAD (without SM) and children with TD. Therefore, reduced visual exploration, although possibly less pronounced, might also occur in children with SAD. Additionally, given that a large proportion of children with SM in our study also met criteria for SAD, it is questionable whether reduced visual exploration is only a characteristic of children with SM and SAD or also occurs in non-socially anxious children with SM. Future studies should disentangle this by examining visual exploration in subgroups of children with SM with sufficiently large sample sizes.

Various explanations are possible for the finding of reduced visual exploration in our SM-group. For example, it could be that children with SM show a fundamentally lower level of visual exploration than children with TD, or that they focus longer on another area during social interaction and keep their attention there. However, both explanations are contradicted by the result that the groups differed neither in visual exploration during the presentation of the fixation crosses nor in the average duration of fixations, irrespective of areas of interest. Although speculative, the current finding might indicate the involvement of the mechanism of attentive freezing in SM, which is also associated with reduced eye-movement and thus reduced visual exploration. This would be in line with previous research which indicates an association between reduced visual exploration and the fear response of freezing in healthy adults (Löw et al., 2015; Rösler & Gamer, 2019; Wendt et al., 2017). Furthermore, this consideration is supported by our correlation analysis (see Table 3) as well as by the multiple regression, in which reduced visual exploration (indicated by the length of the scanpath) was predicted by the freezing items of the FSSM.

Implications for SM-Symptomatology

Although it can only be speculated based on the current results whether reduced exploration was due to attentive freezing, there is evidence that suggests that this may be linked with the inability to speak in certain situations in children with SM. Attentive freezing is considered part of a biologically driven defense cascade that occurs across species (Kozlowska et al., 2015) and includes a pattern of psychophysiological reactions such as a decline of heart rate as well as reduced motor activity including eye-movement and vocal inhibition (Kozlowska et al., 2015; Roelofs, 2017; Rösler & Gamer, 2019). Findings that children with SM display high levels of BI (Gensthaler et al., 2016a), are described as frozen with fear by clinicians (Anstendig, 1999) and report a paralyzing fear themselves (Vogel et al., 2019) might also support this assumption. Furthermore, the mechanism of freezing would also provide an explanation as to why in the current study children with SM showed reduced visual exploration in general, regardless of whether the situation contained a question or not. Because of the assumed biological foundation of the mechanism, freezing might be less dependent on learning experiences and on the content of the social situation (i.e., whether it has a speech component or not) than attentional biases, which depend on the disorder-congruency of stimuli (Pergamin-Hight et al., 2015). Consistently, evidence from recent research indicates that eye contact and the presence of strangers per se induce fear in children with SM (Schwenck et al., 2021) and that children with SM exhibit longer latency to movement, even in social situations where they do not need to speak (Milic et al., 2020).

In order to draw a valid conclusion regarding whether the reduced visual exploration found in children with SM can be explained by freezing and in order to determine whether freezing is involved in SM-symptomatology, it would be important for future studies to also assess other features of freezing (e.g., physiological responses) during a task that requires speech-production in children with SM. Although we did not find an association between visual exploration and retrospectively recorded BI in our study, due to the conceptual proximity of freezing and BI on the one hand and BI and SM on the other, it would be important to further investigate the interaction of these variables. In this context, research in very young children with SM or high BI that is not dependent on retrospective data would be important.

Clinical Implications

Our findings might also have important clinical implications. Our results suggest that a frozen motor activity might be involved in SM-symptomatology. Furthermore, they indicate that this inhibition occurs in various types of social situations, including neutral situations that do not involve an expectation to speak or a social evaluation. For therapy, this means that even in the absence of an expectation to speak, and despite the use of techniques such as defocused communication, increased inhibition in children with SM might be expected. Thus, interventions that counteract this state of freezing during social situations might be promising. Although this is the first finding that suggests that it may be important to address the state of freezing in therapy, this approach is already found in the therapeutic literature on SM. Here, for example, it is described that activation exercises can be applied as a supportive element of an exposure (McHolm et al., 2005). Another clinical implication may arise from the finding that the symptom severity of SM is associated with avoidance of eye contact, which has already been described in single-case studies. This might suggest that eye contact is experienced as aversive by children with SM and is consequently avoided. Consequently, clinicians should be aware that direct eye contact may be counterproductive when interacting with children with SM during defocused communication. However, over time, learning to maintain direct eye contact could be a valuable target for exposure therapy.

Limitations

The current study has some limitations to acknowledge. First, comparability between our study and previously conducted studies on attentional bias in socially anxious children is limited due to differences in the applied paradigm as well as a descriptively lower symptom level of SAD in our sample compared to samples of previous studies (Schmidtendorf et al., 2018; Seefeldt et al., 2014). Second, due to the rather narrow age range of 8–12 years in our sample, the applicability of these results to the typical onset at pre-school age or to early phases of SM might be limited. Third, three of the children assigned to the SM group in our study, consistent with findings on comorbidity rates between SM and SAD, did not have comorbid SAD. In contrast to the regression analyses based on SM- and SAD-symptomatology, the group comparison in our study does not allow an entirely accurate conclusion about which findings are specific to children with SM and whether SM subgroups (e.g. SM with and without SAD) would differ with respect to mechanisms. Fourth, individuals did not actually have to answer during the speech-demanding condition, so that this condition might not have had the expected fear-inducing effect. Although we did not assess state anxiety with respect to the stimuli, a previous study indicates a fear-inducing effect of the chosen conditions based on a subjective anxiety level (Schwenck et al., 2019). Fifth, four of the thirteen neutral statements pertained to the person depicted in the video (e.g., "I feel pretty good today."), which could lead to attention being focused on that person. Given that a comparison of fixation time in response to these four neutral items with fixation time in response to the remaining neutral items did not reveal a difference, this does not seem to have resulted in a bias in the measures of attention. 
Sixth, the actors shown in the videos were adults, as is the case in previous research of gaze behavior in socially anxious children. Given that some children with SM are more likely to show symptoms in the presence of adults, while others are more likely to have difficulty in speaking in the presence of peers, it would be useful to investigate gaze behavior towards peers as well. Seventh, the group comparisons show a power that is slightly too low to detect the effects between socially anxious and healthy children assumed in the literature. However, the remaining analyses conducted in this study had sufficient statistical power.

Conclusion

In conclusion, this is the first study that has investigated attention processing in children with SM. We did not find evidence of any of the components of an attention bias in dynamic social situations in children with SM or in children with SAD. However, we found a lower attentional focus on the eye-region to be associated with a higher level of SM-symptoms, probably indicating a relation between avoidance of eye contact and the presence of SM. Given that there is already evidence of an early bias in children with SAD from studies with static threatening stimuli, the different findings could be due to differences in the applied paradigm. We also found that children with SM showed reduced visual exploration regardless of the video condition. This suggests that reduced visual exploration in children with SM generally occurs in social situations and does not depend on the context of the social situation. Reduced visual exploration might be explained by attentive freezing (inhibition of the visual motor system). The literature suggests that attentive freezing is part of a more fundamental psychophysiological response that may also affect speech production. Thus, this mechanism may help to explain the inability to speak in certain situations in children with SM. Additional experimental research is needed to address the assumption that freezing is involved in the symptomatology of children with SM.