Introduction

Excessive worry about uncertain future events with possible negative outcomes increases mental distress (Borkovec et al., 1983) and plays an important role in anxiety (Borkovec & Inz, 1990; Tallis et al., 1991). The persistence and experienced uncontrollability of worry serve as an important risk and maintenance factor for Generalized Anxiety Disorder (GAD), in which worries about a wide variety of topics are prominent, but also for affective disorders in general (Davey & Wells, 2006; Hirsch & Mathews, 2012).

Worry evokes thoughts related to potential negativity, which gives rise to further worrying (Wells & Papageorgiou, 1995). That is, a vicious cycle develops, in which repeated negative thoughts are triggered and reinforced by biased information processing, and deficits in exerting cognitive control concurrently hinder effective down-regulation of these thoughts and biases (Hirsch & Mathews, 2012). One possible factor feeding into this vicious cycle is metacognitive beliefs about worry: stable beliefs people have about their own cognitive systems and cognitive coping strategies (e.g., worry; Wells, 2019; Wells & Capobianco, 2020). They include an individual’s positive and negative beliefs about their own thinking that subsequently influence their appraisal and regulation of thoughts. In the context of worry, positive metacognitive beliefs about the use of worrying to anticipate or cope with potential threats (e.g., ‘Worrying prepares me’) contrast with negative beliefs about the consequences, danger and uncontrollability of worry (e.g., ‘If I keep worrying like this, I will go crazy’). Relatedly, metacognitions refer to ‘psychological structures, knowledge, events and processes that are involved in the control, modification, and interpretation of thinking itself’ (Wells & Cartwright-Hatton, 2004, p. 385).

The Metacognitive model of GAD (Wells, 1995) addresses the specific role of metacognitive beliefs in maladaptive emotion regulation and the development and persistence of psychopathology. It posits that positive beliefs about the helpfulness of worrying prompt the worry process. However, once this worrying is initiated, negative beliefs about the danger and uncontrollability of worry instigate unproductive cognitive processes (e.g., suppression and avoidance). The latter contribute to the escalation of worry, interfere with effective control over worry and reinforce existing negative beliefs about the uncontrollability and danger of worry, leading to even more pervasive distress (Wells, 2019). Consequently, negative metacognitive beliefs predict and worsen the development of GAD symptoms (Nassif, 1999; Penney et al., 2013), underlining their significant role in the etiology and maintenance of excessive worrying.

Metacognitive Therapy (MCT) originated from the Metacognitive model of GAD and aims to target positive and negative metacognitive thoughts (Wells, 2008). Despite both being effective in reducing worry and GAD symptoms, MCT was found to generate better long-term recovery rates compared to Cognitive Behavioral Therapy (CBT; Nordahl et al., 2018). In addition, a recent meta-analysis reported considerable evidence for the effectiveness of individual MCT compared to CBT in a range of psychopathological complaints (Normann & Morina, 2018). However, these conclusions were drawn from a small number of studies that mostly focused on anxiety and depression and that consisted of rather small samples. Further refinement and augmentation of treatment is needed to improve response rates of MCT, as well as CBT, which both leave considerable room for improvement.

Cognitive Bias Modification (CBM) has been proposed as an augmentation strategy to existing forms of CBT that tend to have a rather explicit nature. CBM techniques have been developed to test the causal role of information processing bias in psychological phenomena (in this case worry), to understand underlying mechanisms of these biases, and to evaluate the therapeutic effects of modifying them. CBM-interpretation interventions targeting interpretation bias (i.e., the tendency to interpret ambiguous information in a negative manner), for example, have been shown to successfully facilitate more benign interpretations about potential future negativity in anxiety, and to ameliorate worry related to anxiety, as well as general repetitive negative thinking across different psychological disorders (Fodor et al., 2020; Hayes et al., 2010; Hirsch et al., 2016, 2020; Krahé et al., 2019). These effects seem to be further enhanced when the intervention is combined with sustained imagery instructions (Feng et al., 2020; Menne-Lothmann et al., 2014). Importantly, such CBM studies in high-worriers and individuals with GAD aim to modify the process that occurs when ambiguity is encountered in order to make the positive interpretation more accessible. The reviewed body of literature stresses the importance of negative metacognitive beliefs within the course of excessive worrying. Reappraisal of fixed, negative beliefs about the consequences, danger and uncontrollability of worry (meta-worry) may thus serve as a new avenue for modification. A study by Clerkin and Teachman (2011) indeed demonstrates CBM’s success in modifying core psychopathological beliefs through reappraisal. Following a positive reappraisal training targeting negative interpretations of intrusive thoughts, participants with heightened obsessive–compulsive disorder (OCD) symptoms revealed fewer OCD-relevant beliefs. This was accompanied by less negative affect in response to a stressor, underlining the potential therapeutic relevance of CBM. In addition, a single CBM training inducing either positive or negative metacognitive beliefs related to (anger-)rumination evoked lower (rather than the expected higher) self-reported anger/aggression in response to hypothetical anger-provoking scenarios in the positive condition, but did not differ from the negative condition (Krans et al., 2014).

Thus far, the causal role of negative appraisals of metacognitive beliefs has not been addressed in the context of worry. The goal of the current study was, therefore, to investigate whether a CBM-based functional (compared to dysfunctional) reappraisal training (RT) could induce either a functional (i.e., disconfirmation of negative metacognitive beliefs) or dysfunctional (i.e., confirmation of negative metacognitive beliefs) metacognitive appraisal bias in a sample of high-worriers. The metacognitive appraisal bias referred to here reflects the relative bias for having a functional (i.e., non-negative) over dysfunctional (negative) appraisal of one’s thinking and coping. Key to this RT is that the reappraisal of existing dysfunctional metacognitive beliefs (rather than the increased accessibility of benign interpretations targeted in CBM-interpretation interventions) is suggested to discontinue the vicious worry cycle. To examine the potential effects of the RT on negative thoughts (directly following training and after a worry induction) and worry-related interpretations, the Breathing Focus Task (BFT; Hirsch et al., 2009) and ambiguous open-ended stem sentences were used. In a review, Jones and Sharpe (2017) show that two meta-analyses (Beard et al., 2012; Menne-Lothmann et al., 2014) identified a higher number of sessions as a significant moderator of CBM training effects. At the same time, Cristea et al. (2016) found a single session to be more effective compared to multiple sessions. Given these inconsistencies and the proof-of-principle nature of the current study, the research aims above were addressed in a single RT session.

Participants receiving a functional RT were hypothesized to have a more positive metacognitive appraisal bias. For participants in the dysfunctional RT group, the exact opposite was hypothesized (i.e., a more negative metacognitive appraisal bias). In line with expectations about the training, participants in the functional RT were hypothesized to experience fewer negative thoughts after the training and worry induction, and to reveal more positive interpretations as compared to participants in the dysfunctional RT group.

Besides the theoretical implications of understanding how negative metacognitive beliefs relate to excessive worrying, this study also aims to advance present knowledge with respect to CBM’s potential to cover a broader range of cognitive processes, and thus, whether reappraisal can alter beliefs about cognitions in general. This differs from the modification of thought content, which has already been achieved in various samples across different psychopathological disorders (e.g., Lang et al., 2009; Woud et al., 2013). In doing so, CBM of metacognitive appraisals does not require idiosyncratic stimuli that are specifically related to a particular disorder (MacLeod et al., 2009), as the used stimuli do not tap into disorder-specific aspects, but instead relate to general cognitions about the thinking itself (Wisco, 2009).

Method

Participants

A total of 635 potential participants completed the Penn State Worry Questionnaire (PSWQ; Meyer et al., 1990) prior to the experiment, as part of a larger online screening. In line with prior studies (e.g., Molina & Borkovec, 1994; Stokes & Hirsch, 2010; Williams et al., 2014), high-worrying participants with a score of 56 or higher on the PSWQ were invited to participate in the study. This score falls one SD below the mean score of individuals meeting DSM-IV GAD diagnostic criteria. Participants received course credit or payment for their participation.

The sample consisted of 81 undergraduate students (7 male; Mage = 20.69, SDage = 4.02), of whom 22% were native German speakers. All participants were enrolled in a Dutch psychology program and spoke fluent Dutch. Participants were randomly assigned to either the functional (n = 40) or dysfunctional (n = 41) RT.

Baseline Measures

Worry

The Penn State Worry Questionnaire (PSWQ; Meyer et al., 1990) is a 16-item self-report questionnaire. Items (e.g., ‘I know I should not worry about things, but I just cannot help it’) were scored on a 5-point Likert-scale ranging from 1 (‘not at all typical’) to 5 (‘very typical’). Total sum scores range from 16 to 80, with higher scores indicating more worry.

Metacognitive Beliefs

Negative metacognitive beliefs were assessed with the negative beliefs about uncontrollability and danger subscale of the Meta-Cognitions Questionnaire 30 (MCQ-30; Wells & Cartwright-Hatton, 2004), a 30-item self-report questionnaire covering dimensions of worry-related metacognitive beliefs. Items (e.g., ‘My worrying is dangerous for me’) were assessed on a 4-point Likert-scale ranging from 1 (‘do not agree’) to 4 (‘agree very much’). Total subscale scores range from 6 to 24, with higher scores indicating stronger negative beliefs.

Generalized Anxiety Symptoms

The Generalized Anxiety Disorder Questionnaire IV (GADQ-IV; Newman et al., 2002) is a 9-item diagnostic self-report measure screening for the presence of GAD symptoms. Total scores range from 0 to 12, with higher scores indicating more symptoms.

Mood

Participants had to indicate their present mood state (depression, anxiety, and happiness) at four different time points throughout the experiment on a standard visual analogue scale (VAS) of 100 mm length, anchored with 0 (‘not at all’) and 100 (‘extremely’). The vertical line marked on the VAS was transformed into a percentage score (0–100%).

Reappraisal Training

Worry-related metacognitive appraisal styles were induced using 71 single-sentence vignettes that covered negative metacognitive beliefs (uncontrollability of worrying, the perceived harmfulness/danger of worrying, and the need to be in control of one’s thoughts), created based on the Meta Worry Questionnaire (MWQ; Wells, 2005). Each vignette described a worry-related metacognitive belief that ended in a to-be-completed word fragment, such that the meaning remained ambiguous until the final word had been resolved. Participants had to generate a response to complete the sentence in a meaningful way. Depending on the condition (functional vs. dysfunctional RT; see Footnote 1), vignettes were designed such that the last word fragment produced an outcome that would either confirm or disconfirm worry-related negative metacognitive beliefs (e.g., ‘When I start to worry, I know that these thoughts will st_p/c_ntinue’, resolved as ‘stop’ (functional condition) or ‘continue’ (dysfunctional condition)). In addition, 9 filler vignettes describing a neutral situation were formulated (e.g., ‘When I watch a movie, I always pick a com_dy/thrill_r’). All training and filler vignettes were randomized across participants.
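The word-fragment format described above can be illustrated with a minimal Python sketch. It is not the PsychoPy implementation used in the study: the `Vignette` class, the `fill_fragment` helper and the assumption that a response consists of the missing letter(s) are hypothetical and serve only to show how a fragment resolves a vignette in a condition-specific direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vignette:
    """One training item (hypothetical structure, for illustration only)."""
    stem: str        # sentence up to the final word
    fragment: str    # final word with "_" marking each missing letter
    solution: str    # intended resolution of the fragment
    condition: str   # "functional" or "dysfunctional"


def fill_fragment(fragment: str, letters: str) -> str:
    """Insert the supplied letters into the "_" gaps, left to right."""
    if fragment.count("_") != len(letters):
        raise ValueError("number of letters must match number of gaps")
    remaining = iter(letters)
    return "".join(next(remaining) if ch == "_" else ch for ch in fragment)


def is_resolved(vignette: Vignette, letters: str) -> bool:
    """True if the response turns the fragment into the intended word."""
    return fill_fragment(vignette.fragment, letters) == vignette.solution


# Example item taken from the text, shown in both conditions.
stem = "When I start to worry, I know that these thoughts will"
functional_item = Vignette(stem, "st_p", "stop", "functional")
dysfunctional_item = Vignette(stem, "c_ntinue", "continue", "dysfunctional")
```

In this sketch the same stem yields a disconfirming ending (‘stop’) or a confirming ending (‘continue’) depending only on which fragment is attached to it.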

Participants were explicitly instructed to carefully read the vignettes and to imagine themselves in the described situation. Accordingly, the vignettes were formulated from a first-person perspective. To ensure and screen for proper processing of the vignettes’ content, 16% of the training items were followed by a comprehension question (‘yes’ or ‘no’ answer). The RT was programmed in PsychoPy (Peirce et al., 2019) and took approximately 15–20 min to complete.

A short booster RT session (20 training items, 3 filler items and 5 comprehension questions—randomly presented) was administered halfway through the procedure to ensure that training effects were not reduced by other tasks (i.e., outcome measures).

Manipulation Check

Training-related changes in metacognitive appraisal bias were assessed using a two-phase task, consisting of an initial encoding phase and a subsequent surprise recognition phase (Mathews & Mackintosh, 2000). This task has been frequently used to assess interpretation bias. For the current study, and in line with Woud et al. (2012), it was adapted to measure appraisals of worry thoughts. During the encoding phase, participants were presented with ten titles together with ten matching scenarios covering potential worry topics. For example, ‘Graduation’ (title); ‘Next year I will graduate. There are not as many jobs available as there are students. I am afraid that I will not be able to find a job’ (scenario).

During the recognition phase, participants re-encountered the ten titles, which were each presented together with four randomly ordered statements. These statements referred to the scenarios that were presented during the encoding phase. Critically, two of these were target statements and tapped into a functional metacognitive appraisal (e.g., ‘The fact that it may not be possible to find a job, is a worry that will not continue to persist forever’) or dysfunctional metacognitive appraisal (e.g., ‘The fact that it may not be possible to find a job will persist uncontrollably in my thoughts’). As such, the statements are considered metacognitive appraisals covering (the consequences of) the uncontrollability, perceived harmfulness and danger of worry. In addition, two emotional-nonspecific (i.e., not metacognitive) foil statements were presented (e.g., ‘The fact that it may not be possible to find a job is because of the economic crisis’). Participants had to rate the resemblance of the meaning of each statement to the meaning of the previously encountered scenarios on a 4-point Likert scale (1 = not at all to 4 = very much). None of the statements resembled the actual scenario’s content, but rather offered a different appraisal. This served as a way of assessing the participant’s metacognitive appraisal bias. To assess training-related effects on the metacognitive appraisal bias over time, this task was administered pre- and post-RT. Accordingly, two sets (including ten different stories with corresponding statements each) were created and assessed in a counterbalanced order. Trials were randomized within each assessment.

Outcome Measures

Negative Thoughts

To assess negative thoughts, the Breathing Focus Task (BFT; Borkovec et al., 1983) adapted by Hirsch et al. (2009) was administered. While the BFT usually consists of three phases, the current study extended it to four phases (5 min each): a pre-training breathing phase (assessment of negative thoughts before RT), a post-training breathing phase (assessment of negative thoughts directly after RT), an instructed worry period (worry induction), and a post-worry breathing phase (assessment of negative thoughts after the worry induction). This extension allowed for the analyses of the immediate effects of the RT on negative thoughts, as well as the investigation of its effects in the context of stress (i.e., following the worry induction).

During the pre-training and post-training phase, participants were instructed to focus on their breathing. At 12 random time points (20–30 s intervals), they heard a computer-generated beep, and were asked to indicate whether their attention was focused on their breathing or elsewhere. In the latter case, they were asked to categorize the thought as positive, negative or neutral, and to provide a brief description of it (e.g., positive: dinner with friends). After these 5 min, participants rated their present mood on the VAS. In addition, participants had to indicate how they experienced their performance on the BFT in terms of difficulty to focus attention, ability to focus on one’s breathing, degree to which they were distracted during the 5-min phase, and the amount of time they spent worrying (all on a standard VAS). This served as a means to assess whether the two RT groups performed equally well on the BFT. Finally, the experimenter read the brief descriptions from the 5-min phase aloud and participants were asked to elaborate on them (see Footnote 2). These elaborate descriptions were audio-recorded and rated by external assessors.

A psychologist rated the valence (i.e., positive, negative, or neutral) of the elaborate thought descriptions that were recorded. Another psychologist rated 25% of the descriptions (n = 20; 10 randomly selected participants per group). Inter-rater reliability, assessed with Cohen’s kappa, revealed almost perfect agreement (κ = .86). Both assessors were blind to the condition allocation and to the type of breathing phase (i.e., baseline, post-training or post-worry phase) that they rated.
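For readers unfamiliar with the statistic, Cohen’s kappa corrects the observed proportion of agreement between two raters for the agreement expected by chance. The following Python sketch computes it from two lists of valence codes. The function name and the toy ratings are illustrative assumptions and do not reproduce the study data or the software used by the authors.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items given the same code by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per code.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / n**2
    if expected == 1.0:
        return 1.0  # both raters used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Toy example with hypothetical valence codes (not study data).
first_rater = ["negative", "negative", "positive", "neutral"]
second_rater = ["negative", "negative", "positive", "positive"]
kappa = cohens_kappa(first_rater, second_rater)
```

In the toy example the raters agree on three of four items (observed agreement .75) while chance agreement is .375, so kappa is .60, which is lower than the raw agreement rate.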

For the instructed worry period (worry induction), participants were prompted to come up with a personal, currently-relevant worry topic. This topic was briefly discussed with the experimenter to ensure that it was related to a possibly negative future situation triggering worry. Subsequently, they were instructed to worry about the topic alone in a cubicle. After the worry induction, mood and perceived performance on this induction were assessed.

Negative Interpretations

To assess RT transfer effects to worry-related interpretations, an open-ended stem sentence task was administered pre- and post-RT. Participants received 16 ambiguous open-ended stem sentences that had to be completed. Half of these sentences were possibly triggering worry (e.g., ‘You have a doctor’s appointment to discuss the results of your blood count. As you enter the doctor’s office, he starts telling you that…’), which were developed based on the Worry Domain Questionnaire (WDQ; Tallis et al., 1992). The other half were neutral control sentences (e.g., ‘Your friends give you a present for your birthday. As you open their present you think…’). Worry-triggering sentences were counterbalanced across pre- and post-assessments and randomly presented during each assessment.

All sentence endings were rated by psychologists for valence: positive, neutral, negative, or non-codable (see Supplementary Materials for coding criteria). Rating reliability revealed substantial agreement (κ = .65).

Filler Task

To avoid possible transfer of mood effects of the RT, a classic Stroop task (Stroop, 1935) was administered after the training. Two sets with a list of 40 written color names were presented (on the computer) in either matching (congruent set) or non-matching (incongruent set) color prints. Participants had to name the color of each word. Congruent and incongruent sets were counterbalanced across participants. For each set, the time it took to name the list of color names, as well as the number of mistakes, were recorded.

Procedure

After providing informed consent, participants filled out baseline questionnaires (PSWQ, GADQ-IV, MCQ-30, mood) and demographic questions. Subsequently, they completed the two-phase encoding/recognition task and open-ended stem sentence task. They then completed the pre-training breathing phase, after which mood was assessed. Following this, participants were randomly assigned to the functional or dysfunctional RT. Next, the classic Stroop task was completed, and the two-phase encoding/recognition task and open-ended stem sentence task were assessed again. A booster RT and post-training breathing phase followed, including a mood assessment. Lastly, participants completed the instructed worry period, another mood assessment, and the post-worry breathing phase that was again followed by a mood assessment. The RT was delivered in Dutch or German, depending on the participants’ native language. As all participants spoke fluent Dutch, the questionnaires were assessed in Dutch. An informal check at the end of the experiment ensured that participants were unaware of the purpose of the study. Following this, participants were debriefed. For an overview see Fig. 1.

Fig. 1 Overview of the study design

Results

Participant Characteristics at Baseline

No between-group differences were found for age, sex, nationality, worry (PSWQ), GAD symptoms (GADQ-IV), negative beliefs (MCQ-30) or baseline mood ratings of happiness, anxiety, and depression (see Table 1).

Table 1 Group differences on demographic variables and baseline questionnaires

Manipulation Check (Effects of Reappraisal Training on Metacognitive Beliefs)

Change in metacognitive appraisal bias from pre- to post-training (assessed with the two-phase encoding/recognition task) served as a manipulation check. In line with Woud et al. (2012), ratings of the target sentences of the two-phase encoding/recognition task, which reflected functional or dysfunctional metacognitive appraisals, were used to calculate a metacognitive appraisal bias index-score for both pre- and post-training by subtracting the mean dysfunctional rating score from the mean functional rating score. A positive metacognitive appraisal bias index-score indicated that participants had a relatively more functional than dysfunctional appraisal bias. A negative metacognitive appraisal bias index-score reflected the opposite. Skewness (pre: −0.043; post: 0.246) and kurtosis (pre: −0.198; post: 0.718) were within the acceptable range for the pre- and post-training metacognitive appraisal bias index-scores. No outliers were detected using boxplots. Two participants were excluded due to a high percentage of incorrect answers to the comprehension questions in the RT, which served as an indication of training compliance and sufficient processing of the items, leaving 79 participants in the analytical sample (see Footnote 3).
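The index-score described above is a simple difference of means and can be sketched as follows. The function name, the range check and the example ratings are illustrative assumptions; only the subtraction rule (mean functional rating minus mean dysfunctional rating, on the 1–4 scale) is taken from the text.

```python
from statistics import mean


def appraisal_bias_index(functional_ratings, dysfunctional_ratings):
    """Mean functional rating minus mean dysfunctional rating.

    Ratings are recognition ratings on the 1-4 scale, so the index can
    range from -3 (fully dysfunctional) to +3 (fully functional).
    """
    for rating in list(functional_ratings) + list(dysfunctional_ratings):
        if not 1 <= rating <= 4:
            raise ValueError("ratings must lie on the 1-4 scale")
    return mean(functional_ratings) - mean(dysfunctional_ratings)


# Hypothetical ratings for the ten scenarios of one assessment.
functional = [3, 2, 3, 4, 2, 3, 3, 2, 4, 4]       # mean = 3
dysfunctional = [2, 3, 2, 1, 2, 2, 3, 2, 1, 2]    # mean = 2
index_score = appraisal_bias_index(functional, dysfunctional)
```

Here the hypothetical participant obtains an index-score of 1, that is, a relatively more functional than dysfunctional appraisal bias; swapping the two rating lists would yield −1.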

A 2 (Time: pre-training, post-training) × 2 (Training group: functional, dysfunctional RT) repeated-measures analysis of variance (ANOVA) was conducted to assess whether negative metacognitive beliefs could be affected by RT. Main effects were found for Time, F(1,75) = 22.77, p < .001, ηp² = .23, indicating that across groups the bias became more positive, and for Training Group, F(1,75) = 10.53, p = .002, ηp² = .12, indicating that the groups differed significantly. These effects were qualified by the crucial Time × Training Group interaction, F(1,75) = 12.52, p = .001, ηp² = .14 (see Footnote 4).
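As a consistency check, partial eta squared can be recovered from a reported F value and its degrees of freedom via the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below applies this identity to the three effects reported above; the function name is an illustrative choice.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and dfs positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported for the manipulation check.
effects = {
    "Time": (22.77, 1, 75),
    "Training Group": (10.53, 1, 75),
    "Time x Training Group": (12.52, 1, 75),
}
eta_by_effect = {
    name: round(partial_eta_squared(*stats), 2)
    for name, stats in effects.items()
}
```

The recovered values (.23, .12 and .14) match the reported effect sizes.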

Paired-samples t-tests revealed that it was the functional training, t(37) = 6.07, p < .001, rather than the dysfunctional training, t(38) = 0.85, p = .402, that caused a change in metacognitive appraisal bias. Thus, the functional RT led participants to appraise their worry-related metacognitive beliefs as less negative. To further explore whether the two groups differed at pre- and/or post-training, additional post-hoc independent-samples t-tests were conducted and confirmed that the groups did not differ pre-training, t(75) = 1.21, p = .229, but did differ post-training, t(67.55) = 4.54, p < .001. Inspection of the means (Table 2) shows that participants in the functional RT group revealed a more positive metacognitive appraisal bias index-score following training, while in the dysfunctional training group a negative metacognitive bias index-score remained.

Table 2 Mean scores (standard deviations) on primary and secondary outcome measures

Mood After Reappraisal Training

To analyze immediate effects of the RT on mood, a 3 (Time: pre-training, post-training, post-worry induction) × 2 (Training group: functional, dysfunctional RT) × 3 (Mood type: happiness, anxiety, depression) repeated-measures ANOVA was conducted. Mauchly’s test indicated a violation of sphericity for Mood, χ²(2) = 36.74, p < .001, and Time × Mood, χ²(9) = 89.70, p < .001; therefore, degrees of freedom were corrected using Greenhouse–Geisser (ε < .75) estimates of sphericity. Main effects of Time, F(2,146) = 9.83, p < .001, ηp² = .12, and Mood, F(1.43,104.31) = 99.80, p < .001, ηp² = .58, and a Time × Mood interaction, F(2.41,175.80) = 8.22, p < .001, ηp² = .10, were found (for means see Table 2). Importantly, the Time × Group × Mood interaction revealed no significant differences between the two training groups in self-reported mood (i.e., happiness, anxiety, depression) over time as a result of training type, F(4,292) = 1.48, p = .209.

Effects of Reappraisal Training on Negative Thoughts

BFT data were used to investigate RT effects on the change in negative thoughts, that is, whether participants receiving the functional RT demonstrated a decline, and participants receiving the dysfunctional RT an increase, in negative thoughts from pre- to post-RT. BFT data are hierarchical within participants (12 beeps/measurements) for each assessment (pre-training, post-training, post-worry induction), and dichotomous in nature (i.e., there are effectively only two possible outcomes on each trial, negative thought: yes or no; see Footnote 5), implying a binomial distribution. Moreover, the number of negative thoughts was both self-reported by participants and rated by external assessors. Therefore, a multivariate multilevel logistic regression analysis was conducted to test the effects of Training group (functional RT, dysfunctional RT) on the change in the number of negative thoughts (self-ratings and external ratings). Training group and BFT assessment were dummy-coded, with the dysfunctional RT and the post-training BFT assessment as reference categories. See Table 3 for effects. One participant was excluded from these analyses for not completing the BFT. The covariance matrix of the random effects can be found in the Supplementary Materials (Tables S1 and S2). The binomial distribution fits the data well: the extra binomial parameter is 0.837 (SE = 0.096) for self-ratings and 0.885 (SE = 0.102) for external ratings.
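An extra-binomial (dispersion) parameter close to 1 means that the observed variability in counts is roughly what a binomial distribution predicts. As a simplified illustration of the idea, and explicitly not the multilevel estimator used in the analysis above, the following Python sketch computes the Pearson dispersion statistic for an intercept-only binomial model; the function name and the toy counts are assumptions.

```python
def pearson_dispersion(successes, trials, n_params=1):
    """Pearson chi-square divided by residual degrees of freedom.

    Assumes a single pooled probability for all units (intercept-only
    model). Values near 1 are consistent with binomial variance; values
    well above 1 suggest overdispersion.
    """
    if len(successes) != len(trials) or len(successes) <= n_params:
        raise ValueError("need matching lists with more units than parameters")
    p = sum(successes) / sum(trials)
    if not 0 < p < 1:
        raise ValueError("pooled probability must lie strictly between 0 and 1")
    chi_square = sum(
        (y - n * p) ** 2 / (n * p * (1 - p)) for y, n in zip(successes, trials)
    )
    return chi_square / (len(successes) - n_params)


# Toy data: negative thoughts out of 12 beeps for two hypothetical participants.
dispersion = pearson_dispersion([0, 6], [12, 12])
```

With the toy counts the pooled probability is .25 and the statistic equals 8, signalling far more variability than a binomial model with a common probability would predict; identical counts across participants would give 0.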

Table 3 Multivariate multilevel analysis estimates and standard errors of the number of negative thoughts (self-ratings + external ratings)

Results demonstrated a non-significant Training group (functional RT, dysfunctional RT) × Time (pre-training, post-training) interaction for both the self-ratings (t(73) = 0.39, p = .349) and external ratings (t(73) = 0.56, p = .289). This boils down to a test of the significance of the estimated difference between the training groups at post-training (self-ratings: -0.116, SE = 0.300; external ratings: 0.188, SE = 0.338). Similarly, no significant Training group (functional RT, dysfunctional RT) × Time (pre-training, post-training) interactions were found regarding participants’ experienced performance on the BFT for the ability to focus attention on one’s breathing (Breathing 1; -0.028, SE = 0.045; t(73) = 0.62, p = .269), experienced difficulty to focus attention (Breathing 2; 0.022, SE = 0.057; t(73) = 0.39, p = .349), degree to which participants were distracted in doing so (Breathing 3; 0.001, SE = 0.050; t(73) = 0.02, p = .492), and the amount of time they experienced worrying (Breathing 4; 0.012, SE = 0.050; t(73) = 0.24, p = .406), all based on self-ratings only.

In order to investigate whether the type of RT affected the change of negative thoughts from post-training to post-worry induction, the Training group (functional RT, dysfunctional RT) × Time (post-training, post-worry induction) interaction was tested. See Table 4 for effects. This interaction was not significant for self-ratings (-0.254, SE = 0.292; t(73) = 0.87, p = .194) and external ratings (0.139, SE = 0.338; t(73) = 0.41, p = .341). In line with this, no significant Training group (functional RT, dysfunctional RT) × Time (post-training, post-worry induction) interactions were found for the participant’s experienced performance on the BFT self-ratings relating to the ability to focus attention on one’s breathing (Breathing 1; -0.008, SE = 0.046; t(73) = 0.17, p = .433), the experienced difficulty to focus attention (Breathing 2; 0.006, SE = 0.058; t(73) = 0.10, p = .460), the degree to which participants were distracted in doing so (Breathing 3; -0.063, SE = 0.051; t(73) = 1.24, p = .109), and the amount of time they experienced worrying (Breathing 4; 0.014, SE = 0.050; t(73) = 0.28, p = .390).

Table 4 Multivariate multilevel analysis estimates and standard errors of the experienced performance on the BFT (self-ratings only)

Effects of Reappraisal Training on Interpretations

In line with the research questions on the effects of the RT on the appraisal of negative metacognitive beliefs and negative thoughts, it was investigated whether the type of RT affected negative interpretations of the content of thoughts. The percentage scores obtained from the negative solutions on the open-ended stem sentence task (16 sentences in total) are ultimately proportions, which cannot be analyzed using ANOVA techniques that assume a normal distribution of the data (Jaeger, 2008). Because a binomial distribution should be assumed instead, a logistic regression analysis was applied. Training group was dummy-coded, with the dysfunctional RT as the reference category. Scores at pre-training were centered (see Footnote 6) over all cases. The extra binomial parameter could not be calculated for the current analysis, because the actual scores used in this analysis were already converted to proportions. See Table 5 for effects.

Table 5 Estimates and standard errors logistic regression analysis on negative interpretations

The Training group (functional, dysfunctional RT) × Time (pre-training, post-training) interaction was not significant (-0.189, SE = 0.958; t(71) = 0.20, p = .421). Thus, from pre- to post-training, effects did not differ significantly between the RT groups. Interestingly, for both groups the post-training effects seem to depend on the level of negative interpretations at pre-training (2.765, SE = 0.595; t(71) = 4.65, p < .001).
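Assuming the reported estimates are on the log-odds (logit) scale, as is usual for logistic regression, they can be converted to odds ratios by exponentiation. The Python sketch below does this for the interaction estimate and adds an approximate Wald interval. The normal critical value of 1.96 is an assumption made for illustration (the tests reported above are based on t(71)), and the function name is hypothetical.

```python
import math


def odds_ratio_with_interval(estimate, standard_error, z_critical=1.96):
    """Exponentiate a logit-scale estimate and its approximate Wald limits."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    lower = estimate - z_critical * standard_error
    upper = estimate + z_critical * standard_error
    return math.exp(estimate), math.exp(lower), math.exp(upper)


# Interaction estimate and SE as reported in the text.
odds_ratio, ci_lower, ci_upper = odds_ratio_with_interval(-0.189, 0.958)
```

Under these assumptions the odds ratio is roughly 0.83, and the approximate interval comfortably includes 1, in keeping with the non-significant interaction.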

Discussion

The goal of the current study was to investigate whether a Cognitive Bias Modification (CBM)-based reappraisal training (RT) was able to establish either a functional or a dysfunctional metacognitive appraisal bias in high-worrying students, and whether the effect of this RT transferred to experienced negative thoughts and worry-related interpretations. The results support a general reduction of negative metacognitive appraisal bias following training. While both training groups displayed a negative appraisal bias of metacognitive beliefs at baseline, only participants in the functional RT revealed a positive metacognitive appraisal bias at post-training. Participants in the functional RT group did not show a reduction of negative thoughts from pre- to post-training. Moreover, they did not experience fewer negative thoughts or more attenuated stress levels following the worry induction, which established an increase of negative thoughts in both groups. Thus, the RT had no buffering effects in the context of stress (i.e., following a worry induction, which is expected to cause more difficulty in focusing one’s attention because of persistent negative thoughts). Finally, no changes in negative worry-relevant interpretations were found.

Despite the benefits of using a behavioral assessment and laboratory stress induction (i.e., the BFT) rather than solely relying on self-reports, it may not have been optimal for capturing effects of the training on behavior. Even though an emotionally vulnerable sample of high-worriers was included, a generally low number of thought intrusions was reported by both training groups, even before the CBM-based RT. This points to a possible floor effect (see Footnote 7) in high-worrying samples. When administered in a clinical sample, however, a relatively higher number of thought intrusions is typically reported (Hirsch et al., 2013). Accordingly, it can be speculated that the use of the BFT may be restricted to clinical samples, in which negative thoughts are more frequent and dysfunctional (Ruscio & Borkovec, 2004).

A more general reason why the training failed to induce relevant changes in negative thoughts in the context of stress might be found in the use of a single training session. Within the CBM literature, it is still a matter of debate whether spaced learning is required and how many training sessions are needed in order to produce stable, enduring effects on cognitive biases (Hertel & Mathews, 2011; Koster et al., 2009; MacLeod et al., 2009). Considering that metacognitive beliefs arise from experiences and interactions with the environment over time (Flavell, 1979), it may be that several sessions over longer periods of time are needed in order to produce observable and enduring effects. Relatedly, the timing of the BFT assessment may account for the absent changes in negative thoughts. In the current study, negative thoughts were assessed immediately after the RT. Consistent application of the trained appraisal style in response to metacognitive beliefs over a longer period, however, could have led to training-congruent changes of these thoughts at a later point in time (Woud et al., 2018).

Against our expectations, effects of the RT in the current sample did not transfer to interpretations of the sentences in the trained direction. Participants in the functional RT group did not produce significantly fewer negative sentence resolutions compared to participants in the dysfunctional RT group. This could potentially be explained by the fact that the ambiguous sentences in the current sentence task (derived from the WDQ) may have covered worry topics that not all high-worriers are concerned with (Hirsch et al., 2020). In other words, these sentences might not have reflected idiosyncratic content relevant to the participant. Also, the level of negative interpretations after the RT was dependent on the level of negative interpretations before training. Higher levels of negative interpretations before training resulted in fewer negative sentence solutions following training for both RT groups, and particularly the dysfunctional RT group. Although this reduction was not significant, participants may have attempted to attenuate negative mood using mood repair strategies (Kovacs et al., 2015).

The current study examined the effects of a reappraisal training on worry-related negative metacognitive appraisal bias and provides evidence that metacognitive appraisals relevant to high-worriers can be modified by means of a computerized training. At this point, however, no conclusions about long-lasting changes to metacognitive beliefs can be drawn, as this would warrant a follow-up assessment (e.g., by using the MCQ-30) over a longer period of time. Also, no training effects can be claimed at a behavioral level in the context of stress. Moreover, given that the non-clinical sample in the current study only demonstrated excessive worry, it would be premature to draw conclusions regarding the effectiveness of the training in clinical samples. A necessary subsequent step in future studies would, therefore, be to examine the effects of the RT in a (clinical) population with pathological or excessive worry, for whom negative metacognitions are even more relevant (Wells, 2010) and worrying is more frequent and uncontrollable. In addition, it needs to be investigated how many training sessions are needed, and whether other behavioral tasks would be more suitable to assess training-related changes with regard to excessive worrying. Nevertheless, the RT effects demonstrate that CBM can be expanded to cover a wider range of applications (Koster et al., 2009). That is, to alter metacognitive appraisals, CBM includes stimuli related to general cognitions about thinking, rather than idiosyncratic (disorder-specific) stimuli. Moreover, the current findings show that the automatic components of metacognitive beliefs can be targeted by means of CBM, independently of their widely acknowledged explicit characteristics. This highlights its promise as an augmenting strategy to enhance current treatment efforts. Finally, the absence of prominent changes in mood following the training strengthens the reported results and argues against an explanation in terms of mere mood effects.