Introduction

Generalized anxiety disorder (GAD) is characterised by high levels of uncontrollable worry across a range of domains, accompanied by a variety of distressing psychophysiological symptoms (American Psychiatric Association, 2013). GAD is a common (Somers et al., 2006) and potentially chronic condition (Yonkers et al., 2000), associated with high levels of comorbidity (Kessler et al., 2005) and psychosocial impairment (Hunt et al., 2004). As a result, much research over the past 20 years has explored the cognitive mechanisms contributing to the development and maintenance of GAD, with the aim of identifying targets for intervention and treatment.

The metacognitive model of GAD (Wells, 1995) is based on the Self-Regulatory Executive Function (S-REF) model (Wells & Matthews, 1994, 1996), a transdiagnostic theory that incorporates the cognitive and attentional processes and biases implicated in emotional disorders. In relation to GAD, the metacognitive model postulates that holding both positive and negative beliefs about worry results in heightened levels of perseverative thinking and distress. When perceived threats trigger intrusive negative thoughts (e.g., “What if it’s cancer?”), positive beliefs are activated (e.g., “Worrying helps me solve problems”), leading to the selection of worry as a coping strategy. This form of worry, which involves perseverative catastrophising about the perceived threat, is labelled Type 1 worry. With increased use, the Type 1 worry process becomes increasingly automatic, leading to the development of beliefs that worry is uncontrollable. Beliefs about the dangerous effects of worrying on one’s physical and mental health are then activated. These negative metacognitive beliefs about uncontrollability and danger subsequently result in the individual worrying about their worry (labelled Type 2 worry, or meta-worry). It has been proposed that Type 2 worry may also be exacerbated by an increased awareness of thoughts, and a lack of confidence in one’s memory (Cartwright-Hatton & Wells, 1997). Increased cognitive self-consciousness may heighten perceptions that worry is uncontrollable, while a lack of cognitive confidence may lead to repeated selection of worrying as a problem-solving strategy, further reinforcing negative metacognitive beliefs.

The metacognitive model hypothesises that the presence of Type 2 worry distinguishes people with GAD from those without, due to its association with counter-productive coping strategies such as attempted thought control and avoidance behaviours (Cartwright-Hatton et al., 2004; Wells, 1995). The ineffectiveness of these strategies reinforces uncontrollability beliefs, leading to an escalation in worry and distress, which serves to further reinforce beliefs about the dangers of worrying and Type 2 worry.

The Metacognitions Questionnaire (MCQ; Cartwright-Hatton & Wells, 1997) was developed to operationalise the metacognitive model of GAD. The self-report measure encompasses five factors corresponding to parameters central to the metacognitive model: (1) positive beliefs about worry (e.g., “Worrying helps me to solve problems”); (2) negative beliefs about the uncontrollability and danger of worry (e.g., “When I start worrying, I cannot stop”); (3) lack of cognitive confidence (e.g., “I have a poor memory”); (4) negative beliefs about thoughts in general, including themes of superstition, punishment and responsibility (e.g., “It is bad to think certain thoughts”); and (5) cognitive self-consciousness (e.g., “I am constantly aware of my thinking”). The Metacognitions Questionnaire – Short Form (MCQ-30; Wells & Cartwright-Hatton, 2004) was adapted as a more economical version of the MCQ to facilitate its use in research and clinical settings, with the previously heterogeneous superstition, punishment and responsibility beliefs subscale renamed “beliefs concerning need for control” (Wells & Cartwright-Hatton, 2004).

Both the MCQ and, predominantly, the MCQ-30 have been employed to evaluate the predictions of the metacognitive model of GAD (White & Abbott, 2022). The MCQ-30 is also commonly administered in clinical practice, to guide the focus of metacognitive therapy (MCT; Wells, 2009) and evaluate changes in beliefs following treatment (e.g., McEvoy et al., 2015b; van der Heiden et al., 2012). However, the validity of the findings of this body of research is contingent on the reliability and validity of the measure itself.

A recent systematic review of measures of metacognitions about worry found that overall, the body of evidence supporting their psychometric properties was limited (White et al., 2022). None of the ten identified measures exhibited strong evidence of sound psychometric properties across all of the areas assessed. This was predominantly due to the small number of studies and methodological limitations, including inadequate sample sizes, not involving the target population in item generation, and a lack of information on the measurement properties of comparator instruments and the treatment of missing data.

Of the measures designed for use with adults, the MCQ-30 received the most positive ratings of its psychometric properties, with moderate evidence found in support of its structural validity, internal consistency, and convergent validity. Confirmatory factor analyses (CFA) indicated that a five-factor model approached a good approximation of the data, with most fit indices within acceptable parameters among two convenience samples of university students and non-students (Spada et al., 2008; Wells & Cartwright-Hatton, 2004), a large community sample (Fergus & Bardeen, 2019), people with cancer (Cook et al., 2014) and people with epilepsy (Fisher et al., 2016). Fergus and Bardeen (2019) also found evidence in support of a bifactor model, including a general metacognition factor. Internal consistency of the MCQ-30 subscales was acceptable, good or excellent within community (Spada et al., 2008; Wells & Cartwright-Hatton, 2004) and health samples (Cook et al., 2014; Fisher et al., 2016). Test-retest reliability of the subscales ranged from poor to acceptable within a community sample, although these results were of questionable validity as the retest interval ranged from 22 to 118 days (Wells & Cartwright-Hatton, 2004). Convergent validity among community samples was demonstrated with measures of worry (Penn State Worry Questionnaire [PSWQ]; Meyer et al., 1990) and trait anxiety (State-Trait Anxiety Inventory [STAI]; Spielberger, 1983) (Wells & Cartwright-Hatton, 2004). None of the reviewed studies provided evidence of the criterion validity of the scale. While there is some evidence of the reliability and validity of the MCQ-30 in community and health samples, no studies to date have examined the psychometric properties of the scale among people with GAD, nor identified clinical cut-off scores or sensitivity to treatment for GAD.

The overall aim of this study was therefore to assess the measurement properties and clinical utility of the MCQ-30 in treatment-seeking adults with GAD. It was anticipated that CFA of the MCQ-30 would reveal the same five-factor structure as that found in community samples, and that the total scale and each of the subscales would exhibit adequate internal consistency and test-retest reliability.

Concurrent validity was expected with alternative measures of the constructs purported to be measured by the MCQ-30, including problematic worry, uncontrollability and danger beliefs, and the need to control thoughts. Specifically, the following relationships were expected based on metacognitive theory (Wells, 1995) and previous findings: a strong positive correlation between Uncontrollability and Danger Beliefs and the PSWQ; weak positive correlations between the PSWQ and Cognitive Confidence, Positive Beliefs and Cognitive Self-Consciousness; a moderate positive correlation between the total MCQ-30 and the PSWQ; a moderate positive correlation between Uncontrollability and Danger Beliefs and the Anxiety subscale of the Affective Control Scale (ACS; Williams et al., 1997); and a moderate positive correlation between Need for Control and the White Bear Suppression Inventory (WBSI; Wegner & Zanakos, 1994).

Consistent with the metacognitive model, it was predicted that the MCQ-30 would display convergent validity with symptoms of GAD, including heightened distress and quality of life interference. Specifically, it was hypothesised that all subscales and the total MCQ-30 would be positively correlated with the Generalized Anxiety Disorder Questionnaire (GAD-Q; Roemer et al., 1995), the Stress subscale of the Depression Anxiety Stress Scales – Short Form (DASS-21; Lovibond & Lovibond, 1995), and the Life Interference Scale: Worry (LIS; Abbott et al., 2021). The magnitude of these associations was expected to be similar to those outlined above between the MCQ-30 and the PSWQ.

Regarding the clinical utility of the MCQ-30, it was hypothesised that criterion validity would be demonstrated via significant group differences between people with a principal diagnosis of GAD and non-clinical controls. Specifically, it was expected that clinical participants would score significantly higher on all subscales and the total MCQ-30 than non-clinical controls. Treatment sensitivity was also expected, as evidenced by a significant reduction in scores on the MCQ-30 following participation in a group-based psychological intervention. Finally, this study aimed to identify clinical responsiveness and cut-off scores for the total MCQ-30 and subscales, to assist in determining the levels of maladaptive metacognitions that are consistent with GAD presentations.

Method

Participants

The study involved 139 clinical participants (78% female) aged 18 to 70 years (mean = 37 years, SD = 12.2) who presented to the Macquarie University Emotional Health Clinic for treatment of GAD as part of a randomised controlled trial (RCT) (Abbott, 2007). Of these, 84 participated in either an active treatment condition or a waitlist condition and were reassessed at post-treatment or waitlist (75% female; aged 19 to 65 years, mean = 38 years, SD = 12.7). In addition, 76 non-clinical control participants in the same age range without anxiety disorders were recruited via notices in local newspapers (66% female, mean age = 36 years, SD = 14.1). There were no significant differences between the clinical and control groups in age, t(213) = 0.74, p = .46; gender composition, χ2 (1, n = 215) = 3.58, p = .06; education, χ2 (2, n = 213) = 5.18, p = .08; or employment status, χ2 (7, n = 215) = 4.41, p = .73. The two groups differed, however, in terms of marital status, χ2 (1, n = 215) = 13.78, p < .001, with 72% of control participants classified as single compared with 46% of clinical participants.

Diagnostic interviews were conducted by clinical psychologists or doctoral level students under the supervision of senior clinical psychologists. One quarter of the interviews were coded by supervised doctoral level clinical psychologists who were blind to the original diagnoses. Interrater reliability for the primary diagnosis of GAD was high (κ = 0.84). All clinical participants met DSM-IV criteria for a principal diagnosis of GAD (American Psychiatric Association, 1994). Of these, 73% also met criteria for one or more comorbid diagnoses, including social anxiety disorder (51%), major depressive disorder (24%), specific phobia (21%), panic disorder (10%), dysthymia (8%), obsessive-compulsive disorder (4%), and post-traumatic stress disorder (1%). In addition, 43% were taking medication for their anxiety at the time of assessment, predominantly SSRIs. None of the control participants met criteria for any Axis I psychological disorder, and none were taking psychotropic medication at the time of assessment.

Measures

Clinical Diagnosis

The Anxiety Disorders Interview Schedule for DSM-IV: Adult Version (ADIS-IV; Brown et al., 1994) is a semi-structured interview designed for the assessment and diagnosis of anxiety and mood disorders, as well as to screen for the presence of other prevalent psychological disorders, such as psychosis. Inter-rater reliability has been found to be good to excellent for primary anxiety and mood disorder diagnoses (Di Nardo et al., 1995).

In addition to the MCQ-30, described above, participants completed a battery of self-report questionnaires, including the following:

Worry

Excessive worry was assessed using the PSWQ (Meyer et al., 1990), a single factor scale consisting of 16 items aimed at measuring the frequency and intensity of worry. The internal consistency and test-retest reliability of the scale have been demonstrated to be excellent amongst university samples (Meyer et al., 1990). Construct validity was found with measures of trait worry, and convergent validity with theoretically meaningful personality constructs and maladaptive coping strategies (Meyer et al., 1990). In university samples, PSWQ scores discriminated between high, medium and low worriers, as well as between participants who met all, some, or none of the GAD criteria, and those who met GAD compared to post-traumatic stress disorder (PTSD) criteria (Meyer et al., 1990). People diagnosed with GAD endorsed very high scores on the PSWQ and scores were sensitive to cognitive therapy (Meyer et al., 1990). Internal consistency within the clinical sample in this study was good, α = 0.86.

GAD Symptoms

Symptoms of GAD were measured using the GAD-Q (Roemer et al., 1995) with scoring adapted according to the GAD-Q-IV (Newman et al., 2002) to reflect DSM-IV criteria. Agreement between the GAD-Q-IV and ADIS-IV has been demonstrated to be strong (κ = 0.67), with the GAD-Q-IV correctly classifying 88% of participants (university students) (Newman et al., 2002). In addition, there was no significant difference in PSWQ scores between the analogue GAD groups identified with the GAD-Q and GAD-Q-IV, and clinical participants with GAD (Newman et al., 2002; Roemer et al., 1995). Internal consistency within the clinical sample in this study was modest, α = 0.55, which may be expected given the variety of symptom combinations possible for individuals meeting the diagnostic criteria for GAD.

Distress

The Stress subscale of the DASS-21 (Lovibond & Lovibond, 1995) was included as a measure of clinical distress in this study. The DASS-21 contains three subscales designed to assess clients’ symptoms of depression, anxiety and stress over the previous week. The DASS-21 subscales have been found to have excellent internal consistency, test-retest reliability, and concurrent validity with other measures of depression and anxiety in both community and clinical samples (Antony et al., 1998). Discriminant validity of the Depression, Anxiety and Stress subscales in delineating between symptoms of depression, physiological arousal, and psychological agitation respectively has also been demonstrated in community and clinical samples (Antony et al., 1998). Internal consistency amongst the clinical sample in this study was good for the Stress subscale, α = 0.83.

Interference

The LIS (Abbott et al., 2021) contains six items regarding the extent to which worrying currently interferes with various components of everyday life, including work or studies, leisure activities, socialising, and home or family life. Internal consistency in the clinical sample of this study was good, α = 0.87.

Uncontrollability and Danger Beliefs

The ACS (Williams et al., 1997) was designed to assess fear of emotions, with four subscales corresponding to fear of anger, depression, anxiety, and positive emotions such as joy. Each subscale measures beliefs about the uncontrollability and negative physical and psychological consequences of the relevant emotion. In undergraduate student samples, the total scale and each subscale have demonstrated normal distribution, acceptable to excellent internal consistency, and acceptable test-retest reliability (Berg et al., 1998; Williams et al., 1997). In addition, concurrent validity was demonstrated via a strong negative correlation between the ACS and a measure of emotional control (Williams et al., 1997), and convergent validity via strong positive correlations with measures of neuroticism and trait anxiety (Berg et al., 1998). Internal consistency of the Anxiety subscale amongst the clinical sample in this study was good, α = 0.84.

Need to Control Thoughts

The WBSI (Wegner & Zanakos, 1994) was included as an alternative measure of attempted thought suppression. The WBSI is a single factor scale comprising 15 items. The scale has exhibited good internal consistency across a variety of university samples and acceptable test-retest reliability across three time periods (Wegner & Zanakos, 1994). Convergent validity was demonstrated by moderate positive correlations with measures of obsessive-compulsive symptoms, trait anxiety, and anxiety sensitivity (Wegner & Zanakos, 1994). Internal consistency in the clinical sample for this study was good, α = 0.88.

Procedure

The RCT during which data was collected, and the present study and its methodology, received ethical approval from both The University of Sydney and Macquarie University. Potential participants in the RCT contacted the Clinic following publicity, word of mouth, or referral in the case of clinical participants. After a brief suitability assessment by phone, participants were mailed information and consent forms and a questionnaire battery to complete at home, which was returned when they attended the clinic for their ADIS-IV assessment. Clinical participants were randomly allocated into one of three 12-week conditions: a cognitive-behavioural group treatment (CBT) program for GAD (Abbott, 2007); a mindfulness-based group treatment (MBT) program for GAD (Huxter, 2006) or a waiting list. The CBT program included psychoeducation about GAD, cognitive challenging of Type 1 and 2 worries, exposure and response prevention of safety strategies, imaginal exposure (worry stories), challenging of positive beliefs about worry and negative core beliefs, and relapse prevention. The MBT program included general process-based mindfulness techniques, such as not engaging with thoughts. Participants in the active treatment conditions attended one three-hour group per week, comprising between six and eight participants, and facilitated by two clinical psychologists or doctoral level clinical psychology interns. After completion of their allocated 12-week condition, clinical participants again completed the ADIS-IV and the questionnaire pack.

Overview of Statistical Analyses

Preliminary Analyses

Data transformations, descriptive statistics, and bivariate and multivariate analyses were conducted using SPSS Version 28 (IBM Corp., 2021a). Where participants had completed at least 80% of items in a particular subscale (or scale, if single factor), any missing items were replaced with the mean subscale (or scale) score. No missing data was imputed for the GAD-Q or LIS due to the discrete constructs measured by each item on these scales. In all other cases, missing items were deleted pairwise from bivariate analyses and listwise from multivariate analyses. Analyses of group means were conducted using independent samples t-tests, between groups analyses of variance (ANOVA) and chi-square analyses, and tests of normality and frequency histograms were examined for all scales and subscales to assess the distribution of scores prior to analysis. Scatterplots of bivariate relationships were also examined prior to correlational analysis.
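The missing-data rule described above can be illustrated with a minimal Python sketch. It assumes that "mean subscale score" refers to the mean of the items the same participant did answer on that subscale (person-mean imputation); the function name, the use of None for a missing response, and the example responses are hypothetical and are not taken from the study's SPSS syntax.

```python
from typing import List, Optional, Sequence


def impute_subscale(
    items: Sequence[Optional[float]], min_complete: float = 0.8
) -> Optional[List[float]]:
    """Replace missing items (None) with the mean of the answered items.

    Returns None when fewer than `min_complete` of the items were answered,
    signalling that the responses should be left unimputed.
    """
    answered = [x for x in items if x is not None]
    if not items or len(answered) / len(items) < min_complete:
        return None
    mean_score = sum(answered) / len(answered)
    return [mean_score if x is None else x for x in items]


# Hypothetical six-item subscale with one missing response (5/6 = 83% complete).
print(impute_subscale([2, 3, None, 4, 3, 3]))     # [2, 3, 3.0, 4, 3, 3]
# Only 4/6 = 67% complete, so nothing is imputed.
print(impute_subscale([2, None, None, 4, 3, 3]))  # None
```

With six items per subscale, the 80% threshold in this sketch means that at most one item per subscale can be imputed.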

Structural Validity

Using data from the clinical sample, inter-item and item-total correlations were produced for the MCQ-30 to analyse the relationship between variables prior to factor analysis, then CFA was conducted using SPSS AMOS Version 28 (IBM Corp., 2021b) to evaluate the fit of the data to the hypothesised five-factor model. It is recommended practice to report a variety of model fit indices in addition to the chi-square statistic (χ2), which is extremely sensitive to sample size (Bryant & Yarnold, 1995; Hu & Bentler, 1999; Marsh et al., 1988; Schreiber et al., 2006). In the present study, absolute fit (how well the pattern of data corresponds to the model) was measured with the ratio of χ2 to its degrees of freedom (χ2/df), with χ2/df < 2 proposed as an indication of acceptable fit (Hoelter, 1983). Relative fit (how well the model explains the data compared with alternative models, most commonly the null model assuming no relationships between variables) was measured by the Tucker-Lewis index (TLI; Tucker & Lewis, 1973) and the incremental fit index (IFI; Bollen, 1990), with values greater than or equal to 0.95 considered indications of good model fit (Hu & Bentler, 1999; Schreiber et al., 2006). Noncentrality-based fit (assuming that the model is not perfect even in the population, therefore χ2 ≠ 0) was measured by the root mean square error of approximation (RMSEA), with RMSEA < 0.06 to 0.08 indicating good model fit (Hu & Bentler, 1999; Schreiber et al., 2006).
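For readers unfamiliar with these indices, the following Python sketch shows how each can be computed from the χ2 statistics of the fitted model and the null (independence) model using their standard textbook formulas. The function name and the input values are hypothetical and chosen for illustration only; they are not the values obtained in this study, and the exact computations in AMOS may differ in detail.

```python
import math


def fit_indices(chi2_model, df_model, chi2_null, df_null, n):
    """Compute chi2/df, TLI, IFI and RMSEA from model and null-model chi-squares."""
    ratio = chi2_model / df_model
    null_ratio = chi2_null / df_null
    # TLI compares the chi2/df ratios of the null and fitted models.
    tli = (null_ratio - ratio) / (null_ratio - 1.0)
    # IFI compares the raw chi-squares, adjusting for the model's df.
    ifi = (chi2_null - chi2_model) / (chi2_null - df_model)
    # RMSEA is based on the noncentrality (chi2 - df), floored at zero.
    rmsea = math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))
    return {"chi2_df": ratio, "tli": tli, "ifi": ifi, "rmsea": rmsea}


# Illustrative (made-up) inputs for a 30-item scale and n = 139.
result = fit_indices(chi2_model=600.0, df_model=400, chi2_null=3000.0, df_null=435, n=139)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In this made-up example χ2/df falls below 2 while TLI and IFI fall short of 0.95, which illustrates how the indices can point in different directions for the same model.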

Reliability and Validity

The internal consistency of each of the measurement scales and subscales was calculated using Cronbach’s alpha coefficient. Test-retest reliability of the MCQ-30 was evaluated by calculating the intra-class correlation between initial and 12-week scores for participants allocated to the waiting list condition. Given the non-normality of the MCQ-30 subscale residuals, Spearman’s rho non-parametric correlations were calculated to examine the hypothesised relationships between the MCQ-30 and other variables of interest.
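As an illustration of the two reliability coefficients named above, the sketch below computes Cronbach's alpha from item-level responses and a one-way random effects intraclass correlation from paired test and retest scores. It assumes the single-measure form of the ICC; the function names and data layout are hypothetical, and the example data are invented rather than drawn from the study.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha; `rows` holds one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_oneway(pairs):
    """Single-measure one-way random effects ICC for (T1, T2) score pairs."""
    n, k = len(pairs), len(pairs[0])
    grand = mean([x for p in pairs for x in p])
    # Between-person and within-person mean squares from a one-way ANOVA.
    ms_between = k * sum((mean(p) - grand) ** 2 for p in pairs) / (n - 1)
    ms_within = sum((x - mean(p)) ** 2 for p in pairs for x in p) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Invented data: four respondents on three items, and four test-retest pairs.
items = [[1, 2, 2], [2, 3, 3], [3, 3, 4], [4, 4, 4]]
retest = [(10, 11), (14, 13), (18, 19), (22, 21)]
print(round(cronbach_alpha(items), 3))
print(round(icc_oneway(retest), 3))
```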

Clinical Utility

Treatment sensitivity was evaluated via mixed effect ANOVAs using completer data and Cohen’s d effect sizes. Receiver operating characteristic (ROC) curve analysis was conducted with MedCalc Version 19.4.1 (MedCalc Software, 2020) to examine the clinical responsiveness, cut-off scores, sensitivity and specificity of the MCQ-30 subscales and total scale (Hsiao et al., 1989). The area under the curve (AUC) provides a metric of the questionnaire’s ability to accurately discriminate between people with GAD and those without, and can also be utilised to compare the diagnostic performance of two or more tests (Hanley & McNeil, 1982). Statistically significant AUC values indicate that the scale performs better than chance at distinguishing between the two groups. Additional indicators were examined to facilitate selection of the appropriate cut-off scores: sensitivity (proportion of people with GAD who test positive), specificity (proportion of people without GAD who test negative), positive predictive value (PPV; probability that GAD is present when the test is positive), and negative predictive value (NPV; probability that GAD is not present when the test is negative). Prevalence assumptions were based on data from the most recent large-scale Australian health survey, which indicated that 2.7% of the adult population met criteria for GAD over the previous year (Australian Bureau of Statistics, 2015). The Youden Index was used to identify the criterion (cut-off score) which maximised the sum of sensitivity and specificity (Fluss et al., 2005).
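The logic of cut-off selection and of prevalence-adjusted predictive values can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions rather than the MedCalc procedure: scores at or above the cut-off are treated as test-positive, ties in the Youden Index are resolved in favour of the lowest cut-off, and the example scores are invented. Only the 2.7% prevalence figure comes from the text above.

```python
def youden_cutoff(cases, controls):
    """Return (cutoff, sensitivity, specificity) maximising J = sens + spec - 1."""
    best = None
    for c in sorted(set(cases) | set(controls)):
        sens = sum(x >= c for x in cases) / len(cases)       # true positive rate
        spec = sum(x < c for x in controls) / len(controls)  # true negative rate
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]


def predictive_values(sens, spec, prevalence):
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' rule)."""
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv


# Invented total scores for four clinical and four control participants.
cutoff, sens, spec = youden_cutoff(cases=[60, 70, 80, 90], controls=[40, 50, 55, 65])
print(cutoff, sens, spec)  # 60 1.0 0.75
ppv, npv = predictive_values(sens, spec, prevalence=0.027)
print(round(ppv, 3), round(npv, 3))
```

Because PPV depends on prevalence, a cut-off with high sensitivity and specificity can still yield a low PPV when the condition is uncommon in the reference population.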

Results

Preliminary Analyses

Descriptive Statistics

Descriptive statistics, including floor and ceiling effects, for each of the subscales and the total MCQ-30 are shown in Table 1. Across the total sample, 3.3% of items were missing on the MCQ-30 at assessment, and 3.4% were missing from the clinical sample post-treatment or waitlist condition.

Table 1 Descriptive statistics and group mean comparisons of MCQ-30 in clinical (n = 139) and control (n = 76) samples

Floor and Ceiling Effects

Among clinical participants, total MCQ-30 scores ranged from 38 to 106, out of the possible range of 30 to 120, with no participants endorsing the lowest or highest possible scores. Among non-clinical controls, scores ranged from 30 to 79, with only 1.3% of participants endorsing the lowest possible score.

Gender and Age Differences

In the clinical sample, independent samples t-tests revealed a significant difference between the scores of males and females on the MCQ-30 Uncontrollability and Danger Beliefs subscale, p = .01, the WBSI, p = .01, and the ACS Anxiety subscale, p = .03, with females scoring higher than males. The difference between males’ and females’ scores on the GAD-Q approached significance, p = .06, with females again scoring higher than males, indicating that the above differences may reflect higher overall severity of GAD symptoms amongst females. There were no significant differences on the remaining subscales or total scale scores. Due to the high proportion of females in the sample, subsequent analyses were conducted on the group as a whole. Between groups ANOVAs indicated no significant differences between scores on the MCQ-30 subscales and total scale across age groups.

Normality of Measures

Amongst the clinical sample, Shapiro-Wilk tests of normality indicated that the residuals of all subscales of the MCQ-30 were not normally distributed, p < .01 (see Table 1). However, most of the statistics were above or close to 0.95, an acceptable normality indicator, with Positive Beliefs the lowest at 0.92. Examination of the frequency histograms, in conjunction with skewness and kurtosis statistics, indicated that Positive Beliefs, Need for Control and Cognitive Confidence were significantly positively skewed, with scores on these subscales tending to cluster at the lower end of the possible range. Cognitive Self-Consciousness scores were negatively skewed, clustering at the upper end of the range. Uncontrollability and Danger scores were not significantly skewed. The residuals of the total scale were normally distributed, p = .74.

Structural Validity

Inter-item and Item-total Correlations

In the clinical sample, item-total correlations with the full scale were all significantly positive, p < .01, and higher than the conventional minimum value of 0.2 (Kline, 1986), ranging from r = .33 to .62. Further, all item-total correlations with the relevant subscale were significantly positive, p < .01, and strong, ranging from r = .60 to .91.

Subscale Intercorrelations

All subscales were moderately to strongly positively correlated with the scale as a whole (see Table 2). Intercorrelations between Uncontrollability and Danger Beliefs, Need for Control, and Cognitive Self-Consciousness were all moderate and positive. Positive Beliefs displayed a weak positive correlation with Need for Control, Cognitive Self-Consciousness and Cognitive Confidence. Cognitive Confidence was also weakly correlated with Need for Control and Cognitive Self-Consciousness. There was no significant relationship between Uncontrollability and Danger Beliefs and either Positive Beliefs or Cognitive Confidence.

Power Analysis

Based on the recommended minimum sample size ratio for CFA of five participants per observed variable (Bryant & Yarnold, 1995), this analysis was slightly underpowered (n = 139 vs. recommended n ≥ 150). It should be noted, however, that the validity of rules of thumb regarding sample size has been questioned by some researchers. MacCallum et al. (1999), for instance, concluded that when communalities between variables are high and factors well-determined (as was the case in this study), a sample size below 100 may be adequate.

Confirmatory Factor Analysis

CFA assumes multivariate normality; however, tests of normality in AMOS indicated skewness and kurtosis in the data. Standard errors were therefore estimated using the method of maximum likelihood via non-parametric bootstrap samples (n = 1000). There were, however, no notable outliers, and the assumption of a wide range of scores was met (see Table 1).
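The general idea behind non-parametric bootstrap standard errors can be sketched as follows. This is a simplified illustration and not the AMOS routine: it resamples participants with replacement and recomputes a single statistic, with the sample mean standing in for a maximum likelihood parameter estimate. The function name, the seed, and the example scores are assumptions made for the illustration.

```python
import random
from statistics import mean, stdev


def bootstrap_se(data, statistic, n_boot=1000, seed=0):
    """Standard error of `statistic`, estimated from `n_boot` bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = []
    for _ in range(n_boot):
        # Draw a resample of the same size as the data, with replacement.
        resample = [rng.choice(data) for _ in data]
        estimates.append(statistic(resample))
    # The spread of the resampled estimates approximates the standard error.
    return stdev(estimates)


# Invented total scores; the sample mean stands in for a model parameter.
scores = [38, 52, 61, 67, 70, 74, 79, 85, 93, 106]
print(round(bootstrap_se(scores, mean), 2))
```

Because the resampling distribution is built from the observed data, this approach does not rely on the normality assumption that the analytic standard errors require.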

Table 2 Internal consistency, test-retest reliability and Spearman’s rho correlation coefficients in clinical sample (n = 139)

An orthogonal five-factor model was fitted first, then modification indices (MI > 4) were examined to ascertain whether estimating the correlations between any of the factors, or error terms within them, would improve the fit of the model. This resulted in the addition of eight covariances between factors, consistent with the bivariate subscale intercorrelations outlined above, and nine between error terms. The final model, fit indices and standardised factor loadings are shown in Fig. 1. The ratio of χ2 to its degrees of freedom (χ2/df) and RMSEA indicated a good fit to the data; however, IFI and TLI were slightly below the recommended 0.95. Taken together, these indices suggest that the specified model approached an acceptable approximation of the observed data.

Fig. 1 CFA model of MCQ-30 (n = 139) and model fit indices in clinical sample

POS = Positive Beliefs; UD = Uncontrollability and Danger Beliefs; CSC = Cognitive Self-Consciousness; NC = Need for Control; CC = Cognitive Confidence

Reliability

Internal Consistency

Internal consistency amongst the clinical sample was found to be adequate for the Uncontrollability and Danger and Need for Control subscales, and good for Positive Beliefs, Cognitive Self-Consciousness, Cognitive Confidence, and the total MCQ-30 (see Table 2).

Test-retest Reliability

Clinical participants allocated to the waitlist group (n = 21) completed the MCQ-30 at assessment (T1) and again 12 weeks later (T2). As shown in Table 2, the one-way random effects intraclass correlation (ICC) between T1 and T2 was adequate for the Cognitive Confidence subscale, and good for all other subscales and the total scale.

Validity

The convergent and concurrent validity of the MCQ-30 within the clinical sample was evaluated using Spearman non-parametric correlations, as shown in Table 2.

Concurrent Validity

In the clinical sample, the total MCQ-30 and all subscales were significantly positively correlated with the PSWQ, indicating that higher levels of metacognitive beliefs and processes were associated with more excessive worry. The magnitudes of the correlations ranged from weak (Positive Beliefs, Need for Control, Cognitive Self-Consciousness and Cognitive Confidence) to moderate (Uncontrollability and Danger Beliefs, and the total scale). In addition, there were moderate positive correlations between Uncontrollability and Danger Beliefs and ACS Anxiety, and between Need for Control and the WBSI.

Convergent Validity

Uncontrollability and Danger Beliefs, Need for Control, Cognitive Self-Consciousness and the total MCQ-30 were significantly positively correlated with GAD-Q scores. The strength of the correlation was moderate for Uncontrollability and Danger Beliefs, and weak for the remainder.

The total scale and all subscales were significantly positively correlated with DASS Stress scores, with strengths ranging from moderate for Uncontrollability and Danger Beliefs, Need for Control, Cognitive Self-Consciousness and the total scale, to weak for Positive Beliefs and Cognitive Confidence.

All subscales, other than Cognitive Confidence, and the total scale were significantly positively correlated with the LIS. The strengths of the associations were moderate for Uncontrollability and Danger Beliefs and Need for Control, and weak for Positive Beliefs and Cognitive Self-Consciousness.

Criterion Validity

Independent samples t-tests revealed that people with GAD scored significantly higher than controls across all subscales of the MCQ-30 and the total scale (see Table 1). Effect sizes across all subscales were medium to large (Cohen, 1988), with the largest difference exhibited on Uncontrollability and Danger Beliefs.
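For reference, the between-group effect size can be computed as in the sketch below. It assumes the pooled-standard-deviation form of Cohen's d, which the text does not specify, and uses invented scores. By Cohen's (1988) conventions, values of approximately 0.2, 0.5 and 0.8 are interpreted as small, medium and large effects respectively.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Invented subscale scores for a clinical and a control group.
clinical = [18, 20, 21, 23, 24]
control = [10, 12, 13, 15, 16]
print(round(cohens_d(clinical, control), 2))
```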

Clinical Utility

Treatment Sensitivity

Treatment sensitivity was assessed by comparing differences between clinical participants’ scores on the MCQ-30 and its subscales before and after a 12-week intervention (CBT or MBT) or waitlist period (see Table 3). In the waitlist condition, within group effect sizes were negligible to small across all subscales and the total MCQ-30 score. In the treatment conditions, effect sizes ranged from negligible (Cognitive Confidence) to large (Uncontrollability and Danger Beliefs) (Cohen, 1988).

Table 3 MCQ-30 descriptive statistics pre and post 12-week intervention or waitlist

Between-groups ANOVAs revealed no differences in pre-treatment subscale or total MCQ-30 scores across the three conditions (CBT, MBT and waitlist) (POS: F(2,81) = 0.36, p = .70; UD: F(2,81) = 0.96, p = .39; NC: F(2,81) = 0.08, p = .92; CSC: F(2,81) = 0.51, p = .60; CC: F(2,81) = 0.02, p = .99; MCQ-30: F(2,81) = 0.36, p = .70).

Mixed effect ANOVAs were conducted to examine the time by treatment condition interaction in each of three pairwise comparisons (CBT vs. waitlist, MBT vs. waitlist, and CBT vs. MBT) for the total MCQ-30 and the MCQ-30 subscales. As shown in Table 4, there were significant time by group interactions for all subscales and the total score when comparing the CBT group to waitlist, reflecting a greater pre-to-post reduction in scores for those in the CBT group. For the MBT vs. waitlist comparison, significantly greater reductions were displayed for Uncontrollability and Danger Beliefs, Need for Control, Cognitive Self-Consciousness and total MCQ-30 scores amongst those in the MBT group; however, time by group interactions were not significant for Positive Beliefs or Cognitive Confidence scores. There were no significant differences in pre-to-post change between the CBT and MBT conditions.
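When a comparison involves two groups and two time points, the time by group interaction in a mixed ANOVA is equivalent to an independent-samples t-test on the pre-to-post change scores, with F(1, N − 2) = t². The following Python sketch illustrates that equivalence. The scores are hypothetical rather than study data, the function name `interaction_f` is an illustrative assumption, and complete pre and post data are assumed for every participant.

```python
import math
from statistics import mean, variance


def interaction_f(pre_a, post_a, pre_b, post_b):
    """Return (F, df_error) for the time by group interaction in a 2 x 2 mixed design.

    Computed as the squared pooled-variance t statistic on change scores.
    """
    change_a = [post - pre for pre, post in zip(pre_a, post_a)]
    change_b = [post - pre for pre, post in zip(pre_b, post_b)]
    n_a, n_b = len(change_a), len(change_b)
    df_error = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(change_a) + (n_b - 1) * variance(change_b)) / df_error
    t = (mean(change_a) - mean(change_b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t ** 2, df_error


# Hypothetical total scores, NOT data from this study.
treatment_pre = [70, 65, 72, 68, 74]
treatment_post = [55, 58, 60, 50, 61]
waitlist_pre = [69, 66, 73, 70, 71]
waitlist_post = [68, 67, 70, 71, 69]

f_value, df_error = interaction_f(treatment_pre, treatment_post, waitlist_pre, waitlist_post)
print(f"F(1, {df_error}) = {f_value:.2f}")
```

A significant interaction of this kind indicates that the mean change differed between the two groups. It does not by itself indicate which group changed more, which is read from the group means.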

Table 4 Mixed effect ANOVAs for time by treatment condition interactions

Responsiveness

Test performance indicators and clinical cut-off scores for all subscales and the total MCQ-30 are presented in Table 5. The AUCs of all subscales and the total MCQ-30 were significantly greater than 0.50, indicating that the scale is more likely than chance to discriminate between people with GAD and those without. Of the subscales, Uncontrollability and Danger Beliefs had the highest sensitivity, specificity, PPV, NPV and AUC. The sensitivity, PPV, NPV and AUC of Uncontrollability and Danger Beliefs were also higher than those of the total MCQ-30.
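The test performance indicators referred to above can be sketched in Python as follows. The scores and the cut-off are hypothetical and are not taken from this study, and the function names `auc` and `cutoff_metrics` are illustrative assumptions. The sketch assumes that a score at or above the cut-off is classified as positive, and that at least one score falls on each side of the cut-off so that no denominator is zero.

```python
def auc(cases, controls):
    """Area under the ROC curve: the probability that a randomly chosen case
    scores higher than a randomly chosen control, counting ties as one half."""
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def cutoff_metrics(cases, controls, cutoff):
    """Sensitivity, specificity, PPV and NPV when scores >= cutoff are positive."""
    tp = sum(s >= cutoff for s in cases)      # cases correctly flagged
    fn = len(cases) - tp                      # cases missed
    fp = sum(s >= cutoff for s in controls)   # controls wrongly flagged
    tn = len(controls) - fp                   # controls correctly cleared
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical subscale scores, NOT data from this study.
gad_scores = [18, 20, 15, 22, 13, 19]
control_scores = [10, 12, 14, 9, 16, 11]

area = auc(gad_scores, control_scores)
metrics = cutoff_metrics(gad_scores, control_scores, cutoff=15)
print(f"AUC = {area:.2f}", metrics)
```

Unlike sensitivity and specificity, PPV and NPV depend on the proportion of cases in the sample, so values obtained in a sample enriched with clinical participants will not necessarily transfer to settings with a different base rate.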

Table 5 Receiver operating characteristic curve analyses discriminating generalised anxiety disorder (n = 139) and control (n = 76) groups

Discussion

The broad purpose of this study was to evaluate the psychometric properties of the MCQ-30 amongst a treatment-seeking group of adults with GAD, with the aim of validating the scale as a clinically useful measurement tool to aid research, assessment, and treatment. The results provided novel evidence of the reliability, validity, and clinical utility of the MCQ-30 in a clinical sample, as well as further evidence of the central role played by metacognitive beliefs about the uncontrollability and danger of worry in GAD.

The CFA indicated that a five-factor model, measuring Positive Beliefs, Uncontrollability and Danger Beliefs, Need for Control, Cognitive Self-Consciousness and Cognitive Confidence, approached an acceptable approximation of the data obtained with the MCQ-30. Although this study was slightly under-powered, half of the goodness-of-fit indices were within accepted parameters for good model fit, with the remainder slightly below the recommended threshold. These results with a clinically diagnosed sample differ from findings in community and health samples (Cook et al., 2014; Fisher et al., 2016; Spada et al., 2008; Wells & Cartwright-Hatton, 2004), in which the majority of indices indicated good model fit. This suggests that the relationships between the factors of the MCQ-30 may differ amongst people with GAD compared to those without.

In support of this suggestion, the final model in this study indicated no covariance between Uncontrollability and Danger Beliefs and either Positive Beliefs or Cognitive Confidence, whereas in previous studies, all five subscales were intercorrelated (Cook et al., 2014; Fisher et al., 2016; Spada et al., 2008; Wells & Cartwright-Hatton, 2004). Thus, for people with GAD, holding negative beliefs about the uncontrollability and dangers of worrying appears minimally related to holding positive beliefs about worry or a lack of cognitive confidence. Although the metacognitive model does not specify a direct relationship between positive and negative beliefs about worry, but rather a mediated relationship through Type 1 worry, the theory does hypothesise that the cognitive dissonance created by holding both types of beliefs contributes to the perseverative worry cycle and, subsequently, anxiety levels (Wells, 1995, 1999). The lack of correlation between positive and negative beliefs about worry found in this study appears contradictory to this prediction, but is consistent with the findings of a recent study in a clinical sample of people with GAD, which showed no relationship between Positive Beliefs and Uncontrollability and Danger Beliefs prior to treatment (McEvoy et al., 2015b). There was, however, a weak correlation between these two subscales immediately following treatment (McEvoy et al., 2015b), suggesting that perhaps participants lacked insight into their positive beliefs before treatment and only became aware of them through socialisation to the metacognitive model. In support of this hypothesis, the relatively low mean and significant positive skew of the distribution of Positive Beliefs scores in this study indicate that such beliefs are relatively uncommon, even in a clinical sample. Alternatively, it has been suggested that negative beliefs may be more salient prior to treatment, due to the distress and interference associated with clinical levels of GAD (McEvoy et al., 2015a). Assessing belief and symptom change throughout treatment would help to distinguish between these explanations.

Internal consistency and test-retest reliability in this clinical sample were adequate to good for all subscales and the total scale, indicating that the constructs assessed by the MCQ-30 can be reliably measured in adults with GAD.

As expected, the MCQ-30 demonstrated initial concurrent validity with a measure of excessive worry. Consistent with a university and health service sample (Wells & Cartwright-Hatton, 2004), significant positive relationships were found between scores on the PSWQ and each of the MCQ-30 subscales, indicating that higher levels of each of these types of metacognitions are associated with more Type 1 worry in people with GAD. The magnitude of the correlations for most subscales, however, was weak, other than for Uncontrollability and Danger Beliefs, which had a moderate association with the PSWQ. This is also in keeping with community samples (Wells & Cartwright-Hatton, 2004), reinforcing the central role played by negative metacognitions in the maintenance of pathological worry. Uncontrollability and Danger Beliefs scores were also significantly and moderately positively associated with those obtained with the Anxiety subscale of the ACS, indicating that negative metacognitive beliefs about worry are associated with a fear of anxiety symptoms in general. Similarly, there was a moderate positive relationship between Need for Control and the WBSI, indicating an association between metacognitive beliefs about the need to control intrusive thoughts and attempted thought suppression, consistent with the metacognitive model.

People with GAD endorsed significantly higher levels of all metacognitive beliefs and processes than people without, providing preliminary evidence of the criterion validity of the scale. As predicted, higher levels of negative beliefs about worry and intrusive thoughts, and higher awareness of one’s thought processes, were associated with more symptoms of GAD, as measured by the GAD-Q. The strongest correlation was again exhibited with Uncontrollability and Danger Beliefs, further underscoring the cardinal role played by negative metacognitive beliefs about worry in maintaining clinical levels of GAD. Neither positive beliefs about worry nor a lack of cognitive confidence was associated with GAD symptom levels. This suggests that while these constructs may contribute to Type 1 worry levels, as discussed above, they are not involved in maintaining clinical levels of other symptoms of GAD, such as physiological arousal and interference. Consistent with this suggestion, cognitive confidence was not associated with interference due to symptoms of anxiety, and positive beliefs were only weakly associated with it. Each of the MCQ-30 subscales was, however, associated with psychological distress as measured by the DASS-21 Stress subscale, with the strengths of correlations following a similar pattern to those exhibited with the PSWQ. In summary, amongst people with GAD, convergent validity was consistently demonstrated between clinical symptoms of GAD, negative beliefs about worry and thoughts in general, and cognitive self-consciousness, but inconsistently with positive beliefs about worry and cognitive confidence.

Treatment sensitivity was supported amongst people with GAD through significantly larger reductions on all subscales and the total MCQ-30 following a 12-week CBT intervention for GAD, compared to a waitlist condition. Following an MBT intervention, scores on most MCQ-30 subscales reduced significantly more than waitlist, with the exception of Positive Beliefs and Cognitive Confidence. The largest MCQ-30 clinical vs. control and pre-to-post effect sizes were exhibited for Uncontrollability and Danger Beliefs, providing further support for the centrality of negative beliefs about worry to clinical presentations and treatment of GAD. It should be acknowledged that reliable change indices (RCI) and clinically significant change (CSC) thresholds were not calculated in this study as the treatment condition included two separate interventions, neither of which was MCT, the therapy which specifically targets the metacognitive beliefs and processes measured by the MCQ-30. As such, it is recommended that future research evaluate these statistics by administering the MCQ-30 in a sample of people with GAD before and after MCT.
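As an illustration of how a reliable change index could be computed in such future research, the following Python sketch uses the commonly applied Jacobson and Truax formulation, in which the pre-to-post difference is divided by the standard error of the difference. All input values are hypothetical and are not estimates from this study. The function name `reliable_change_index` is an illustrative assumption, as is the choice of which reliability coefficient to supply.

```python
import math


def reliable_change_index(pre, post, sd_pre, reliability):
    """Return the RCI for one participant's pre and post scores.

    sd_pre is the pre-treatment standard deviation of the scale in the
    reference sample; reliability is the scale's reliability coefficient.
    """
    se_measurement = sd_pre * math.sqrt(1 - reliability)
    s_diff = math.sqrt(2 * se_measurement ** 2)  # standard error of the difference
    return (post - pre) / s_diff


# Hypothetical values, NOT estimates from this study.
rci = reliable_change_index(pre=20, post=14, sd_pre=4.0, reliability=0.80)
is_reliable = abs(rci) > 1.96  # change unlikely (p < .05) to reflect measurement error alone
print(f"RCI = {rci:.2f}, reliable change: {is_reliable}")
```

An absolute RCI greater than 1.96 is conventionally taken to indicate change beyond what measurement error alone would be expected to produce.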

Regarding clinical responsiveness, a score of 61 on the total MCQ-30 was identified as indicative of a clinical level of metacognitive beliefs relevant to GAD, and all subscales discriminated effectively between people with GAD and those without. In this study, the Uncontrollability and Danger Beliefs subscale demonstrated higher clinical sensitivity and discriminatory ability than the total scale or any other subscale. It is important to note, however, that the MCQ-30 is not intended as a diagnostic tool, but rather as a measure of the metacognitive beliefs and processes relevant to the maintenance of GAD. As such, it should be used in conjunction with a well-validated diagnostic screening instrument as part of a thorough clinical assessment.

Taken together, the results of this study suggest that Uncontrollability and Danger Beliefs could be a more accurate and parsimonious measure of the metacognitive beliefs relevant to GAD than the full MCQ-30. The subscale displayed higher sensitivity, positive and negative predictive power, and AUC than the total scale, as well as evidence of concurrent, convergent and criterion validity with measures of excessive worry, GAD symptoms and diagnostic status respectively. At six items, the subscale is also more efficient to administer than the MCQ-30 or the full 65-item MCQ.

Although this study provides novel evidence of the psychometric properties and clinical utility of the MCQ-30 in a clinically diagnosed sample of people with GAD, a number of limitations should be acknowledged. First, the demographic questionnaire included only male and female as options for gender. This may have excluded participants of other gender identities, thereby limiting the generalisability of the results. Second, as noted above, the study was slightly underpowered for factor analysis. Future studies should aim for a valid sample size of at least 150 to ensure adequate power. Third, the concurrent validity of Positive Beliefs, Cognitive Confidence and Cognitive Self-Consciousness was not examined. Future studies may therefore consider adding alternative measures to comprehensively assess these subscales of the MCQ-30, such as the Why Worry-II (Hebert et al., 2014) for positive beliefs about worry. Fourth, discriminant validity between psychological disorders was not evaluated in the current study, as the clinical sample consisted solely of people with a primary diagnosis of GAD. Given that the metacognitive model of GAD is grounded in a transdiagnostic theory of emotional disorder, the S-REF model, it will be important to validate and evaluate the utility of the MCQ-30 in other clinical samples. Finally, treatment sensitivity was evaluated using participants in cognitive-behavioural and mindfulness-based group programs for GAD. Although the CBT program included components aimed at challenging both positive and negative beliefs about worry, the sensitivity of the MCQ-30 to metacognitive therapy for GAD has yet to be demonstrated.

Conclusions

In summary, this study provides the first evidence of the reliability, validity and clinical utility of the MCQ-30 with people diagnosed with GAD, filling a significant gap in the psychometric literature. In research settings, the MCQ-30 can now be more confidently used to test the predictions of the metacognitive model in clinical samples with GAD. However, the results of this study suggest that the Uncontrollability and Danger Beliefs subscale may be a more accurate and parsimonious measure of the metacognitive beliefs central to the maintenance of GAD. In clinical practice, it is therefore recommended that this subscale be used instead of the full scale to inform the client’s formulation and treatment.