For decades, anxiety and depression have been classified as affective disorders. According to Barlow (2004), affective disorders also include somatoform, dissociative, eating, and borderline personality disorders. Cognitive models of emotional disturbances (Lovibond, 2006) hold that uncertainty is a significant stressor and that Intolerance of Uncertainty (IU) is a vulnerability factor that makes patients less able to tolerate the unknown and triggers a range of negative cognitions and emotions.

IU was originally defined in the context of generalized anxiety disorder (GAD) as a broad construct underlying cognitive, emotional, and behavioral responses to uncertainty (Freeston et al., 1994). Since then, IU has been redefined several times to better specify its characteristics: as a negative bias in the perception and interpretation of uncertainty, as a predisposition to overreact emotionally to uncertainty, or as a set of beliefs that negative events are unacceptable and threatening (Buhr & Dugas, 2002; Holaway et al., 2006; Ladouceur et al., 2000). During the treatment of individuals with GAD, changes in IU were found to precede corresponding changes in worry and anxiety (Dugas & Ladouceur, 2000). Thus, an inability to tolerate uncertainty in everyday life is thought to explain pathological worry (Dugas et al., 2001; Einstein, 2014).

Beyond GAD, IU has been implicated in the onset and severity of other anxiety disorders, such as social anxiety (Carleton et al., 2010a) and health anxiety (Boelen, 2010). Research has also found higher levels of IU in people with obsessive–compulsive disorder (Holaway et al., 2006; Tolin et al., 2003), depression (Gentes & Ruscio, 2011; Miranda et al., 2008), panic disorder, and agoraphobia (Carleton et al., 2013). Emerging evidence supports the involvement of IU in eating disorders (Sternheim et al., 2011), posttraumatic stress (Fetzner et al., 2013), autism spectrum disorders (Boulter et al., 2014), delayed grief (Boelen, 2010), and hoarding behaviors (Wheaton & Ward, 2020). Consequently, IU is considered a transdiagnostic risk and maintaining factor for various mental illnesses.

Interest in assessing IU’s transdiagnostic features has broadened in recent years (McEvoy et al., 2019). The Intolerance of Uncertainty Scale (IUS) was designed to measure a range of responses to ambiguous and unknown situations: cognitive and behavioral reactions, stress and frustration, a desire to control the future, and the inability to act under uncertainty (Freeston et al., 1994). However, its factor structure was unstable across samples, languages, and cultural groups (e.g., Birrell et al., 2011). Hence, the IUS was abridged into a 12-item version (IUS-12) designed to measure only prospective and inhibitory anxiety (Carleton et al., 2007). Independent researchers (Hale et al., 2016; Lauriola et al., 2018; Shihata et al., 2018) concluded that a hierarchical bifactor structure provided the best account of the IUS-12 and that using the total score was more empirically supported than scoring the subscales.

The IUS-12 covers reactions to uncertainty rather than maladaptive cognitions (Einstein, 2014; Fergus, 2013). This feature has been criticized because it might overestimate the relationships of IU with GAD and obsessive–compulsive disorder (OCD) symptoms and underestimate those with other disorders (Gosselin et al., 2008). To address this limitation, Thibodeau et al. (2015) proposed the Disorder-Specific Intolerance of Uncertainty Scale (DSIU), which uses three items to describe the most salient characteristics of IU in each of eight clinical disorders. For instance, in panic disorder, IU might take the form of an inability to tolerate the uncertainty associated with the meaning of bodily sensations. In social anxiety disorder, by contrast, IU might emerge as a failure to endure the possibility of being negatively evaluated by others. A bifactor model provided the best fit for the DSIU: the analysis revealed a prominent, inherently transdiagnostic general factor and eight group factors, each capturing the specific maladaptive features of a given group of patients (Thibodeau et al., 2015).

Another attempt to address the limitations of the IUS-12 was the Intolerance of Uncertainty Inventory (IUI; Gosselin et al., 2008). This instrument was developed in French and subsequently translated into English, Italian, and Chinese (Carleton et al., 2010b; Lauriola et al., 2018; Li et al., 2020). The IUI includes two separate item sets. The first set (IUI-A: general unacceptability of uncertainty) was devised to assess the belief that uncertainty, or the possibility that an adverse event might occur, is unacceptable (Gosselin et al., 2008). Targeting IU core beliefs made the IUI-A similar in scope to the IUS-12. However, item content differed substantially between scales. As a result, the IUI-A achieved incremental validity over the IUS-12 in predicting depression, anxiety, and obsessive–compulsive symptoms (Fergus, 2013). The second set (IUI-B: individuals’ reactions to uncertainty) was devised to cover six maladaptive consequences of IU that are frequently observed in clinical practice (i.e., avoidance, doubt, overestimation, worry, control, and reassurance).

The factor structure of the IUI-A has been tested with mixed results. Gosselin and colleagues (2008) reported three confirmatory factor analysis (CFA) models based on a preliminary exploratory factor analysis carried out on an independent sample; in the original study, a three-factor model fit better than the two-factor and one-factor models. The subsequent cross-validation study of the English-language version (Carleton et al., 2010b) revealed unacceptable fit indices for all proposed models, although the one-factor model was supported after an atheoretical removal of three items. None of the models achieved an acceptable fit for the Italian-language version (Lauriola et al., 2018); as with the English-language version, the one-factor model approached a good fit after five items were removed. The Chinese-language study (Li et al., 2020) provided mixed findings: the two-factor model seemed to fit the data better than the one-factor model, but the evidence was inconclusive because no formal chi-square difference test was performed. Nevertheless, the total IUI-A score was highly reliable in all studies, with internal consistency coefficients of 0.96, 0.94, 0.91, and 0.92 for the French, English, Italian, and Chinese versions, respectively.

The factor structure of the IUI-B has also been investigated. Gosselin and colleagues (2008) reported a good fit for a six-factor model reflecting the item groupings presented in the inventory’s description. Carleton et al. (2010b) concluded that the six-factor model was preferable to the one-factor model, although its fit was not acceptable for the English-language version. The Italian study yielded a good fit for the six-factor model (Lauriola et al., 2018); however, a bifactor model adding one general factor to the six group factors was a significant improvement and achieved an excellent fit. The Chinese-language study (Li et al., 2020) also tested the six-factor model with unsatisfactory results and proposed a four-factor model based on an exploratory analysis of a random split of the dataset.

In addition to cultural differences across samples and languages, methodological aspects might explain these inconsistent findings. First, previous research using bifactor models has shown that IU scales incorporate general and specific variance components in varying proportions (Shihata et al., 2018; Thibodeau et al., 2015). This strategy was not employed to test the factor structure of the IUI-A, and only one study applied a bifactor model to the IUI-B item set (Lauriola et al., 2018). Second, all previous studies used CFA, which could result in poor model fit because it imposes unrealistic constraints on the factor loading pattern (i.e., all cross-loadings are fixed at zero); for the same reason, it could inflate the general factor variance or the factor correlations (Joshanloo et al., 2017; Morin et al., 2015). Exploratory structural equation modeling (ESEM) has been recommended to overcome these limitations (Alamer, 2022; Morin et al., 2020), but no study has used this approach to model the IUI-A and IUI-B factor structures. Last, all IUI studies used maximum likelihood (ML) estimation (or appear to have done so where no details were reported). ML assumes that items are continuous and normally distributed, which does not hold for the Likert-type IUI items. Violating this assumption might lead to misleading conclusions regarding model fit and, in turn, the validity of the factor structure (Sellbom & Tellegen, 2019).

In the present study, we address three issues regarding the validity of the IUI. First, we examine whether state-of-the-art bifactor ESEM, combined with estimators for categorical items, clarifies the IUI-A and IUI-B factor structures. Specifically, using data collected with the Farsi-language version of the scale, we test whether the unidimensional factor model of the IUI-A and the bifactor model of the IUI-B achieve an optimal fit without eliminating items or proposing alternative solutions. Moreover, bifactor models have frequently been used to decide whether total scores and subscale scores are warranted in the clinical assessment of multidimensional psychological constructs. For example, according to Reise et al. (2013), the general factor should preferably explain no less than 75 percent of the variance for a total score to be used even if the data are multidimensional. On this basis, previous research has supported only the use of the IUS-12 total score for clinical assessment (Hale et al., 2016; Lauriola et al., 2016; Shihata et al., 2018), whereas less is known about whether the IUI total or subscale scores are empirically supported. Second, we test whether the IUI can effectively discriminate clinical patients from community participants. As a preliminary step, we study the measurement invariance of the IUI factors; no previous study has addressed this issue, although measurement invariance is needed for unbiased group comparisons (Putnick & Bornstein, 2016; Vandenberg & Lance, 2000). Finally, we assess the relations of the IUI general and group factors with external variables, such as depression and anxiety symptoms (criterion-related validity) and IU as measured by other scales (concurrent validity). Previous research has shown that the IUI-A achieves incremental validity over the IUS-12 in predicting depression, anxiety, and obsessive–compulsive symptoms in nonclinical samples (Fergus, 2013).
To our knowledge, this is the first study to test the incremental validity of the IUI using a clinical sample and two established IU scales (i.e., the IUS-12 and DSIU).

Methods

Participants

A total of 698 participants were screened for eligibility (Fig. 1). The analyses were carried out on the 625 participants who met the study inclusion criteria (see details below). The nonclinical sample consisted of 427 community participants (208 women, 219 men) from the general population of Zanjan University of Medical Sciences. Ages ranged from 17 to 62 years (M = 24.13; SD = 6.57). Nonclinical participants had not been hospitalized in the previous year, had not been recommended for psychiatric treatment, and were not undergoing psychotherapy. Eligible participants were recruited through internet and social network advertisements. Two criteria were used to ensure the absence of psychiatric disorders. First, participants’ scores on the Beck Depression Inventory-II and the Beck Anxiety Inventory had to be lower than 14 and 10, respectively. Second, participants must not have received a psychiatric diagnosis in a semi-structured interview (ADIS-IV-L) conducted by telephone by two psychiatrist colleagues and the second author of this paper.

Fig. 1 CONSORT flow diagram for the present study

The clinical sample included 198 patients (125 women, 73 men, and one undisclosed sex). Ages ranged from 17 to 60 years (M = 29.34; SD = 10.53). This sample was recruited from patients referred to the medical facilities or private practices of two fellow psychiatrists. Participants were interviewed with the Anxiety and Related Disorders Interview Schedule for DSM-IV-Lifetime (ADIS-IV-L). Patients had to meet DSM-IV-TR criteria for an affective disorder (e.g., a mood, depressive, anxiety, or obsessive–compulsive disorder) and be treated at Zanjan University of Medical Sciences (Table 1). Scores on the Beck Depression and Anxiety Inventories were used to confirm assignment to the clinical group.

Table 1 Frequency of diagnosis in the clinical sample

The ethics committee of Zanjan University of Medical Sciences approved all aspects of the study (protocol number IR.ZUMS.REC.1398.002). Informed consent was obtained from all individual participants included in the study. The authors declare that they have no conflicts of interest. The data are available from the first and corresponding authors upon reasonable request.

Instruments

Anxiety and Related Disorders Interview Schedule for DSM-IV-Lifetime (ADIS-IV-L)

The ADIS-IV-L assesses the presence and severity of anxiety, mood, and somatoform disorders and the history of psychiatric disorders. Each diagnosis is graded on a clinical severity rating (CSR) scale from 0 (no symptoms) to 8 (extremely severe symptoms); a rating of 4 is the clinical severity threshold for a DSM-IV diagnosis (Brown & Barlow, 2013). The interview also includes brief screens for psychotic symptoms and for alcohol and substance use. It has excellent reliability for anxiety and mood disorders (Abdi et al., 2013).

Intolerance of Uncertainty Inventory (IUI)

The IUI (Gosselin et al., 2008) includes 15 items for the IUI-A and 30 items for the IUI-B. Following Gosselin et al. (2008), the items were administered on a 5-point Likert scale ranging from 1 (not at all like me) to 5 (like me completely). This research used the Farsi-language version of the IUI, produced as follows. First, the first author translated the English version of the IUI into Farsi. Then, a bilingual professional translator back-translated it into English without consulting the original text to ensure linguistic equivalence. We reconciled discrepancies between translations through discussion and reached a consensus version after a few rounds. In the present study, Cronbach’s α coefficients were 0.92 and 0.96 for the IUI-A and IUI-B, respectively.
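Cronbach’s α, reported here and for the other instruments below, can be computed directly from the item-score matrix. The sketch below is a minimal NumPy implementation of the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of the sum score). It is an illustration only, not the software used in this study, and the toy data are made up.

```python
import numpy as np

def cronbach_alpha(scores) -> float:
    """Cronbach's alpha for an (n_respondents x k_items) matrix of item scores."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1)      # sample variance of each item
    total_variance = x.sum(axis=1).var(ddof=1)  # sample variance of the sum score
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

# Toy example: 3 respondents answering 2 items (hypothetical data)
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))  # 0.667
```

Because α treats Likert responses as interval scores, it is best read alongside the model-based ω coefficients reported later.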

Intolerance of Uncertainty Scale, Short Form (IUS-12)

The IUS-12 (Carleton et al., 2007) is a 12-item scale assessing reactions to ambiguous situations, uncertainty, and the future. Each item is rated on a 5-point Likert scale ranging from 1 (not at all like me) to 5 (like me completely). Cronbach’s α in the current study was 0.90.

Disorder-Specific Intolerance of Uncertainty (DSIU)

The DSIU (Thibodeau et al., 2015) consists of 24 items (three per disorder) drafted to represent IU characteristics in generalized anxiety disorder, social anxiety disorder, obsessive–compulsive disorder, health anxiety, posttraumatic stress disorder, panic disorder, specific phobia, and major depressive disorder. Each item is rated on a 5-point Likert scale ranging from 0 (not at all) to 4 (extremely). In the present study, Cronbach’s α for the total score was 0.76.

Beck Depression Inventory-II (BDI-II)

The Beck Depression Inventory-II (BDI-II; Iranian version: Ghassemzadeh et al., 2005) is a 21-item multiple-choice self-report scale assessing affective, somatic, and cognitive symptoms of depression. Respondents rate the severity of each symptom on a 4-point scale from 0 to 3, with higher scores indicating greater severity. The total score (Cronbach’s α = 0.89 in the present study) is a valid measure of depression severity.

Beck Anxiety Inventory (BAI)

The Beck Anxiety Inventory (BAI; Iranian version: Rafiei & Seifi, 2013) measures anxiety symptoms in adults and adolescents. It includes 21 items rated on a 4-point Likert scale from 0 (not at all) to 3 (severely). In the present study, Cronbach’s α for the total score was 0.92.

Statistical Analyses

Exploratory Structural Equation Modeling

The factor structures of the IUI-A and IUI-B were investigated using ESEM as implemented in Mplus (Version 8.4). For the IUI-A, we began by testing the theoretically expected one-factor model. Then, we investigated alternative models that previous research proposed to account for the misfit of the unidimensional structure (e.g., Li et al., 2020). For the IUI-B, we tested one-factor, six-factor, and bifactor models. A bifactor model is a latent structure in which each item loads on a general factor that affects all items and on a specific factor that influences a subset of items. The general factor represents the content domain that the scale is intended to measure (e.g., intolerance of uncertainty). Specific factors (e.g., avoidance) have narrower content than the general factor and explain item responses that the general factor does not. Following Marsh et al. (2014) and Morin et al. (2015), we used target rotation in ESEM because it allows an a priori model to be tested, provides greater control over model specification, and makes the results easier to interpret. To assess whether ESEM was superior to CFA in representing the measurement models of the IUI-A and IUI-B, we compared the best-fitting ESEM models with the corresponding CFA models. Because the IUI uses Likert-type items, the analyses used the weighted least squares mean- and variance-adjusted (WLSMV) estimator. This estimator is recommended for ordinal categorical data and does not assume that the observed item responses are continuous or normally distributed (Rhemtulla et al., 2012). Model fit was assessed using the scaled WLSMV χ² and the following fit indices and cutoffs: comparative fit index (CFI > 0.95), Tucker–Lewis index (TLI > 0.95), root mean square error of approximation (RMSEA < 0.06), and standardized root mean square residual (SRMR < 0.05) (Schermelleh-Engel et al., 2003). The best-fitting model for the whole sample served as the baseline model for testing measurement invariance in multigroup analyses.
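The point estimates of the CFI, TLI, and RMSEA follow from the model and baseline χ² values, and the cutoffs listed above can then be checked mechanically. The sketch below assumes the usual textbook formulas; programs differ in details (e.g., N versus N − 1 in the RMSEA, multigroup corrections, and how the scaled WLSMV χ² enters), so it should not be expected to reproduce Mplus output exactly. All function names are illustrative.

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate (N - 1 convention; programs differ)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index of model m against baseline model b."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_b - df_b, chi2_m - df_m, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den

def tli(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Tucker-Lewis index of model m against baseline model b."""
    ratio_b, ratio_m = chi2_b / df_b, chi2_m / df_m
    return (ratio_b - ratio_m) / (ratio_b - 1.0)

# Cutoffs as listed in the text: (direction, bound)
CUTOFFS = {"CFI": (">", 0.95), "TLI": (">", 0.95),
           "RMSEA": ("<", 0.06), "SRMR": ("<", 0.05)}

def check_cutoffs(fit: dict) -> dict:
    """Map each index to True if it meets its cutoff, else False."""
    return {name: (fit[name] > bound if op == ">" else fit[name] < bound)
            for name, (op, bound) in CUTOFFS.items()}

print(check_cutoffs({"CFI": 0.990, "TLI": 0.986, "RMSEA": 0.062, "SRMR": 0.021}))
# {'CFI': True, 'TLI': True, 'RMSEA': False, 'SRMR': True}
```

The values in the usage line are those reported in the Results for the IUI-A bifactor CFA model; with them, only the RMSEA criterion is not met.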
We compared models using the scaled chi-square difference test whenever the models were nested. For both the IUI-A and IUI-B, the one-factor model was nested in every multidimensional model, and the latter were nested in the corresponding bifactor model (Reise, 2012).
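For two nested models, the ordinary chi-square difference test refers the change in χ² to a χ² distribution whose degrees of freedom equal the change in df. The sketch below implements only this naive, unscaled version, with a self-contained upper-tail probability computed from the regularized incomplete gamma function (series and continued-fraction evaluation). Under WLSMV, the raw difference between scaled χ² values is not χ²-distributed; Mplus provides the DIFFTEST option for that case. The code therefore illustrates the nesting logic only, and the example values are hypothetical.

```python
import math

def chi2_sf(x: float, df: float) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square(df), i.e. Q(df/2, x/2)."""
    a, z = df / 2.0, x / 2.0
    if z <= 0.0:
        return 1.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series for the lower regularized gamma P(a, z); return 1 - P
        term = total = 1.0 / a
        ap = a
        for _ in range(10_000):
            ap += 1.0
            term *= z / ap
            total += term
            if term < total * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Modified Lentz continued fraction for the upper regularized gamma Q(a, z)
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_prefactor)

def chi2_difference_test(chi2_restricted, df_restricted, chi2_general, df_general):
    """Naive (unscaled) chi-square difference test for two nested models."""
    d_chi2 = chi2_restricted - chi2_general
    d_df = df_restricted - df_general
    if d_df <= 0:
        raise ValueError("the restricted model must have more degrees of freedom")
    return d_chi2, d_df, chi2_sf(max(d_chi2, 0.0), d_df)

# Hypothetical values: one-factor model (restricted) vs. two-factor model
print(chi2_difference_test(110.0, 52, 100.0, 50))  # (10.0, 2, p ≈ 0.0067)
```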

Bifactor Model Indices

We used the BifactorIndicesCalculator package for R (Dueber, 2020) to evaluate the quality of the standardized factor loading matrix. The explained common variance (ECV) is the proportion of the common variance in IUI items accounted for by the general factor or by a group factor. Other useful indices are omega (ω) and omega hierarchical (ωH). Briefly, ω reflects the overall proportion of reliable variance in an IUI score and can be interpreted as a standard internal consistency coefficient. For the total score, ωH reflects the proportion of reliable variance due to the general factor only; for the group factors, ωH represents the proportion of reliable variance in the corresponding subscale score after the general factor variance has been removed. The percentage of uncontaminated correlations (PUC) is the proportion of item correlations that are not influenced by any group factor; it indicates how much bias in model parameters may result from forcing a multidimensional construct into a unidimensional structure. The higher the PUC, the more the IUI item correlation matrix reflects the general factor. According to Rodriguez et al. (2016), if both PUC and ECV are greater than 0.80, the general factor loadings in a bifactor model can be assumed to closely match those obtained from a standard one-factor model. If PUC is lower than 0.80, an ECV greater than 0.60 and an ωH greater than 0.70 support interpreting the instrument as essentially unidimensional (Rodriguez et al., 2016). To assess whether a given set of items measures the corresponding latent factor with sufficient precision, one can compute the proportion of variance in the latent factor explained by its indicators. This index, called construct replicability (H), ranges from 0 to 1, with higher values indicating a better-defined latent variable. Last, factor score determinacy (FD) indicates how well factor scores reflect individual differences on each factor; FD values of 0.90 or higher indicate reliable factor score estimates.
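The ECV, ω, ωH, and PUC described above can be computed from a standardized bifactor loading matrix. The sketch below is a simplified illustration, not the BifactorIndicesCalculator implementation: it assumes orthogonal factors and an independent-cluster pattern in which each item loads on the general factor and on exactly one group factor, so ESEM cross-loadings are ignored. The function name, argument names, and example loadings are hypothetical.

```python
import numpy as np

def bifactor_indices(general, specific, groups):
    """ECV, omega, omegaH, and PUC for an independent-cluster bifactor solution.

    general  : standardized loadings on the general factor (one per item)
    specific : standardized loading of each item on its own group factor
    groups   : group-factor label for each item
    """
    g = np.asarray(general, dtype=float)
    s = np.asarray(specific, dtype=float)
    lab = np.asarray(groups)
    uniq = 1.0 - g**2 - s**2                    # item uniquenesses (orthogonal factors)
    names = list(dict.fromkeys(lab.tolist()))   # group labels in order of appearance
    common = (g**2).sum() + (s**2).sum()        # total common variance
    spec_sq = {k: s[lab == k].sum() ** 2 for k in names}
    denom = g.sum() ** 2 + sum(spec_sq.values()) + uniq.sum()
    out = {
        "ECV_general": (g**2).sum() / common,
        "omega": (g.sum() ** 2 + sum(spec_sq.values())) / denom,
        "omegaH": g.sum() ** 2 / denom,
    }
    n = len(g)
    within = sum((lab == k).sum() * ((lab == k).sum() - 1) / 2 for k in names)
    out["PUC"] = 1.0 - within / (n * (n - 1) / 2)   # share of between-group item pairs
    for k in names:
        m = lab == k
        d_k = g[m].sum() ** 2 + s[m].sum() ** 2 + uniq[m].sum()
        out[f"ECV_{k}"] = (s[m] ** 2).sum() / common
        out[f"omega_{k}"] = (g[m].sum() ** 2 + s[m].sum() ** 2) / d_k
        out[f"omegaHS_{k}"] = s[m].sum() ** 2 / d_k   # subscale omegaH
    return out

# Hypothetical loadings: 30 items, six 5-item group factors
res = bifactor_indices([0.7] * 30, [0.3] * 30, [f"G{i // 5 + 1}" for i in range(30)])
print(round(res["PUC"], 2), round(res["ECV_general"], 2))  # 0.86 0.84
```

PUC depends only on how items are grouped. With a 5 + 10 grouping, as in the two-factor IUI-A model, it equals 1 − 55/105 ≈ 0.48; with six five-item groups (assuming the 30 IUI-B items split evenly across six factors), it equals 1 − 60/435 ≈ 0.86.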

Measurement Invariance

Measurement invariance testing examines the decline in model fit across increasingly constrained models. First, we tested whether the same subsets of items were associated with the same factors in the clinical and nonclinical groups (configural invariance). Next, we constrained the factor loadings to be equal across groups (metric invariance). If supported, metric invariance ensures that each item’s unit of measurement is the same in the clinical and nonclinical groups. A subsequent model constrained the item thresholds to be equal across groups (scalar invariance). For categorical items, threshold invariance is the analog of intercept invariance for continuous indicators; if supported, a given latent variable score translates into the same response category for clinical and nonclinical participants. Finally, we imposed equal residual variances and tested the full invariance of the scale (residual invariance). The four models are hierarchically nested, so each more constrained model can be formally compared with the preceding one. Again, the scaled chi-square difference test assessed whether a more restrictive model differed significantly from a less restrictive one. However, because trivial differences between models can be statistically significant in large samples, Chen (2007) recommended the following criteria for the practical equivalence of model fit: a change in CFI ≤ 0.010, paired with changes in RMSEA and SRMR ≤ 0.015. In the current study, the clinical group (n = 198) was considerably smaller than the community group (n = 427). Yoon and Lai (2018) argue that invariance tests may be biased when group sizes are markedly unequal; in particular, the statistical power to detect violations of invariance decreases as the disparity in sample sizes increases. To address this problem, we used the strategy proposed by Yoon and Lai (2018).
From the community group, we drew 100 subsamples, each the same size as the clinical group, and ran the invariance tests on each subsample paired with the clinical group. Average fit indices and fit differences were computed and reported as additional information, to rule out the possibility that unequal group sizes led us to conclude that invariance held when it did not.
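The decision rule and the subsampling procedure can be sketched together. In the code below, `fit_changes` computes the change in fit between two nested invariance models, `meets_chen` applies the thresholds as stated in the text (adjustable via keyword arguments), and `subsampled_changes` averages these changes over repeated equal-size subsamples drawn without replacement from the larger group. The model-fitting step itself is a hypothetical user-supplied callable, replaced here by a stub that returns made-up fit values; all names and the seed are illustrative assumptions, not the authors’ code.

```python
import numpy as np

def fit_changes(less: dict, more: dict) -> dict:
    """Change in fit when moving to the more constrained invariance model."""
    return {"dCFI": less["CFI"] - more["CFI"],        # drop in CFI
            "dRMSEA": more["RMSEA"] - less["RMSEA"],  # rise in RMSEA
            "dSRMR": more["SRMR"] - less["SRMR"]}     # rise in SRMR

def meets_chen(d: dict, cfi=0.010, rmsea=0.015, srmr=0.015) -> bool:
    """Practical-equivalence thresholds as stated in the text."""
    return d["dCFI"] <= cfi and d["dRMSEA"] <= rmsea and d["dSRMR"] <= srmr

def subsampled_changes(large_ids, n_small, fit_pair, n_reps=100, seed=0):
    """Average fit changes over n_reps equal-size subsamples of the larger group.

    fit_pair is a hypothetical callable: given subsample IDs, it fits two nested
    invariance models (subsample vs. the smaller group) and returns their fit dicts.
    """
    rng = np.random.default_rng(seed)
    runs = []
    for _ in range(n_reps):
        ids = rng.choice(large_ids, size=n_small, replace=False)  # no duplicate IDs
        runs.append(fit_changes(*fit_pair(ids)))
    return {k: float(np.mean([r[k] for r in runs])) for k in runs[0]}

# Stub standing in for the SEM step: returns fixed, made-up fit values
def fake_fit_pair(ids):
    assert len(set(ids.tolist())) == len(ids)
    return ({"CFI": 0.990, "RMSEA": 0.050, "SRMR": 0.030},
            {"CFI": 0.986, "RMSEA": 0.055, "SRMR": 0.036})

avg = subsampled_changes(np.arange(427), 198, fake_fit_pair)
print(meets_chen(avg))  # True
```

Averaging the fit changes, rather than counting how many replications pass, mirrors the description above of reporting average fit indices and fit differences.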

Results

Descriptive Analyses and Group Differences

Descriptive statistics for each scale and subscale are reported in Table 2. Clinical participants reported significantly higher scores than nonclinical participants on all measures (all ps < 0.001). The largest effect sizes were for the two symptom measures. For instance, with a Cohen’s d of 3.13 (BAI scores), 99.9% of the clinical group scored above the nonclinical group’s mean, and the two groups’ anxiety symptom distributions overlapped by only 12%. Likewise, for BDI-II scores (Cohen’s d = 2.71), 99.7% of the clinical group scored above the nonclinical group’s mean, and the overlap in depression symptoms was 18%. Considerable group differences were also observed on the IUS-12 and DSIU, followed by the IUI-A and IUI-B.
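The percentages above follow from Cohen’s d under the assumption of two normal distributions with equal variances: the share of the clinical group above the nonclinical mean is Cohen’s U3 = Φ(d), and the overlapping coefficient is 2Φ(−|d|/2). The short sketch below, using only the Python standard library, reproduces the reported values; it is an illustrative check rather than the authors’ computation.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cohens_u3(d: float) -> float:
    """Share of the higher-scoring group above the other group's mean."""
    return normal_cdf(d)

def overlap(d: float) -> float:
    """Overlapping coefficient of two equal-variance normal distributions."""
    return 2.0 * normal_cdf(-abs(d) / 2.0)

for d in (3.13, 2.71):  # BAI and BDI-II effect sizes reported above
    print(d, round(100 * cohens_u3(d), 1), round(100 * overlap(d)))
# 3.13 99.9 12
# 2.71 99.7 18
```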

Table 2 Descriptive statistics and group comparisons

Factor Structure of the IUI-A

We tested the one-factor model for this item set because the IUI-A was devised to measure core IU beliefs as a unitary construct. Model fit was good according to the CFI, TLI, and SRMR (Table 3), but the RMSEA was slightly above the recommended threshold for acceptable fit. This suggests that the one-factor model might conceal some degree of misspecification. Previous research has also questioned the fit of the one-factor model for the IUI-A. Following this literature, we examined an alternative model with two factors (Li et al., 2020): (F1) “intolerance of the unexpected and difficulty waiting in an uncertain situation” (items #1–5) and (F2) “intolerance of uncertainty and uncertain situations” (items #6–15). The two-factor model fit the data better than the one-factor model; however, inspection of the model parameters revealed a factor loading greater than 1.00 for item #2 and a high correlation between the factors (ϕ = 0.81).

Table 3 Exploratory structural equation modeling of the IUI-A: fit indices and model comparisons

Because the estimation terminated normally and no negative residual variances were observed, the loading greater than 1.00 did not indicate an improper solution. Such loadings can occur when too many factors are specified or when factors are highly intercorrelated, in which case standardized loadings cannot be interpreted as correlation coefficients. Both conditions appeared to be present. For instance, an exploratory factor analysis (EFA) of the item set suggested unidimensionality: only one eigenvalue was greater than one, and the ratio of the first to the second eigenvalue was greater than 8. Furthermore, no clearly defined multidimensional solution emerged when we tried to interpret more than one factor. In particular, the two-factor EFA solution lacked discriminant validity after Promax rotation (ϕ = 0.70), and its loading pattern was inconsistent with the two-factor model tested in the ESEM analysis (i.e., most items loaded on the first factor, items #6, #12, #13, and #15 loaded on the second factor, and several cross-loadings hindered the identification of factor markers). Conversely, all loadings on the first unrotated EFA factor were good (λ > 0.60), except for item #13, which was acceptable (λ = 0.52).
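The two eigenvalue heuristics mentioned above, the number of eigenvalues greater than one and the first-to-second eigenvalue ratio, can be obtained from an item correlation matrix. The sketch below uses NumPy on a hypothetical one-factor correlation matrix with 15 items and equal loadings of 0.8. With real Likert data one would first estimate a correlation matrix (e.g., polychoric), which this sketch does not do.

```python
import numpy as np

def eigen_summary(corr) -> dict:
    """Kaiser count and first-to-second eigenvalue ratio of a correlation matrix."""
    eig = np.sort(np.linalg.eigvalsh(np.asarray(corr, dtype=float)))[::-1]
    return {"n_above_1": int((eig > 1.0).sum()), "ratio_1_2": float(eig[0] / eig[1])}

# Hypothetical one-factor population: 15 items, all loadings 0.8
lam = np.full(15, 0.8)
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)
print(eigen_summary(R))  # n_above_1 = 1, ratio_1_2 ≈ 27.7
```

In this idealized matrix the first eigenvalue is 1 + 14 × 0.64 = 9.96 and every other eigenvalue is 0.36; real data with minor item-level disturbances yield smaller ratios, such as the value above 8 reported for the IUI-A.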

To reconcile the EFA findings with previous literature (Li et al., 2020), we hypothesized a bifactor model for the IUI-A, with F1 and F2 as group factors. The model’s fit was excellent (Table 3), and the chi-square difference test favored this model over the two-factor model. The bifactor ESEM model for the IUI-A (Table 3) fit the data as well as the corresponding bifactor CFA model (χ² = 257.46; df = 75; p < 0.001; CFI = 0.990; TLI = 0.986; SRMR = 0.021; RMSEA = 0.062; 95% CI = 0.054–0.071; p-close = 0.007). This finding indicates that constraining the cross-loadings on the two group factors to zero did not appreciably worsen model fit. Table 4 reports the standardized factor loadings. All items loaded significantly on the general factor (λ = 0.48–0.87, Mdn = 0.77), resembling the first unrotated EFA factor. The F1 group factor, intolerance of the unexpected and difficulty waiting in an uncertain situation (λ = 0.14–0.55, Mdn = 0.25), was primarily defined by items #1 and #2, whereas the F2 group factor, intolerance of uncertainty and uncertain situations (λ = −0.03 to 0.43, Mdn = 0.05), was almost empty. No item loaded more strongly on a group factor than on the general factor, and no item except #2 had a significant cross-loading.

Table 4 Exploratory structural equation modeling of the IUI-A: standardized factor loadings, uniqueness, and bifactor model statistics

Elaborating on the factor loading matrix, we obtained bifactor model statistics to clarify the scale dimensionality and address the use of total or subscale scores (Table 4). The general factor accounted for 87% of the common variance, whereas F1 and F2 explained 8% and 6%, respectively. The model-based reliability coefficients ω were high for all factors. However, the coefficients ωH (corresponding to the reliability of specific item groups after general factor variance was removed) fell dramatically for F1 and became negligible for F2. Conversely, the reliability of the general factor remained high after removing the variance accounted for by the group factors (ωH = 0.78).

The item explained common variance (IECV) assesses unidimensionality at the item level by determining how much of each item’s common variance depends on the general factor only. Items with an IECV above 0.85 are assumed to primarily reflect the general factor’s content and are likely to form a unidimensional item set (Stucky & Edelen, 2015). This requirement was met for all items except #1, #2, and #8. At the model level, when PUC is less than 0.80, an ECV greater than 0.60 and an ωH greater than 0.70 for the general factor indicate that multidimensionality is not severe enough to rule out interpreting the instrument as essentially unidimensional (Rodriguez et al., 2016). As shown in Table 4, the H measure of construct replicability and the factor determinacy index showed that the general factor was well defined and supported using the total score in research and clinical applications. Collectively, the results are compatible with the unidimensionality assumption and suggest that the imperfect fit of the one-factor model is primarily due to flaws in some items, as previous research has also pointed out.
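The item-level check and the model-level rule described above can be written as a few small functions. The sketch assumes each item loads on the general factor and on a single group factor, so IECV = λ²(general) / (λ²(general) + λ²(group)); the decision rules simply encode the thresholds as stated in the Statistical Analyses section. The function names are hypothetical, and the PUC passed in the example is computed from the 5 + 10 item grouping of the IUI-A model, which is our assumption rather than a value reported in the text.

```python
def iecv(general: float, specific: float) -> float:
    """Share of an item's common variance due to the general factor."""
    return general**2 / (general**2 + specific**2)

def low_iecv_items(general, specific, threshold=0.85):
    """1-based item numbers whose IECV does not exceed the threshold."""
    return [i + 1 for i, (g, s) in enumerate(zip(general, specific))
            if iecv(g, s) <= threshold]

def dimensionality_verdict(puc: float, ecv: float, omega_h: float) -> str:
    """Decision rules as summarized in the text (after Rodriguez et al., 2016)."""
    if puc > 0.80 and ecv > 0.80:
        return "general loadings close to one-factor loadings"
    if puc < 0.80 and ecv > 0.60 and omega_h > 0.70:
        return "essentially unidimensional"
    return "not clearly unidimensional"

# Reported IUI-A values: ECV = 0.87, omegaH = 0.78; PUC from the 5 + 10 grouping
print(dimensionality_verdict(50 / 105, 0.87, 0.78))  # essentially unidimensional
print(low_iecv_items([0.8, 0.5, 0.9], [0.2, 0.5, 0.1]))  # [2] (hypothetical loadings)
```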

Measurement Invariance of the IUI-A

The bifactor model served as the baseline model for the multigroup analyses (fit statistics and model comparisons are reported in Table 3). Mirroring the single-group analysis, the configural invariance model yielded excellent fit indices. The metric invariance model differed significantly from the configural model. Likewise, the scalar invariance model showed a significant loss of fit relative to the metric invariance model, and the residual invariance model differed significantly from the scalar invariance model. Nonetheless, ΔCFI, ΔRMSEA, and ΔSRMR were negligible for all invariance tests, supporting the measurement invariance of the IUI-A according to Chen’s (2007) criteria. Because of the unequal group sizes, the statistical power to detect invariance violations may have been reduced. Table 3 therefore also summarizes the results of invariance tests from 100 replications of the analyses on two groups of equal size, obtained by subsampling the larger group to match the size of the clinical group (Yoon & Lai, 2018). The average ΔCFI, ΔRMSEA, and ΔSRMR were in keeping with Chen’s (2007) criteria, supporting all levels of invariance. Scalar invariance is a prerequisite for comparing latent factor means; because the analyses also supported the more restrictive residual invariance, we estimated latent mean differences under that model. With the nonclinical group as the reference group, the largest latent mean difference was on the general factor (d = +1.70; SE = 0.15; p < 0.001). Among the group factors, only intolerance of the unexpected and difficulty waiting in an uncertain situation yielded a significant latent mean difference, but in the negative direction and of much smaller size (d = −0.34; SE = 0.12; p = 0.004). As expected, patients scored higher than community participants on the IUI-A general factor, which presumably reflects core IU beliefs.

Factor Structure of the IUI-B

The fit statistics for alternative models and model comparisons are reported in Table 5. The one-factor model fit poorly; we therefore rejected the hypothesis that all IUI-B items reflect a single latent construct. The six-factor model significantly improved model fit, and all indices were good, supporting overall the hypothesis that the IUI-B reflects six related clinical consequences of IU. However, six of the fifteen latent variable correlations were large to very large (0.60–0.70), and six of the remaining nine exceeded 0.50, indicating substantial common variance among the IUI-B factors. Compared with the six-factor model, the bifactor model was a significant improvement: the CFI, TLI, and SRMR indicated an excellent fit, and the RMSEA indicated close fit. The bifactor ESEM model for the IUI-B (Table 5) also outperformed the corresponding bifactor CFA model on all fit measures (χ² = 1060.88; df = 375; p < 0.001; CFI = 0.977; TLI = 0.973; SRMR = 0.036; RMSEA = 0.054; 95% CI = 0.046–0.058; p-close = 0.004). Unlike for the IUI-A, imposing unrealistic constraints on the IUI-B factor loading pattern worsened the model’s ability to fit the data.

Table 5 Exploratory structural equation modeling of the IUI-B: fit indices and model comparisons

Table 6 shows the standardized factor loadings for the IUI-B items. All items loaded significantly on the general factor (λ = 0.38–0.85, Mdn = 0.69). Overestimation (λ = 0.44–0.54, Mdn = 0.49) and control (λ = 0.48–0.57, Mdn = 0.54) were more homogeneous, with all target loadings greater than 0.40 and statistically significant. Reassurance (λ = 0.17–0.60, Mdn = 0.52) and avoidance (λ = 0.14–0.64, Mdn = 0.38) each had two items with relatively low, though still statistically significant, loadings. Worry (λ = 0.26–0.40, Mdn = 0.32) and doubt (λ = 0.24–0.44, Mdn = 0.35) were more weakly defined overall. Although the analysis yielded 58 statistically significant nontarget loadings (out of 150), none was large enough to threaten factor definition (|λ| = 0.00–0.26, Mdn = 0.04). All nontarget loadings were lower than the target loadings on the general factor and never exceeded the target loadings on the group factors.
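The cross-loading screen described above can be expressed as a short check over an ESEM pattern matrix. This Python sketch assumes one reading of the rule (for each item, every nontarget loading must be below that item's general-factor loading and must not exceed its target group-factor loading); the function names and the four-item loading matrix are hypothetical and are not the IUI-B estimates.

```python
import statistics


def describe(values):
    """Return (min, max, median) of a list of absolute loadings."""
    return min(values), max(values), statistics.median(values)


def check_cross_loadings(general, group, target):
    """Screen nontarget (cross-) loadings in a bifactor ESEM solution.

    general[i]: item i's loading on the general factor.
    group[i]:   item i's loadings on every group factor (one ESEM pattern row).
    target[i]:  index of item i's target group factor.
    Returns the (min, max, median) of |nontarget| loadings and a flag that is
    True when every nontarget loading is below the item's general loading and
    does not exceed the item's target group loading.
    """
    cross, ok = [], True
    for g, row, t in zip(general, group, target):
        for k, lam in enumerate(row):
            if k == t:
                continue  # skip the target loading itself
            cross.append(abs(lam))
            if abs(lam) >= abs(g) or abs(lam) > abs(row[t]):
                ok = False
    return describe(cross), ok


# Hypothetical 4-item, 2-group-factor example.
general = [0.70, 0.65, 0.60, 0.72]
group = [[0.50, 0.05], [0.45, -0.10], [0.03, 0.40], [0.12, 0.38]]
print(check_cross_loadings(general, group, [0, 0, 1, 1]))
```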

Table 6 Exploratory structural equation modeling of the IUI-B: standardized factor loadings, uniqueness, and bifactor model statistics

According to the bifactor model indices (Table 6), the general factor accounted for over two-thirds of the common variance, with the remainder spread across the six group factors. The model-based reliability coefficients (ω) were high for all factors. For the general factor, ωH was close to ω. In contrast, the ωH coefficients for the group factors decreased substantially: they were small for overestimation, control, reassurance, and avoidance and negligible for worry and doubt. Although these results confirm the presence of an overarching general factor, the IUI-B items retained some degree of multidimensionality; indeed, only a few items met the IECV requirements for unidimensional scales (Table 6).
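For readers less familiar with these indices, the following Python sketch computes ω, ωH, subscale ω and ωHS, ECV, and item-level IECV from standardized bifactor loadings using the usual formulas. It is a simplification under stated assumptions: each item loads on the general factor and on exactly one group factor (ESEM cross-loadings are ignored), and uniqueness is taken as 1 − λg² − λs². The function name and the four-item loadings are hypothetical.

```python
def bifactor_indices(general, specific, factor_of):
    """Model-based indices for a bifactor model with standardized loadings.

    Assumes each item loads on the general factor and on exactly one group
    factor (cross-loadings ignored), with uniqueness = 1 - lg^2 - ls^2.
    """
    n = len(general)
    theta = [1 - general[i] ** 2 - specific[i] ** 2 for i in range(n)]
    groups = sorted(set(factor_of))
    members = {k: [i for i in range(n) if factor_of[i] == k] for k in groups}
    sum_g = sum(general)
    sum_s = {k: sum(specific[i] for i in members[k]) for k in groups}
    common = sum_g ** 2 + sum(v ** 2 for v in sum_s.values())
    total = common + sum(theta)  # model-implied variance of the total score
    g2 = sum(x ** 2 for x in general)
    s2 = sum(x ** 2 for x in specific)
    subscales = {}
    for k in groups:
        gk = sum(general[i] for i in members[k])
        var_k = gk ** 2 + sum_s[k] ** 2 + sum(theta[i] for i in members[k])
        subscales[k] = {
            "omega": (gk ** 2 + sum_s[k] ** 2) / var_k,
            "omega_hs": sum_s[k] ** 2 / var_k,  # group-factor variance only
        }
    return {
        "omega": common / total,
        "omega_h": sum_g ** 2 / total,  # general-factor variance only
        "ecv": g2 / (g2 + s2),
        "iecv": [general[i] ** 2 / (general[i] ** 2 + specific[i] ** 2) for i in range(n)],
        "subscales": subscales,
    }


res = bifactor_indices([0.7, 0.7, 0.6, 0.6], [0.3, 0.3, 0.4, 0.4], ["A", "A", "B", "B"])
print(round(res["omega"], 3), round(res["omega_h"], 3), round(res["ecv"], 3))
```

The pattern described in the text (ωH close to ω for the general factor, much smaller ωHS for the group factors) arises whenever the general loadings dominate the group loadings, as in this toy example.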

The H measure of construct replicability was excellent for the general factor but barely sufficient for overestimation, control, and reassurance. The FD index for the general factor exceeded the recommended standard, whereas it approached the required value for overestimation and control. These results indicate that the general IUI-B factor was well defined and that the corresponding factor score can be used in subsequent applications of the scale. Among the group factors, overestimation and control were sufficiently well defined, and their factor scores can reproduce the corresponding latent variables to a reasonable degree; however, these factors accounted for a limited amount of common variance. The worry and doubt factors were less substantive, and their factor scores were less replicable. Because the PUC for the IUI-B was 0.86 and both ECV and ωH exceeded 0.80, the general-factor loadings in the bifactor model should not differ dramatically from those of the one-factor model. Consequently, the inherent multidimensionality of the IUI-B was substantive but not strong enough to impair the total score's ability to reflect the common variance of the general factor.
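Two of these indices have simple closed forms. The following Python sketch computes Hancock and Mueller's H from a factor's standardized loadings, H = 1 / (1 + 1 / Σ[λ² / (1 − λ²)]), and the PUC as the share of item pairs whose members belong to different group factors. FD is omitted because it requires the full model-implied correlation matrix. The function names are hypothetical; the 30-item layout with six 5-item factors is an assumption for illustration, although it happens to give a PUC close to the reported 0.86.

```python
def coefficient_h(loadings):
    """Hancock & Mueller's construct replicability H from standardized loadings."""
    s = sum(l ** 2 / (1 - l ** 2) for l in loadings)
    return 1 / (1 + 1 / s)


def puc(group_sizes):
    """Percentage of uncontaminated correlations: the share of item pairs
    whose two items load on different group factors."""
    n = sum(group_sizes)
    all_pairs = n * (n - 1) / 2
    within = sum(k * (k - 1) / 2 for k in group_sizes)
    return (all_pairs - within) / all_pairs


print(round(coefficient_h([0.7] * 5), 3))  # five items loading 0.70
print(round(puc([5] * 6), 3))              # hypothetical 6 x 5-item layout
```

H grows with both the number of items and the size of their loadings, which is why a general factor defined by all items tends to reach much higher H than any group factor.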

Measurement Invariance of the IUI-B

The bifactor model was used as the baseline model in multigroup analyses (fit indices and hypothesis tests are shown in Table 5). Mirroring the single-group analysis, the configural invariance model fit excellently. All subsequent invariance tests were statistically significant; however, ΔCFI and ΔRMSEA were negligible, supporting full measurement invariance of the IUI-B factors according to Chen's (2007) criteria. This conclusion was also supported when the configural, metric, scalar, and residual invariance tests were replicated on 100 subsamples of the community group, each the same size as the clinical group (Yoon & Lai, 2018). We inspected the residual invariance model to examine latent mean differences between clinical patients and nonclinical participants. The largest difference was found for the general factor (d = 1.32; SE = 0.10; p < 0.001), followed by worry (d = 0.58; SE = 0.27; p = 0.034) and avoidance (d = 0.55; SE = 0.16; p = 0.001). Overestimation, control, reassurance, and doubt did not differ between groups.
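The subsampling check described above (Yoon & Lai, 2018) amounts to a loop: draw, without replacement, a subsample of the larger group matching the size of the smaller group, rerun the nested invariance models, and average the changes in fit. The Python sketch below assumes a hypothetical user-supplied `fit_deltas` function that fits the multigroup models with whatever SEM software is available and returns a dictionary of fit changes; the sketch itself does no model fitting.

```python
import random
import statistics


def resampled_invariance(large_group, small_group, fit_deltas, n_reps=100, seed=2018):
    """Average invariance-test fit changes over equal-size subsamples.

    fit_deltas(sample_a, sample_b) is a hypothetical callback that fits the
    nested multigroup models and returns a dict such as
    {"d_cfi": ..., "d_rmsea": ..., "d_srmr": ...}.
    """
    if len(small_group) > len(large_group):
        raise ValueError("small_group must not be larger than large_group")
    rng = random.Random(seed)  # fixed seed for reproducible subsamples
    runs = []
    for _ in range(n_reps):
        # Draw, without replacement, as many cases as the smaller group contains.
        subsample = rng.sample(list(large_group), len(small_group))
        runs.append(fit_deltas(subsample, small_group))
    return {key: statistics.mean(run[key] for run in runs) for key in runs[0]}
```

The averaged ΔCFI, ΔRMSEA, and ΔSRMR can then be judged against the same Chen (2007) cutoffs used for the full-sample tests.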

Criterion-Related Validity of IUI Factors

Following Alamer (2022), we used a structural ESEM model to test whether the IUI-A and IUI-B factors could predict the severity of depression-anxiety symptoms (defined by the BDI-II and BAI) and the IU construct (defined by the IUS-12 and DSIU) (Fig. 2a). Notably, the two symptom scales and the two IU scales were each correlated above 0.70 and loaded highly on their corresponding latent variables.

Fig. 2

Structural bifactor ESEM models. IUI-A and IUI-B factors predicted anxiety-depression symptoms and intolerance of uncertainty (a). IUI-A and IUI-B factors predicted anxiety-depression symptoms through intolerance of uncertainty (b)

Model fit was excellent (χ2 = 1400.72; df = 817; p < 0.001; CFI = 0.985; TLI = 0.980; SRMR = 0.025; RMSEA = 0.036; 95% CI = 0.033–0.039; p-close = 1.000). Although the two general factors (GA and GB) were highly intercorrelated (r = 0.81), each uniquely predicted the latent dependent variables to a similar extent, with GB slightly more aligned with the IU construct than GA (Fig. 2a). Among the group factors, only worry was significantly associated with both symptom severity and the IU construct, whereas, unexpectedly, reassurance was negatively associated with IU (Fig. 2a). Although omitted from Fig. 2a, GA was modestly correlated with the IUI-B group factors threat (r = 0.11), reassurance (r = –0.10), avoidance (r = 0.25), and worry (r = 0.18). Conversely, GB was not significantly associated with the IUI-A group factors. Overall, the model explained 49% and 92% of the variance in symptoms and IU, respectively.

Next, we examined whether the IUI-A and IUI-B could predict clinical outcomes above and beyond other IU scales (i.e., incremental validity). We modified the structural model by adding a regression path from the IU construct to symptom severity (Fig. 2b). This specification allowed us to test whether GA, GB, and worry still predicted symptom severity after controlling for variance in the IU construct. The model failed to converge, presumably because of the high collinearity of GA, GB, and IU in predicting symptoms. To circumvent this problem, we fixed the direct effects of the IUI-A and IUI-B factors on symptom severity to zero and compared the fit of this constrained model with that of alternative models in which either the IUI-A or the IUI-B factors were allowed to predict symptom severity. A significant difference in model fit would thus demonstrate the overall incremental validity of the IUI factors, and inspection of specific parameters would reveal which IUI factor(s) were most predictive.

The fully constrained model (Fig. 2b) converged to a solution and fit excellently (χ2 = 1558.72; df = 909; p < 0.001; CFI = 0.982; TLI = 0.977; SRMR = 0.026; RMSEA = 0.036; 95% CI = 0.033–0.039; p-close = 1.000). GA, GB, and worry were associated with the IU construct, which in turn was associated with symptom severity. Unlike in the previous analysis, reassurance was no longer associated with IU. The IU construct thus mediated the effects of GA (indirect = 0.21; Wald = 6.32; p < 0.001), GB (indirect = 0.17; Wald = 4.90; p < 0.001), and worry (indirect = 0.07; Wald = 3.81; p < 0.001). Allowing the IUI-A factors to predict symptom severity significantly improved fit relative to the fully constrained model (∆χ2 = 18.27; df = 3; p < 0.001). However, GA did not predict symptom severity; F2 (i.e., intolerance of uncertainty and uncertain situations) was the only factor associated with the clinical outcome, and in a negative direction (β = –0.15; Wald = –3.48; p < 0.001). Allowing the IUI-B factors to predict symptom severity likewise improved fit significantly relative to the fully constrained model (∆χ2 = 23.66; df = 7; p < 0.01). However, GB and worry did not predict symptom severity, and control was the only factor associated with the clinical outcome, again in a negative direction (β = –0.13; Wald = –3.45; p < 0.001).
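The last step of these nested-model comparisons converts a chi-square difference and its degrees of freedom into a p value. The Python sketch below does this with the closed-form chi-square survival function for integer df, using only the standard library. It is an illustration under an important assumption: with WLSMV estimation of ordered categorical items, the raw difference between two χ2 values is not itself chi-square distributed, so the input should be an appropriately adjusted difference statistic (e.g., from Mplus DIFFTEST or a scaled test in lavaan), not a simple subtraction of the reported χ2 values. The function name is hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail p value of a chi-square statistic with a positive integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{i=0}^{(df-3)/2} x^i / (1*3*...*(2i+1))
    term, total = 1.0, 0.0
    for i in range((df - 1) // 2):
        if i > 0:
            term *= x / (2 * i + 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


# Difference statistics and df as reported above for the IUI-A and IUI-B comparisons.
print(chi2_sf(18.27, 3))
print(chi2_sf(23.66, 7))
```

Both p values fall below the thresholds reported in the text (p < 0.001 and p < 0.01, respectively), assuming the reported ∆χ2 values are the adjusted difference statistics.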

Finally, we reran the analyses, replacing the IU construct with IUS-12 or DSIU scores. In this way, we examined the incremental validity of the IUI factors with respect to specific observed variables rather than to the IU construct in general. GA predicted symptom severity after controlling for DSIU (β = 0.15; Wald = 2.93; p < 0.01) but not after controlling for IUS-12 (β = –0.24; Wald = –1.81; p = 0.070). Conversely, GB did not predict symptom severity in either case (controlling for DSIU: β = –0.21, Wald = –0.78, p = 0.430; controlling for IUS-12: β = 0.05, Wald = 0.25, p = 0.806). The incremental validity of the group factors was questionable. The IUI-A group factors were statistically significant after controlling for DSIU and IUS-12 (all ps < 0.05); however, they were negatively associated with symptom severity, with standardized regression coefficients ranging from –0.05 to –0.08. A negative coefficient was also found for the control factor (β = –0.18; Wald = –3.66; p < 0.001), while no other IUI-B group factor was significant after controlling for DSIU and IUS-12.

Discussion

In the present study, we addressed three unresolved issues regarding the construct validity of the IUI. First, using state-of-the-art methods for analyzing ordered categorical items, we showed that a bifactor ESEM provided the best fit for both the IUI-A and the IUI-B. Second, we demonstrated that the IUI factors were invariant across clinical and nonclinical samples. Finally, we showed that the IUI general factors differentiated patients with affective disorders from community participants who had never received a psychiatric diagnosis. Moreover, the general factors were aligned with the IU construct in predicting anxiety and depression symptoms.

Regarding the first issue, previous research testing the factorial structure of the IUI-A has yielded mixed results. For instance, Gosselin and colleagues (2008) developed the IUI-A to assess a single construct but did not clarify whether a one-, two-, or three-factor structure was preferable. Two other studies (Carleton et al., 2010b; Lauriola et al., 2018) reported that a one-factor model was more tenable than any alternative model; however, each study dropped a different subset of items to achieve acceptable CFA fit indices. This choice was problematic because it was empirically driven and implied that different country-specific versions of the IUI-A would have to be used in clinical practice. Li and colleagues (2020) rejected the one-factor model in favor of a two-factor model but did not formally test which had a relatively better fit. Our findings partly agree with those of Li and colleagues (2020), because the two-factor model improved on the one-factor model. However, the two factors, especially intolerance of uncertainty and uncertain situations, were inconsistent and poorly defined.

It is not unusual for correlated factors to “vanish” when modeled as group factors in a bifactor model. This has occurred, for example, for the IUS-12 (e.g., Hale et al., 2016; Shihata et al., 2018), a scale very similar to the IUI-A in its underlying construct and number of items. Nevertheless, given the excellent fit of the bifactor model and the accumulating evidence that most items are essentially unidimensional, the total score is still regarded as the best way to use the IUS-12 in research and clinical practice (e.g., Hale et al., 2016; Shihata et al., 2016). Since the same conditions apply to the IUI-A, we can conclude that the inconsistencies in previous research (Carleton et al., 2010b; Lauriola et al., 2018; Li et al., 2020) are mainly due to two factors. First, some items are imperfect. Second, outdated analytic methods, such as using estimators for continuous rather than categorical indicators, or CFA instead of ESEM, may have degraded model fit, leading researchers to search for empirical solutions with limited generalizability (Carleton et al., 2010b; Lauriola et al., 2018).

Consistent with previous studies (Carleton et al., 2010b; Gosselin et al., 2008; Lauriola et al., 2018), the six-factor model was supported overall for the IUI-B. However, previous research has also found that a bifactor model better represents the IUI-B (Lauriola et al., 2018) and the IU construct in general (Shihata et al., 2018; Thibodeau et al., 2015). Thus, our finding that a general factor was needed to best represent the IUI-B factor structure is in keeping with previous IU studies. Moreover, our results support the view that a generalized maladaptive attitude toward uncertainty, common to several clinical manifestations of IU, could represent a transdiagnostic vulnerability factor for different affective disorders (e.g., Thibodeau et al., 2015).

Psychometric scales such as the IUI are typically designed for both clinical assessment and research. For instance, clinical patients can be assessed against normative data collected from community samples, or clinical groups can be compared with healthy controls. While previous studies have tested the IUI factor structure in community samples, no study has formally tested its measurement invariance between clinical and nonclinical participants, a precondition for comparing the corresponding group means (Putnick & Bornstein, 2016; Vandenberg & Lance, 2000). Our study showed that the IUI factors were scalar invariant across community participants and clinical patients with various affective disorders. Therefore, IUI-A and IUI-B scores can be used to compare groups on the general unacceptability of uncertainty and on individuals’ reactions to uncertainty, respectively. In this regard, clinical patients and community participants differed most on the IUI-A and IUI-B general factors. Consistent with cognitive models of affective disorders (e.g., Lovibond, 2006), clinical patients viewed uncertainty as more unacceptable and threatening than community participants did and were less able to tolerate uncertainty at a general level.

Among the group factors, worry showed the most noticeable difference between groups. This finding is in line with the original IU research showing that the inability to tolerate uncertainty triggers chronic apprehension in individuals with GAD and other anxiety disorders (e.g., Boelen, 2010; Dugas et al., 2001; Einstein, 2014; Fergus, 2013). Avoidance was the other group factor whose latent means differed between groups. This finding is also consistent with previous research showing that patients with affective disorders, who also tend to be high in IU, are likely to engage in cognitive and experiential avoidance to cope with the uncertainty associated with feared situations (e.g., Sexton & Dugas, 2008).

Finally, we assessed the criterion-related validity of the IUI factors. GA, GB, and worry predicted symptom severity and IU, as measured by the well-established IUS-12 and the emerging DSIU. Collectively, these findings support the criterion-related and concurrent validity of the general factors. In contrast, the group factors other than worry were less valid predictors of symptom severity and IU. Unexpectedly, reassurance, one of the IUI-B group factors, was negatively associated with IU. Reassurance seeking is a maladaptive strategy for coping with uncertainty that is more specific to OCD than to anxiety disorders (e.g., Kobori & Salkovskis, 2013), which may explain why this group factor was unrelated to anxiety-depressive symptoms. However, IU is also high in people with OCD (Holaway et al., 2006; Tolin et al., 2003) and is usually positively correlated with reassurance-seeking behavior (Kobori & Salkovskis, 2013). It is worth noting that, in a bifactor model, non-g factors represent variance orthogonal to the general factor (e.g., Alamer, 2022; Morin et al., 2020). Because previous research did not separate g and non-g variance (Holaway et al., 2006; Tolin et al., 2003), our results are compatible with the view that the correlation between IU and reassurance seeking could be due to the general factor rather than to a specific coping strategy.

Previous research has shown that the IUI-A achieved incremental validity relative to the IUS-12 in predicting depression and anxiety symptoms in a community sample (Fergus, 2013). We did not replicate this result: GA did not predict symptom severity after controlling for the IU construct, and neither did GB. The only evidence of incremental validity was for GA when controlling for DSIU scores. Unlike previous research (Fergus, 2013), our study suggests that the general factors, and the corresponding total scores, might be of limited utility in a clinical context when used alongside the IUS-12 or the DSIU. The group factors achieved statistical significance in incremental validity analyses, but the regression coefficients were small and, in most cases, negative. We believe that these results may reflect “negative suppression” in parameter estimation. This statistical artifact occurs in regression analyses when a predictor is more strongly associated with other predictors than with the outcome (see Maassen & Bakker, 2001). Incremental validity analyses required all predictors in the structural model to be related to the same construct, which could create precisely this artifact, especially for the group factors, whose associations with symptom severity were already smaller than their associations with the IU construct.
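A two-predictor example illustrates how negative suppression can arise. The standardized regression weights of two predictors follow directly from their correlations: β1 = (ry1 − ry2·r12) / (1 − r12²) and β2 = (ry2 − ry1·r12) / (1 − r12²). In the Python sketch below, the correlations are hypothetical: predictor 2 correlates positively but weakly with the outcome (0.20) and strongly with predictor 1 (0.70), and it receives a negative weight. This is an observed-variable analogy only, not a reanalysis of the latent structural model.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized OLS weights for two predictors computed from correlations."""
    det = 1 - r_12 ** 2
    if det <= 0:
        raise ValueError("predictors are perfectly collinear")
    beta1 = (r_y1 - r_y2 * r_12) / det
    beta2 = (r_y2 - r_y1 * r_12) / det
    return beta1, beta2


# Hypothetical correlations: predictor 2 is weakly but positively related to the
# outcome and strongly related to predictor 1.
b1, b2 = standardized_betas(r_y1=0.60, r_y2=0.20, r_12=0.70)
print(round(b1, 3), round(b2, 3))
```

The sign flip occurs because β2 is negative whenever ry2 < ry1·r12, that is, whenever a predictor's correlation with the outcome is smaller than what its overlap with the stronger predictor alone would imply.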

Although the incremental validity findings might seem negative, it is worth noting that GA and GB significantly differentiated patients from community participants in the multigroup analyses and were well aligned with other IU measures. Therefore, our study supports the overall validity of the IUI, showing that its scales tap into the same variance in clinical outcomes as competing instruments, albeit without outperforming the IUS-12 or the DSIU in predictive performance. The IUI scales are similar in content and scope to these competing instruments: the IUI-A, like the IUS-12, was devised to measure IU according to the evolving definition of the construct, and the IUI-B, like the DSIU, covers several cognitive and behavioral reactions to uncertainty, each likely to emerge in specific disorders. From this perspective, the greater empirical overlap of the IUI-A with the IUS-12 and of the IUI-B with the DSIU might also reflect these similarities.

Before concluding, we note some significant limitations of the present study. First, a mixed clinical sample was used to obtain a sample size adequate for structural equation modeling, which precluded fine-grained comparisons based on specific diagnostic groups (e.g., latent mean differences, known-groups validity analysis). Second, the unique variance accounted for by the group factors was limited. Although latent mean differences can still be reliable in the ESEM context, using subscale scores in clinical practice as reliable indicators of specific constructs currently warrants caution. Relatedly, our findings were primarily based on comparisons between the mixed clinical sample and a community sample, which might have led us to overestimate the variance of the general factors. For instance, studies have shown that anxiety and depression (both higher in the mixed clinical sample) might spuriously inflate the reporting of physical and psychological symptoms (e.g., Merckelbach et al., 2019). Likewise, clinical patients might have overreported the unacceptability of uncertainty and their reactions to uncertainty relative to community participants. Future validity studies, especially those aimed at understanding specific IU factors, might benefit from comparing larger, homogeneous clinical subgroups (e.g., GAD vs. depression) with similar tendencies to overreport psychological symptoms.

Notwithstanding these limitations, the present study clarified the factor structure of the IUI and, as a byproduct, provided evidence that the Farsi-language version of the scale is a reliable and valid tool for assessing the general unacceptability of uncertainty and individuals’ reactions to uncertainty, performing as well in an Eastern culture as in Western ones.