Examining the Cross Cultural Validity and Measurement Invariance of the Emotion Beliefs Questionnaire (EBQ) in Iran and the USA

People’s beliefs about emotions contribute to their psychological wellbeing, and two important beliefs about emotions concern their controllability and usefulness. Recently, the Emotion Beliefs Questionnaire (EBQ) was developed to assess beliefs about the controllability and usefulness of positive and negative emotions. To date, most psychometric studies of the EBQ have been conducted with Western populations, and no studies have examined the EBQ’s psychometric properties among adolescents. We examined the psychometric properties of the EBQ among Iranian adolescents (n = 557), Iranian adults (n = 347), and American adults (n = 242). Participants also completed the Implicit Theories of Emotions Scale (ITES), the Perth Emotion Regulation Competency Inventory (PERCI), and the Depression Anxiety and Stress Scale-21 (DASS-21) so that the concurrent validity of the EBQ could be assessed. Confirmatory factor analyses supported the intended four-factor model, which distinguishes between the controllability and usefulness facets of beliefs about emotions across positive and negative emotions, within all three samples. Importantly, this four-factor model was found to be invariant across gender, age, and culture groups. Furthermore, the EBQ demonstrated good internal consistency, test-retest reliability, and concurrent validity. Our findings indicate that the EBQ has strong psychometric properties among both Asian and Western samples and can also be utilised with adolescents.

each other, are consistent with longstanding philosophical debates about the nature of emotions, and also influence both acute emotional responses and long-term health. They have also argued that the conceptualization of these two superordinate beliefs is inclusive, which means it covers a range of related constructs (e.g., attitudes, expectancies, opinions, theories). Some research has demonstrated these two categories of beliefs about emotions influence psychological well-being and emotion regulation engagement. For example, De Castella et al. (2018) have shown that individuals who consider emotions to be uncontrollable reported poor emotion regulation self-efficacy, less usage of adaptive regulation strategies, and poorer social adjustment. On the other hand, higher belief in the controllability of emotions has been shown to be associated with higher levels of emotion regulation self-efficacy, adaptive regulation strategies like cognitive reappraisal, and greater social adjustment. These findings make the assessment of beliefs about emotion an area of great interest for researchers and clinicians.
Based on Ford and Gross's theoretical framework, Becerra et al. (2020) proposed three criteria for an optimal measure of beliefs about emotions. Specifically, they argued that an optimal measure of beliefs about emotions should (1) assess the controllability and usefulness domains separately; (2) assess these domains at the superordinate level, that is, measure beliefs about emotions as a general concept rather than beliefs about one's own emotions; and (3) provide valence-specific assessment of beliefs about the controllability and usefulness of positive and negative emotions. After reviewing the existing measures of beliefs about emotions, Becerra et al. (2020) found that none of them met all of these criteria. To address this assessment gap and provide a measure consistent with Ford and Gross's framework, they developed the Emotion Beliefs Questionnaire (EBQ). The EBQ is a 16-item self-report measure that assesses the two main dimensions of beliefs about controllability and usefulness of emotions at a superordinate level for both negative and positive emotions. The EBQ provides a total score, with higher values indicating more maladaptive beliefs that emotions are, in general, uncontrollable and useless. In addition, four valence-specific subscale scores can be derived from the EBQ for each belief set and emotion valence: (1) Negative-Controllability, (2) Positive-Controllability, (3) Negative-Usefulness, (4) Positive-Usefulness. The EBQ has demonstrated good validity and reliability (Becerra et al., 2020). For example, the EBQ showed good convergent validity: more maladaptive beliefs about the controllability and usefulness of emotions were correlated with, and predicted, greater depression, anxiety, and stress, as well as poorer emotion regulation ability.
The total score as well as all EBQ subscales also have shown acceptable to good levels of internal consistency reliability (Cronbach's alpha = 0.70-0.88).
Previous studies have examined the psychometric properties of the EBQ among the general population of Western societies, specifically Australia (Becerra et al., 2020) and the USA (Becerra et al., under review). However, it is presently unclear whether the EBQ can measure beliefs about emotions in a valid and reliable manner among other populations, and whether the current four-factor model with the corresponding subscales would be the best fitting factor structure among different populations. One such population is adolescents: the EBQ has been examined only among adults, and no studies have examined its utility for adolescents despite adolescence being a crucial time for examining beliefs about emotions (Ford et al., 2018). Adolescence provides a unique transitional time from childhood to adulthood in which significant changes occur in multiple domains, including the formation of beliefs about emotion (Tamir et al., 2007), and these beliefs may play a pivotal role in the development of mental disorders during this vulnerable developmental period. In addition, no study has examined the psychometric properties of the EBQ among Persian populations, despite the fact that further psychometric studies of the EBQ with more diverse participant groups are required to maximize its utility. Finally, no studies have examined the measurement invariance of the EBQ with samples from different cultures. Measurement invariance of an instrument is required when comparing different groups using that measure, so that comparisons between the groups are accurate and meaningful. Without measurement invariance between the intended groups and cultures, comparative analyses may yield compromised and even misleading results (Putnick & Bornstein, 2016).
To address these gaps, the current study was designed to examine the psychometric properties of the Persian version of the EBQ among Iranian adults and adolescents, and to test its measurement invariance between the Iranian sample and an American adult sample. We tested its factor structure, measurement invariance, internal consistency, test-retest reliability, and concurrent validity.

Participants and Procedure
Ethics Approval for the project was obtained from the University of Western Australia Human Research Ethics Committee and Babol University of Medical Sciences in Iran. All participants provided informed consent for their data to be used. For the Iranian adolescent sample, parents provided the consent form for their adolescents to participate in the study. Three samples of participants were recruited.

Iran Adolescent Sample
The Iranian adolescent sample included 672 participants from three elementary schools from Gilan and Tehran cities in Iran. They completed the Persian version of the questionnaires via the Porsline online survey platform (https://survey.porsline.ir/). The same data cleaning and participant exclusion criteria were used as in the original EBQ psychometric study (Becerra et al., 2020). Specifically, participants with incorrect responses to attention check questions (which asked them to select a specific scale response) and those who completed the questionnaires too quickly for attentive responding (i.e., less than 2 s for each item; see Becerra et al., 2020; Preece et al., 2018) were excluded. The final sample consisted of 557 adolescents (52.96% female). Adolescents were between 12 and 17 years old (M age = 14.94, SD = 1.29).
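The screening rule described above can be sketched in Python. This is only a minimal illustration, not the authors' actual data-cleaning pipeline: the record fields (`passed_attention_checks`, `total_seconds`, `n_items`) are hypothetical names, and the 2-second rule is interpreted here as an average completion time per item, which is an assumption.

```python
from dataclasses import dataclass
from typing import List

# Threshold taken from the text: under 2 s per item is treated as too fast.
MIN_SECONDS_PER_ITEM = 2.0


@dataclass
class SurveyRecord:
    """One participant's survey metadata (all field names are hypothetical)."""
    participant_id: str
    passed_attention_checks: bool
    total_seconds: float
    n_items: int


def is_attentive(record: SurveyRecord) -> bool:
    """Return True if the record passes both screening criteria."""
    # Assumption: speed is judged on the average time spent per item.
    seconds_per_item = record.total_seconds / record.n_items
    return record.passed_attention_checks and seconds_per_item >= MIN_SECONDS_PER_ITEM


def screen(records: List[SurveyRecord]) -> List[SurveyRecord]:
    """Keep only the records that pass screening."""
    return [r for r in records if is_attentive(r)]


if __name__ == "__main__":
    # Illustrative records only; these are not study data.
    demo = [
        SurveyRecord("p01", True, 300.0, 50),   # kept
        SurveyRecord("p02", False, 300.0, 50),  # failed an attention check
        SurveyRecord("p03", True, 60.0, 50),    # too fast (1.2 s per item)
    ]
    print([r.participant_id for r in screen(demo)])
```

Applying both criteria in a single predicate keeps the exclusion rule identical across samples, which matches the statement that the same criteria were used for every sample.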

Iran Adult Sample
The Iranian adult sample consisted of 409 participants from the general population. They were recruited through advertisements on social media, including Telegram and WhatsApp, and completed the Persian version of the online survey via the Porsline platform (https://survey.porsline.ir/). Sixty-two participants were excluded in quality screening because they failed an attention check question or completed the questionnaires too quickly (the same exclusion criteria were used for the adult sample as for the adolescent sample). The final sample consisted of 347 participants (48.13% female; M age = 33.77, SD = 9.10, range = 18-60). Iranian adult participants were given access to five monetary prize draws for their participation.

United States Adult Sample
The American adult sample consisted of 268 participants who were recruited using Amazon Mechanical Turk (MTurk; Litman et al., 2017). They completed the English version of the questionnaires as part of an online survey. After excluding participants with inattentive or overly quick responses using the same exclusion criteria as for the other samples, the final sample consisted of 242 participants (40.08% female; M age = 40.69, SD = 11.91, range = 20-73). They were compensated US$3 for participation.

Emotion Beliefs Questionnaire (EBQ)
The EBQ is a 16-item self-report measure of beliefs about emotions (Becerra et al., 2020). It measures the controllability and usefulness dimensions of beliefs for positive and negative emotions. The items are rated on a 7-point Likert scale (1 = strongly disagree to 7 = strongly agree), with higher scores indicating stronger belief that emotions are uncontrollable and useless. In addition to the four valence-specific subscales mentioned earlier, two composite scores can be computed by summing the two controllability subscales into a General-Controllability composite and the two usefulness subscales into a General-Usefulness composite. A total score from all 16 items can also be computed that reflects an overall marker of maladaptive beliefs about emotions. The English version of the EBQ was translated into Persian by a psychologist who is a native Persian speaker (the first author) and back-translated into English by an independent translator. The back-translated version was checked by the developer of the original EBQ (the last author). A few minor corrections were applied. A copy of the final Persian version of the EBQ with scoring instructions is provided in the supplementary materials.
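The scoring hierarchy described above (subscales, composites, total) can be sketched as follows. This is a hedged illustration only: the item-to-subscale key below is a placeholder, since the real assignment is defined in the EBQ scoring instructions rather than in this text. The sketch also assumes four items per subscale and no reverse-scored items; both assumptions should be checked against the official scoring instructions.

```python
from typing import Dict, List

# HYPOTHETICAL item key, for illustration only. The actual assignment of the
# 16 items to subscales must be taken from the official scoring instructions.
SUBSCALE_ITEMS: Dict[str, List[int]] = {
    "Negative-Controllability": [1, 2, 3, 4],
    "Positive-Controllability": [5, 6, 7, 8],
    "Negative-Usefulness": [9, 10, 11, 12],
    "Positive-Usefulness": [13, 14, 15, 16],
}


def score_ebq(responses: Dict[int, int]) -> Dict[str, int]:
    """Sum 1-7 ratings into subscale, composite, and total scores.

    Assumes no reverse-scored items (an assumption, not confirmed here).
    """
    if sorted(responses) != list(range(1, 17)):
        raise ValueError("expected ratings for items 1 to 16")
    if any(not 1 <= rating <= 7 for rating in responses.values()):
        raise ValueError("ratings must lie on the 1-7 scale")

    # Four valence-specific subscale scores.
    scores = {
        name: sum(responses[item] for item in items)
        for name, items in SUBSCALE_ITEMS.items()
    }
    # Two composites, each the sum of the matching pair of subscales.
    scores["General-Controllability"] = (
        scores["Negative-Controllability"] + scores["Positive-Controllability"]
    )
    scores["General-Usefulness"] = (
        scores["Negative-Usefulness"] + scores["Positive-Usefulness"]
    )
    # Total score across all 16 items.
    scores["Total"] = scores["General-Controllability"] + scores["General-Usefulness"]
    return scores
```

Under these assumptions, higher values on every score reflect stronger beliefs that emotions are uncontrollable or useless, in line with the scale description.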

Implicit Theories of Emotions Scale (ITES)
The ITES (Tamir et al., 2007) is a 4-item self-report measure that assesses beliefs about controllability of emotions, and currently is the most widely employed tool in this area (e.g., Becerra et al., 2020; Tamir & Ford, 2012). The ITES measures beliefs about emotions in general and regardless of their valence (e.g., "Everyone can learn to control their emotions"). All items are answered on a 6-point Likert scale (from 1 = strongly disagree to 6 = strongly agree) and greater scores indicate stronger beliefs that emotions are controllable. Previous studies found psychometric support for the validity and reliability of the ITES (Burnette, 2010; Reffi et al., 2020).

Perth Emotion Regulation Competency Inventory (PERCI)
Participants' emotion regulation ability was assessed using the PERCI, which measures one's emotion regulation ability across both positive and negative emotions (Preece et al., 2018). The PERCI comprises 32 items that are rated on a 7-point Likert scale (1 = strongly disagree to 7 = strongly agree), with higher scores reflecting poorer emotion regulation ability or more emotion regulation difficulties. Other than a total score, the PERCI provides eight subscale scores and five composite scores from different dimensions of emotion regulation ability. In the current study, the total score, the positive emotion regulation score, and the negative emotion regulation score were used. The PERCI has shown good validity and internal consistency in previous studies (Preece et al., 2018, 2021). The Persian version of the PERCI has indicated the same factor structure and good psychometric properties (Mazidi et al., 2023a).

Depression Anxiety and Stress Scale-21 (DASS-21)
The DASS-21 was employed to measure participants' depression, anxiety, and stress symptoms over the past week (Lovibond & Lovibond, 1995). The DASS-21 comprises 21 items that are rated on a 4-point Likert scale (0 = did not apply to me at all to 3 = apply to me very much) and provides three subscales, one for each symptom category, as well as a total scale score as an overall marker of psychological distress. Both the Persian and English versions of the DASS-21 have shown good psychometric properties, including excellent internal consistency and good construct and convergent validity (Antony et al., 1998; Kami et al., 2019; Habibi et al., 2017).

Factorial Validity
Confirmatory factor analyses (CFA; maximum likelihood estimation with robust standard errors and the Satorra-Bentler [SB] scaled χ2 statistic) were conducted using the lavaan package (Rosseel, 2012) for R version 4.0.2. We examined the four-factor model identified by Becerra et al. (2020) that distinguishes between the controllability and usefulness components, and also makes a distinction between positive and negative emotions for both components (Model 6, see Fig. 1). In addition, to test whether considering different valence domains and belief categories, as represented by the four-factor model, improves the fit of the data with the latent structure of beliefs about emotions, we tested some simpler models as comparative baselines (Becerra et al., 2020; Mazidi et al., 2023b). These models were as follows: Model 1 was a one-factor model comprising a general factor. Model 2 was a two-factor model that distinguished items based on negative and positive valence but did not distinguish between the controllability and usefulness components. Model 3 was a non-valence two-factor model that distinguished between controllability and usefulness items but did not distinguish between positive and negative valences. Models 4 and 5 were both three-factor models and both made a distinction between the controllability and usefulness components, but for Model 4 the valence distinction was made only for the controllability component, while for Model 5 the valence distinction was made only for the usefulness component.
Model goodness-of-fit was judged based on four fit indices: the comparative fit index (CFI), Tucker-Lewis index (TLI), root mean square error of approximation (RMSEA), and standardised root mean square residual (SRMR). CFI and TLI values ≥ 0.90 were judged to indicate acceptable fit, as were RMSEA and SRMR values ≤ 0.08 (Bentler & Bonett, 1980; Browne & Cudeck, 1992; Marsh et al., 2004). The models were also directly compared using the Akaike Information Criterion (AIC), which penalises model complexity; lower values indicate a better fitting model (Byrne, 2016). Factor loadings ≥ 0.40 were considered meaningful loadings (Stevens, 1992).

Descriptive Statistics and Reliability Coefficients
Descriptive statistics and reliability coefficients for all EBQ subscales and composite scores are displayed in Table 1. All EBQ subscales and composite scores showed acceptable to excellent alpha reliabilities, with the exception of the Positive-Controllability subscale, whose alpha fell below 0.70 among Iranian adults (although its omega coefficient was 0.70). With respect to test-retest reliability, the ICC values were moderate to good for the Negative-Controllability, Positive-Controllability, Negative-Usefulness, Positive-Usefulness, General-Controllability, General-Usefulness, and total scale scores (0.58, 0.64, 0.73, 0.51, 0.65, 0.62, and 0.65, respectively).

Factor Structure
Fit indices for all CFA models for the three samples are displayed in Table 2. The intended four-factor model was found to be the best fitting model in the three samples and indicated a good fit to the data according to all fit indices. All items loaded well on their intended subscale factor (i.e., > 0.40; see Table 3), and all factors were significantly positively correlated, except that the Positive-Usefulness factor was not correlated with the Negative-Controllability factor in the Iranian adult and adolescent samples (see Table 4). The four-factor model was substantially better fitting than the other models, thus confirming the statistical value of distinguishing between the different valence categories and subscale components.
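The model evaluation logic used here (the fit-index cut-offs and the AIC comparison stated in the Factorial Validity section) can be sketched as below. This is an illustrative helper rather than part of the lavaan analysis itself; the function names are hypothetical, and the numbers in the demo are invented for illustration and are not the values reported in Table 2.

```python
from typing import Dict


def has_acceptable_fit(cfi: float, tli: float, rmsea: float, srmr: float) -> bool:
    """Apply the cut-offs stated in the text to one fitted model."""
    # CFI and TLI >= 0.90; RMSEA and SRMR <= 0.08.
    return cfi >= 0.90 and tli >= 0.90 and rmsea <= 0.08 and srmr <= 0.08


def best_model_by_aic(aic_by_model: Dict[str, float]) -> str:
    """Return the name of the model with the lowest AIC (lower is better)."""
    return min(aic_by_model, key=aic_by_model.get)


if __name__ == "__main__":
    # Invented values for illustration only; see Table 2 for the real results.
    print(has_acceptable_fit(cfi=0.95, tli=0.94, rmsea=0.05, srmr=0.04))
    print(best_model_by_aic({"one-factor": 1200.0, "four-factor": 1000.0}))
```

Checking all four indices jointly mirrors the statement that the four-factor model fit well "according to all fit indices", while the AIC comparison is what ranks it against the simpler baseline models.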

Measurement Invariance
Next, the measurement invariance of the four-factor model was tested across gender, age and culture. To test the measurement invariance of the four-factor model across gender, this model was tested separately for females (n = 475) and males (n = 429) in the two Iranian samples.¹ Because the model indicated acceptable fit indices for both gender groups, we continued with the configural invariance test. Equality constraints were then imposed on all factor loadings, and the ΔCFI (= -0.001) indicated full metric invariance (see Table 5). Next, scalar invariance was tested by imposing equality constraints on all item intercepts, and full scalar invariance was found. The same procedure was

¹ We replicated the measurement invariance analyses for gender separately for each of the three samples, and the pattern of results was the same, indicating configural, metric, and scalar invariance for each sample. See the supplementary material for the results of this analysis.

Measurement Invariance
To examine measurement invariance across gender, age, and culture, the best fitting factor model was first tested separately for each group across the three samples (Joshanloo & Bakhshi, 2016). Then the basic configural invariance model (equal form) was tested, followed by progressively more restrictive measurement invariance tests: the metric invariance test (equal factor loadings) and the scalar invariance test (equal intercepts). Models were compared in terms of the CFI. Full invariance was indicated when the absolute difference in CFI (ΔCFI) between successive models was less than 0.01 (Cheung & Rensvold, 2002).
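The ΔCFI decision rule described above can be sketched as a small helper. This is only an illustration of the stated criterion, not the authors' analysis code: the function names are hypothetical, the CFI values are assumed to come from already fitted configural, metric, and scalar models, and configural invariance itself is judged from model fit rather than from ΔCFI.

```python
from typing import Dict

# Criterion stated in the text (Cheung & Rensvold, 2002).
DELTA_CFI_CUTOFF = 0.01


def step_holds(cfi_less_constrained: float, cfi_more_constrained: float) -> bool:
    """True when adding constraints changes CFI by less than the cut-off."""
    return abs(cfi_more_constrained - cfi_less_constrained) < DELTA_CFI_CUTOFF


def invariance_summary(
    cfi_configural: float, cfi_metric: float, cfi_scalar: float
) -> Dict[str, bool]:
    """Evaluate the metric and scalar steps in sequence."""
    metric = step_holds(cfi_configural, cfi_metric)
    # The scalar step is only interpreted if metric invariance holds first.
    scalar = metric and step_holds(cfi_metric, cfi_scalar)
    return {"metric": metric, "scalar": scalar}
```

When the scalar step fails, a common follow-up is to free the intercepts of a few items and re-test for partial scalar invariance, comparing the partially constrained model against the metric model with the same rule.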

Internal Consistency and Temporal Stability
Cronbach's alpha (α) reliability coefficients were calculated for all EBQ subscale and composite scores. Values ≥ 0.70 were judged acceptable, ≥ 0.80 good, and ≥ 0.90 excellent (Groth-Marnat, 2009). Temporal stability was also measured using data from 79 adolescents who completed the EBQ again after 2.5 weeks. Test-retest reliability was computed using the intraclass correlation coefficient (ICC), as this method is more precise than the Pearson correlation (Portney & Watkins, 2009; Shrout & Fleiss, 1979). For the ICC, values between 0.50 and 0.75 indicate moderate reliability, values between 0.75 and 0.90 indicate good reliability, and values greater than 0.90 indicate excellent reliability (Koo & Li, 2016; Portney & Watkins, 2009).
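As a minimal sketch of these reliability criteria, the code below computes Cronbach's alpha from raw item scores using the standard formula and maps alpha and ICC values onto the interpretive bands given above. The function names are hypothetical; the handling of values that fall exactly on a band boundary (e.g., an ICC of 0.75) and the labels for values below the lowest band are assumptions, and the ICC itself is not computed here because the specific ICC form used is not stated in the text.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; item_scores[i][p] is participant p's rating on item i."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per participant (sum across items).
    totals = [sum(ratings) for ratings in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def alpha_label(alpha: float) -> str:
    """Bands from the text: >= .70 acceptable, >= .80 good, >= .90 excellent."""
    if alpha >= 0.90:
        return "excellent"
    if alpha >= 0.80:
        return "good"
    if alpha >= 0.70:
        return "acceptable"
    return "below acceptable"  # label assumed; not named in the text


def icc_label(icc: float) -> str:
    """Bands from the text; boundary values are assigned upward (assumption)."""
    if icc > 0.90:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.50:
        return "moderate"
    return "below moderate"  # label assumed; not named in the text
```

For example, `alpha_label(0.68)` returns "below acceptable" and `icc_label(0.73)` returns "moderate" under these bands.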

Relationships with Other Constructs/Measures
Pearson correlations were calculated between EBQ scores and ITES, DASS-21, and PERCI scores. We predicted that EBQ would correlate with ITES, which is another measure of maladaptive beliefs about emotions; and we expected significantly larger correlations between EBQ controllability beliefs and ITES than between EBQ usefulness and ITES, because ITES only measures beliefs about controllability of emotions. Regarding the emotion regulation ability, because conceptual frameworks of beliefs about emotions implicate maladaptive beliefs about controllability and usefulness of emotions as critical factors that negatively affect emotion regulation (Ford & Gross, 2019), we expected a positive association between greater EBQ scores and poor emotion regulation ability indicated by higher PERCI scores. Similarly, significant correlations were predicted between higher EBQ scores and heightened levels of depression, anxiety, and stress.

followed for the measurement invariance of age, which was carried out using Iranian adolescents (n = 557) and Iranian adults (n = 347) to control for potential effects of culture. The four-factor model showed configural, metric, and scalar invariance across age categories too as the CFI values did not differ substantially (i.e., < 0.01). Next, measurement invariance for culture was tested across Iranian adults (n = 347) and American participants (n = 242). The four-factor model showed configural and full metric invariance for culture. However, the model showed noninvariance at the scalar level as the ΔCFI exceeded the 0.01 criterion. Inspection of the modification indices indicated that freeing the constraints for items 1 and 3 would improve the model fit. After doing so, the ΔCFI (= 0.008) indicated partial scalar

Discussion
The aim of the present study was to examine the psychometric properties of the Emotion Beliefs Questionnaire (EBQ) among Iranian and American adults, as well as a sample of Iranian adolescents. The measurement invariance of the EBQ in terms of gender, age, and culture was also tested. In what follows, the main findings are reviewed and discussed in turn. With respect to the factorial structure of the EBQ, the predicted four-factor model emerged as the best fitting model in all three samples. These findings extend the current literature by demonstrating for the first time the utility of the EBQ among an adolescent sample and among Iranian adults, and provide further evidence in support of previous findings of good psychometric properties of the EBQ among Australian adults (Becerra et al., 2020). The superiority of the four-factor model, which discriminates between beliefs about controllability and usefulness for negative and positive emotions, strongly supports the benefit and importance of distinguishing between negative and positive emotions and shows that beliefs about emotions are multidimensional.
With respect to concurrent validity, the expected correlations were found between the EBQ and other measures. Specifically, higher levels of maladaptive beliefs about emotions were associated with greater depression, anxiety, and stress, and more difficulty in emotion regulation. Moreover,

invariance for culture (see Table 5). The intercepts for these items were higher for Iranian participants (b = 3.08 and 3.69 for items 1 and 3, respectively) than for American participants (b = 2.23 and 2.78 for items 1 and 3, respectively).

Concurrent Validity
Correlations between the EBQ and the ITES, DASS-21, and PERCI for all three samples were consistent with our expectations. A table containing all Pearson correlations is provided in the supplementary materials. Greater overall maladaptive beliefs about emotions, as assessed by the EBQ total score, were significantly associated with: higher levels of depression, anxiety, and stress (rs = 0.22 to 0.54, all p < .01), poor emotion regulation ability (rs = 0.44 to 0.69, all p < .01), and

Table 6
Pearson Correlations between administered measures

reliability scores for all subscales, as well as composite scores and the total score. The measurement invariance of the confirmed four-factor model of the EBQ was also tested across gender, age, and cultures in the current study. Full metric and scalar invariance was obtained for gender and age, which indicates that the latent structure of beliefs about emotions, as measured by the EBQ, is similarly construed by male and female participants, as well as by adults and adolescents (Putnick & Bornstein, 2016). Perhaps a more important finding was the strikingly similar ways in which the EBQ items are interpreted by Iranian and American participants. Although only partial scalar invariance was found for culture, the finding that 14 out of 16 items of the EBQ are invariant across these two cultures is quite promising. This is important because it shows that this instrument can be confidently employed to measure and compare beliefs about emotions between individuals who differ in these demographic backgrounds, and it paves the way for more robust cross-cultural studies in this field (Cheung & Rensvold, 2002).

Conclusions, Implications, and Limitations
In sum, the results of the current study supported the EBQ as a tool with strong psychometric properties that can be employed to measure two main dimensions of beliefs about emotions across different emotional valences among adults as well as adolescents. The EBQ demonstrated good reliability and convergent validity for all three samples and possessed full metric and full scalar invariance across gender and age, and full metric and partial scalar invariance across cultures, which indicates that the construct and items are understood and responded to largely similarly by members of these groups. A few limitations must be considered when interpreting the findings of the current study. The samples recruited in this study were not from clinical populations and the present findings may not be generalizable to clinical samples. Moreover, the EBQ test-retest reliability was not assessed for the Iranian and American adult samples, and the study did not recruit American adolescents for more direct comparisons with Iranian adolescents. Finally, we solely relied on online data collection methods in the current study. While this approach offers advantages in terms of participant diversity and efficiency, we acknowledge the need for future research to employ alternative data collection methods to provide complementary information on the validity and generalizability of the current findings (see Newman et al., 2021). These limitations notwithstanding, the current study was the first examination of the EBQ among an adolescent sample, and among Iranian participants. Thus, our findings supported the conceptualization of the EBQ as

as predicted, the controllability subscale of the EBQ, compared to its usefulness subscale, showed significantly larger correlations with ITES scores, which measure beliefs about controllability of emotions. It has been demonstrated that more maladaptive beliefs about emotions are associated with, and predict, higher psychopathology.
For example, it has been shown that college students who believed emotions cannot be changed and are uncontrollable experienced greater depression, more loneliness, and poorer social adjustment over a year (Tamir et al., 2007). It has also been shown that accepting mental experiences, including emotions, predicts better psychological health and well-being (Ford et al., 2018). A surprising but interesting finding of the current study was the lack of significant association between adolescents' beliefs about the usefulness of negative emotions and their depression, anxiety, and stress. This finding was replicated in a separate sample of Iranian adolescents too (n = 436; Mazidi et al., in preparation). Currently, it is difficult to speculate about the reasons for this finding due to the scarcity of studies on beliefs about emotions that have distinguished between emotion valences and specifically examined beliefs about the usefulness of negative emotions among adolescents. It remains for future studies to examine whether this interesting pattern of associations is found among adolescents in Western countries too, and what developmental or cultural factors may contribute to it. Regardless of the reasons for this finding, the distinct patterns of association found for usefulness beliefs for negative and positive emotions with depression, anxiety, and stress provide further support for the importance of distinguishing between emotion valences when measuring beliefs about emotions. It should also be noted that a significant but relatively weak association was found between adolescents' stronger belief that negative emotions are useless and higher levels of emotion regulation difficulty, which shows that considering negative emotions as useless may still have a negative impact on emotion regulation ability.
In terms of reliability, acceptable to excellent internal consistency was found for almost all subscales and composite scores of the EBQ among the three studied samples. The only exception was the internal consistency of the General-Usefulness composite score for the adolescent sample, which was 0.68 (but acceptable for the Iranian adult sample and excellent for the USA adult sample). However, it should be noted that the reliabilities of both usefulness subscales (Negative-Usefulness and Positive-Usefulness) were acceptable. We recommend prioritising the usefulness subscales over their composite score for adolescent samples, given the distinct patterns of association these two subscales showed and the lower reliability of the composite score. The test-retest reliability of the EBQ among adolescents showed acceptable to good