A comprehensive analysis of the psychometric properties of the contingencies of self-worth scale (CSWS)

The Contingencies of Self-Worth Scale (CSWS) is a widely used personality self-report questionnaire developed for measuring the domains in which self-esteem is sustained by successes and achievements as well as threatened by obstacles and failures. Two studies (Nstudy1 = 453, Nstudy2 = 293) aimed to further refine our knowledge of its psychometric properties. Results attested that, at the first-order level, the originally hypothesized seven-factor model proved to be the best-fitting one, but the inclusion of a method factor significantly improved the fit to the data. At the second-order level, the model with two higher-order variables representing private sphere and public sphere of CSW fit better than alternative models. Finally, there was evidence that first- and second-order domains had a good degree of construct and discriminant validity. Overall, these studies provided a step forward in refining the psychometric structure of the CSWS.

People differ in the domains they regard as relevant for their self-worth. Some students, for example, might base their self-esteem on scholastic activities, leading to a strong relationship of school achievement with self-esteem. Other students, instead, might invest more in social relationships or physical appearance. For them, the relationship between school achievement and self-esteem is expected to be very weak or nonsignificant. Indeed, according to James (1890), feelings of personal worth are entirely dependent on one's objectives in the world and thus on one's achievements in valued domains. Drawing upon James's intuition, Crocker and Wolfe (2001) introduced the construct of contingencies of self-worth and defined them as "the domains in which self-esteem is bolstered by successes and achievements, and threatened by setbacks and failures" (Crocker et al. 2003, p. 894). Accordingly, when people are faced with negative events in a specific valued domain, defensive responses or clear reductions in self-esteem levels are expected. To assess the hypothesized seven sources of self-esteem and to test their model, Crocker et al. (2003) developed the 35-item Contingencies of Self-Worth Scale (CSWS). This scale assesses people's perceived sense that their own judgments of self-worth are influenced by their behavior or by the outcomes reached in key domains of experience. Since its introduction, this scale has been widely adopted (with 1158 references in Google Scholar as of March 6, 2020), has been adapted in seven cultures (Self and Social Motivation Lab 2015), and has consistently shown good psychometric properties in several earlier empirical studies (see Luhtanen and Crocker 2005; Park et al. 2007).
However, a review of the literature revealed that the hypothesized 7-factor model supported by Crocker et al. (2003) failed to show a consistently good fit to the data in several successive studies (i.e., Bentea 2016; Kempenaers; Maricuțoiu et al. 2012). Thus, we hypothesized that one likely source of misfit is the inclusion of both positively and negatively worded items. While the presence of positively and negatively worded items has often been recommended to prevent response styles (Baumgartner and Steenkamp 2001; Paulhus 1991; Podsakoff et al. 2012), it has been documented to lead to several issues, such as generating artificial deviations from the hypothesized scale structure (Alessandri et al. 2011). We deemed this hypothesis compelling given that similar issues have been attested for other instruments used to assess individuals' self-esteem (see Alessandri et al. 2015).
Enrico Perinelli and Guido Alessandri should be considered co-first authors. They contributed equally to this work and the order of their names was arbitrary. Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s12144-020-01007-5) contains supplementary material, which is available to authorized users.
We conducted two studies with the aim of providing a full evaluation of the psychometric properties of the CSWS in terms of structural validity and external correlates. Furthermore, we introduced and investigated the validity of a new bifactor structure of the scale that controls for method effects due to item wording, thus allowing for a purer measurement of each of the seven domains. Moreover, we conducted an in-depth investigation of the second-order structure of the CSWS, examining its congruency with the theory suggesting that the seven first-order domains reflect the two basic typologies of intra-individual vs. inter-individual evaluations. We conducted our analyses on two samples, for a combined total of 746 individuals (453 in Study 1 and 293 in Study 2), using cross-sectional, lagged, and intensive data. In this way, we hope to contribute to the growing literature regarding the CSWS by offering a detailed overview of its psychometric properties.

Contingencies of Self-Worth
According to Crocker and Wolfe (2001), the contingencies of self-worth (CSW) are the domains on which people's self-esteem depends, so that progress or successes in these domains boost self-esteem, and failures or setbacks lead to reductions in self-esteem. The CSW were theoretically derived on the assumption that one's self-regard is based, at least in part, on a set of valued intrapersonal and interpersonal domains (see Crocker and Wolfe 2001). The CSW model of self-esteem focuses on the domains on which people stake their self-worth rather than on differences between people in whether they have contingent self-esteem or not. The authors of the model (Crocker and Wolfe 2001) selected seven domains, starting from the best known and most widely adopted domains of competencies and social approval and further expanding them to include other common sources of self-esteem. Overall, these seven theoretically derived CSW domains are clustered into two broad domains. The first is the "intrapersonal domain," which includes the subdomains of virtue and God's love. Most individuals base their sense of worthiness on adhering to a moral code, which leads to the feeling of being a worthwhile person (Solomon et al. 1991). Therefore, virtue has been posited as one of the seven domains of the CSW. The domain of God's love reflects the fact that one's belief of being loved and unique in God's eyes might have a positive effect on self-esteem, especially for religious individuals (Benson and Spilka 1973; Blaine and Crocker 1995; Spilka et al. 1985).
The second is the "interpersonal domain," which includes the five subdomains of academic competence, competition, approval from others, appearance, and family support. People's self-evaluations of the degree of CSW in these areas are conceived to be correlated with, but distinct from, general self-esteem (i.e., judgments of overall self-worth; see Crocker and Wolfe 2001). The domain of competence, specifically academic competence, was selected because many previous studies (e.g., Coopersmith 1967; Demo and Parker 1987; Harter 1986; Hoge et al. 1990; Rosenberg et al. 1995; Richman et al. 1987) found that academic outcomes are related to global self-esteem. The domain of competition is linked with self-esteem because people might base their self-esteem not only on being competent but also on being better than others (Cross and Madson 1997; Josephs et al. 1992). The approval from others domain concerns what people believe others think of them, and it is taken into account because this belief plays an important role in global self-esteem (e.g., Cooley 1902; Coopersmith 1967; Harter 1986; Mead 1934; Shrauger and Schoeneman 1979; Wylie 1979). Since the affection of close others might be particularly important to self-esteem, and perceived approval or love from family members is related to self-worth, family support is one of the seven domains of the contingencies of self-worth (Harter 1986). Finally, the domain of appearance has been included because it is one of the strongest predictors of global self-esteem among adolescents (Harter 1986).

Present Contribution
Given the above considerations, we now introduce our studies, which aim at investigating the psychometric characteristics of the CSW scale. In Study 1, we conduct a thorough evaluation of the internal structure of the CSW scale. In particular, we investigate the fit of several competing first- and second-order models and compare them in order to find the best solutions. Then, in Study 2, we try to replicate the Study 1 findings in terms of internal structure, and we also investigate the relationships of the CSW dimensions (first- and second-order) with external variables of interest, in order to provide a better understanding of the nomological network of CSW. The Mplus scripts for running all analyses are reported in the Supplementary Material.

Study 1
We designed this first study to conduct an in-depth evaluation of the structure of the CSWS at both the first- and second-order levels using a large sample of university students. To this aim, we will investigate the comparative fit of alternative first-order structures that have been proposed in the literature. Then we will move to the second-order level, where the dimensionality of the scale has seldom been analyzed. This is surprising given the growing interest of researchers in investigating the higher-order structure of different constructs and scales (e.g., Isiordia et al. 2017). Moreover, it is worth noting that a higher-order structure was initially hypothesized by Crocker et al. (2003) several years ago, but its robustness has never been tested empirically. Thus, we aimed at comparing different second-order models in order to find the best fitting one. In the following sections, we describe the previous literature and the rationale of our goals in more detail.

Aim 1: Analysis of the First-Order Structure
To evaluate the structural validity of the CSW scale at the first-order level, we compared its theoretical model (positing seven correlated factors representing the seven theoretical CSW domains) with a set of alternative models proposed in the literature or representing extensions of existing models. Each of these models offers a very different interpretation of the CSWS structure, and all are presented in Fig. 1. Model 1 is the theoretical model described above, in which the seven domains of CSW are correlated. The following three models were introduced by Crocker et al. (2003), and they have since been investigated in previous studies on the dimensionality of the CSWS. Model 2 is a one-factor model with all items loading on a single general dimension of contingent self-worth. Model 3 posits two latent dimensions representing internal and external contingencies. Accordingly, items assessing God's love and virtue were loaded on the first factor (internal contingencies), whereas items assessing the remaining five dimensions were loaded on the second factor (external contingencies). Model 4 is a three-factor model including the dimensions of "Self-esteem goes up", "Self-esteem goes down", and "Self-esteem depends". We expected Model 1 to show a better fit than Models 2-4. However, given the presence of negatively worded items, we further investigated the presence of method variance by explicitly introducing a negative method factor in the model. Accordingly, Model 5 was a bifactor model suited to investigate the impact of method effects on scale dimensionality. This model used a bifactor approach (Reise 2012) for modeling method effects along with substantive factors (see Alessandri et al. 2010, 2011, 2015) and thus allowed us to disentangle true CSW variance from method variance generated by item formulation and keying.
This model refines our understanding of item behavior and represents an improved version of Model 1 (rather than an alternative model).
In more detail, Model 5 used a version of the correlated trait-correlated method minus one framework (Eid et al. 2003), where "method" refers to the direction of item wording (keying). The M − 1 approach has been widely recommended in the presence of method factors, given that it has been shown to reduce model complexity, to ease parameter interpretation, and to enhance model estimation (see Geiser and Lockhart 2012; Hintz et al. 2019). The positive wording method was chosen as the comparison standard, given that the CSWS has fewer negatively worded items (i.e., 7) than positively worded items; in this way, fewer parameters need to be estimated. As a consequence, we modeled an orthogonal (i.e., not correlated with specific or substantive factors) latent factor that accounted for the residual variance shared by negatively worded items. This model is essentially a bifactor model with correlated factors (given the supposed second-order structure of the scale and the presence of a method factor; see Rindskopf and Rose 1988), which accounted for the covariation among CSWS items in terms of (a) the seven broad correlated general factors attested by the literature and (b) one method factor associated with negatively worded items (see Fig. 1, Model 5).

Aim 2: Analysis of the Higher-Order Factor Structure
The second aim, which is the more novel aim of this study, was to refine our understanding of the higher-order structure of the CSWS. As stated above, the seven CSW dimensions were conceived as reflecting two higher-order domains of CSW, namely the intrapersonal and the interpersonal. Crocker et al. (2003) explicitly noted that "another possibility for describing the relations among contingencies is that the seven distinct factors are organized within two higher order factors, one for external contingencies and one for internal contingencies" (p. 896). However, no previous studies have explicitly explored this possibility. Assuming the existence of the two higher-order factors of intraindividual and interindividual contingencies, one would expect to find a pattern of intercorrelations among the seven CSW first-order dimensions compatible with that hypothesis (Brown 2015). If this pattern holds, we would expect that a model positing two higher-order factors representing the intrapersonal and interpersonal dimensions (loaded by God's love and virtue and by the remaining five dimensions, respectively; see Fig. 1A in Supplementary Material, Model 6) would fit the data as well as a model with seven correlated factors while being more parsimonious in terms of free parameters.
Fig. 1 The five alternative first-order models for the Contingencies of Self-Worth Scale (CSWS). In each model, residual variances for the observed variables were omitted for the sake of clarity. IT = Item; MFN = Method Factor associated with Negatively worded items
An alternative interpretation of the second-order structure of the CSWS has been empirically derived by Stefanone et al. (2011). These authors conducted an exploratory factor analysis on the CSWS and derived two factors. The first factor was loaded by the dimensions of family support, virtue, and God's love. Given that the content of this factor appeared to reflect elements related to more traditional, personal domains, it was labeled private sphere of CSW. The second factor was loaded by the dimensions of approval from others, appearance, and competition and was labeled public sphere of CSW. Academic competence cross-loaded on both factors (see Fig. 1A in Supplementary Material, Model 7).
We compared these two alternative second-order models against each other and against another alternative model positing a single higher-order factor representing a general CSW latent factor (see Fig. 1A in Supplementary Material, Model 8). This factor was specified as loaded by all the first-order dimensions, representing the alternative hypothesis that CSW blends into a single second-order dimension.

Participants and Procedures
The present study was based on a convenience sample of 453 Italian sophomore students (72.6% females), all Caucasian. Participation was voluntary and no exclusion criteria were applied (nor was any outlying participant detected or excluded). Participants were recruited by a team of researchers and agreed to complete a set of questionnaires administered through an online platform directly at their homes. For their participation in the study, participants were offered feedback about their psychological profile. Age ranged from 17 to 58 years with a mean of 21.52 years (SD = 4.25).

Measures
The Contingencies of Self-Worth Scale (CSWS; Crocker et al. 2003) comprises 35 items, five for each of the seven types of contingencies described by the CSW model. Seven items are negatively worded and were reversed before computing the total score. Participants are requested to evaluate each item using a 7-point Likert scale (from 1 = Strongly Disagree to 7 = Strongly Agree). Reliability for each of the seven subscales was acceptable (all Cronbach's alphas and omegas were above .75; see Table 1A in Supplementary Material).
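The scoring step described above can be illustrated with a short Python sketch that reverse-keys a response on the 7-point scale and then scores one five-item subscale. The item numbers, the choice of which item is reverse-keyed, and the use of the mean (rather than the sum) as the subscale score are assumptions made for the example; they are not taken from the CSWS scoring key.

```python
def reverse_score(response, scale_min=1, scale_max=7):
    """Reverse-key a Likert response: 1 <-> 7, 2 <-> 6, and so on."""
    if not scale_min <= response <= scale_max:
        raise ValueError("response outside the Likert range")
    return scale_min + scale_max - response


def subscale_score(responses, items, reversed_items):
    """Mean of the items of one subscale after reverse-keying.

    responses      -- dict mapping item number to raw response
    items          -- item numbers belonging to the subscale (hypothetical)
    reversed_items -- subset of items that are negatively worded (hypothetical)
    """
    values = [
        reverse_score(responses[i]) if i in reversed_items else responses[i]
        for i in items
    ]
    return sum(values) / len(values)


# Hypothetical respondent: item 2 is treated as negatively worded.
raw = {1: 7, 2: 1, 3: 6, 4: 6, 5: 7}
score = subscale_score(raw, items=[1, 2, 3, 4, 5], reversed_items={2})
```

Reverse-keying with `scale_min + scale_max - response` keeps the sketch valid for other response formats, such as the 4-point scale used for the self-esteem measures in Study 2.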

Statistical Analyses
Given the ordinal nature of the data, models were tested with the statistical software Mplus 8 (Muthén and Muthén 1998-2017) using the weighted least squares mean- and variance-adjusted (WLSMV) estimation method. This estimator provides weighted least squares parameter estimates using a diagonal weight matrix, robust standard errors, and a mean- and variance-adjusted χ² test statistic (see Finney and DiStefano 2013), and it is especially suited for models with categorical variables (DiStefano et al. 2018). In addition to the χ² statistic, model fit was evaluated by looking at values of the comparative fit index (CFI), Tucker-Lewis index (TLI), and the root-mean-square error of approximation (RMSEA). We accepted values of CFI and TLI higher than .95, and RMSEA values lower than 0.06 (Hu and Bentler 1999) or with an upper limit of the 90% confidence interval lower than 0.10 (see Kline 2016). Hence, as a general rule, we deemed a model as fitting the data when showing values of approximate fit indices (CFI, TLI, RMSEA) within the aforementioned thresholds.
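The decision rule for approximate fit stated above can be written as a small check. The following Python sketch encodes only the cutoffs given in the text; the class name, the function name, and the example values are hypothetical and do not come from Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitIndices:
    cfi: float
    tli: float
    rmsea: float
    rmsea_upper_90: float  # upper limit of the 90% confidence interval


def meets_cutoffs(fit, cfi_tli_min=0.95, rmsea_max=0.06, rmsea_upper_max=0.10):
    """Return True when CFI and TLI exceed .95 and RMSEA is below 0.06
    or has a 90% upper limit below 0.10, as described in the text."""
    incremental_ok = fit.cfi > cfi_tli_min and fit.tli > cfi_tli_min
    rmsea_ok = fit.rmsea < rmsea_max or fit.rmsea_upper_90 < rmsea_upper_max
    return incremental_ok and rmsea_ok


# Hypothetical values for two models (not the values reported in Table 1).
well_fitting = FitIndices(cfi=0.97, tli=0.96, rmsea=0.05, rmsea_upper_90=0.055)
poorly_fitting = FitIndices(cfi=0.80, tli=0.78, rmsea=0.12, rmsea_upper_90=0.125)
```

Treating the two RMSEA conditions as alternatives follows the wording "or" in the text; a stricter reading that requires both would replace `or` with `and`.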
To compare all models, given that AIC is not computed by Mplus when using a categorical estimator, we adapted a version of the AIC proposed by Yamaoka et al. (1978) for weighted least squares models (see also Banks and Joyner 2017; Saleh 2014), computed as follows: AIC = n*ln(fmin) + 2p + 1, where n is the sample size, ln is the natural log, fmin is the minimum value of the WLSMV fitting function (final iteration), and p is the number of free parameters (see Banks and Joyner 2017, p. 38, eq. 10). AIC rewards goodness of fit and includes a penalty that is an increasing function of the number of parameters estimated. We rescaled AIC values according to the following formula (Burnham and Anderson 2004): ΔAIC_i = AIC_i − AIC_min, where AIC_min is the minimum of the observed AIC values (among the i competing models). This transformation forces the best model to have ΔAIC = 0 while the rest of the models have positive values. Accordingly, a model that differs by less than ΔAIC = 2 from the best fitting model in a specific dataset is said to be "strongly supported by the evidence." If the difference lies between 4 and 7 (4 ≤ ΔAIC ≤ 7), there is considerably less support, whereas models with ΔAIC > 10 have essentially no support (Burnham and Anderson 2004, p. 271).
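The model comparison procedure just described can be sketched in a few lines of Python. The AIC expression is the one given in the text, and the rescaling subtracts the smallest AIC from each value. The function names, the model labels, and the numeric inputs are hypothetical, and ΔAIC ranges that the text does not classify are returned as unclassified rather than guessed.

```python
import math


def wls_aic(n, fmin, p):
    """AIC as given in the text: n * ln(fmin) + 2p + 1."""
    return n * math.log(fmin) + 2 * p + 1


def delta_aic(aic_by_model):
    """Rescale AIC values so that the best model has a delta of 0."""
    best = min(aic_by_model.values())
    return {name: value - best for name, value in aic_by_model.items()}


def support_label(delta):
    """Verbal interpretation of a delta, using only the ranges in the text."""
    if delta < 2:
        return "strong support"
    if 4 <= delta <= 7:
        return "considerably less support"
    if delta > 10:
        return "essentially no support"
    return "not classified in the text"


# Hypothetical inputs: sample size, fitting-function minimum, free parameters.
aics = {
    "Model A": wls_aic(n=453, fmin=2.10, p=196),
    "Model B": wls_aic(n=453, fmin=2.05, p=203),
}
deltas = delta_aic(aics)
labels = {name: support_label(d) for name, d in deltas.items()}
```

Because the constant `+ 1` is shared by all models, it cancels in the rescaled values; only the fitting-function term and the parameter penalty affect the ranking.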
In sum, the structural (internal) validity of a model (either first-order or higher-order) is attested when the model shows (a) good fit indices and (b) a better fit than competing models.

Descriptive Statistics and Missing Data
The means for the 7-point Likert items ranged from 2.40 (Item 2, God's love) to 5.96 (Item 7, family support) with an overall mean of 4.39 (SD = 0.91). The distribution was fairly normal: values for skewness ranged from −1.53 to 0.91 and values for kurtosis ranged from −0.94 to 2.90. All items were strongly correlated with their respective scale total scores (r_tt: M = .63, SD = .16, minimum = .30, maximum = .92). Full details are presented in Table 1A (see Supplementary Material). For all items, the highest frequency of missingness was 2 and the minimum covariance coverage was 99.1%. Missingness was therefore negligible and was handled with the pairwise method (the default in Mplus when using categorical estimators).

First-Order Confirmatory Factor Analyses
Table 1 reports goodness-of-fit indices for alternative models. Model 5, including the seven correlated CSW factors and one method effect (see Fig. 1, Model 5), showed the best fit. Completely standardized loadings for Model 5 are presented in Table 2. These loadings ranged from .44 to .96 (M = .74, SD = .14) for the seven CSW factors and from .23 to .57 (M = .37, SD = .14) for the method factor associated with negatively worded items (MFN). The Average Variance Extracted (AVE) was 0.15 for the method factor, 0.88 for God's love, 0.46 for virtue, 0.55 for academic competence, 0.64 for competition, 0.47 for approval from others, 0.46 for appearance, and 0.49 for family support. Correlations among latent factors (see the bottom part of Table 2) were all positive and significant at p ≤ .001 except the correlations of competition with God's love (r = −.07, z = −1.382, p = .167), appearance with God's love (r = −.06, z = −1.213, p = .225), and appearance with virtue (r = .06, z = 1.233, p = .218). The remaining correlations ranged from .17 (z = 3.462, p = .001; family support with appearance) to .65 (z = 23.272, p < .001; competition with academic competence) with a mean of .37 (SD = .15).
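The text reports AVE values without stating how they were computed. A minimal Python sketch follows, assuming the common definition of AVE as the mean of the squared standardized loadings of a factor; the loadings in the example are hypothetical and are not taken from Table 2.

```python
def average_variance_extracted(loadings):
    """AVE under the assumed definition: mean of squared standardized loadings."""
    if not loadings:
        raise ValueError("at least one loading is required")
    return sum(value ** 2 for value in loadings) / len(loadings)


# Hypothetical standardized loadings for one five-item factor.
example_loadings = [0.70, 0.65, 0.72, 0.60, 0.68]
example_ave = average_variance_extracted(example_loadings)
```

Under this definition, a factor whose loadings are all close to .95 yields an AVE near .90, while loadings around .35 to .40 yield an AVE near .15, which is the order of magnitude reported for the method factor.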

Second-Order Confirmatory Factor Analyses
All our second-order confirmatory factor analyses started from the best fitting univariate solution (Model 5; seven specific factors and one uncorrelated method factor). Table 1 presents goodness-of-fit indices for the three second-order models (Models 6-8). Model 7 proved to be the best higher-order factor solution, since it fit the data better than the competing higher-order models (see Table 1). At the second-order level, loading values (see Fig. 2) ranged from .38 (z = 7.241, p < .001; God's love on the private sphere CSW factor) to .80 (z = 26.325, p < .001; competition on the public sphere CSW factor). The two higher-order factors of private sphere CSW and public sphere CSW were moderately correlated (r = .33, z = 5.481, p < .001).

Discussion
This study offered two important results. First, items from the CSW scale were significantly contaminated by method variance. Accordingly, including a latent factor representing a method effect associated with negatively worded items is necessary not only for obtaining good-fitting models but also for obtaining a purer measure of the seven CSW domains. Second, the analysis of the second-order structure of the CSW scale revealed a significant deviation from the theoretical expectation that CSW domains should group into the two clusters initially hypothesized by Crocker et al. (2003). Indeed, we found evidence that the model empirically derived by Stefanone et al. (2011) best represented the observed covariance among the seven first-order CSW domains.
One aspect of this modified model is particularly important. The dimension of academic competence proved to be a kind of middle ground located at the intersection of the private and public domains while being specific to neither of them. As a consequence, using scores on the two higher-order dimensions could be challenging when one is interested in the broad dimensions of private sphere and public sphere of CSW outside of a CFA model. In conclusion, we found evidence that a mixed second-order bifactor structure best represents the covariance between the CSWS items. In Study 2, we further tested the stability of this factor solution and evaluated the external correlates of each factor.

Study 2
The first objective of this study was to further evaluate the best fitting models from Study 1 (i.e., Model 5 and Model 7; see Fig. 1 and Fig. 1A) on a different sample of students. To this aim, we retested the entire sequence of models considered in Study 1. We formulated the following two hypotheses:
Hypothesis 1: Among the first-order competitive models, Model 5 is the best fitting one.
Hypothesis 2: Among the second-order competitive models, Model 7 is the best fitting one.
Fig. 2 ([...] Table 1). Factor loadings are presented in standardized form and were all significant at p < .001. The first factor loading refers to Study 1, the second factor loading (after the comma) refers to Study 2. Residual variances of first-order latent variables, measurement models, and method factor (MFN; see Model 5 in Fig. 1) were omitted for the sake of clarity
Table 2 [...] .50***, .58***, .52***, 1, .19**; 7. Family support: .31***, .40***, .49***, .24***, .27***, .17***, 1
Note. All λs were significant at p ≤ .001. λ = Standardized factor loading; MFN = Method Factor associated with Negatively worded items. Correlations below the diagonal refer to Study 1, correlations above the diagonal refer to Study 2. n.s. = not statistically significant, or p > .10; + p ≤ .10; * p ≤ .05; ** p ≤ .01; *** p ≤ .001
The second and more important objective of this study was to investigate the external validity of the CSWS. To this aim, in a longitudinal design, we assessed CSW, personality traits, general self-esteem (GSE), implicit self-esteem (ISE), and religiosity in students before they began their sophomore year of college. Then we involved the same students in a daily study lasting 5 days in which they completed measures of daily self-esteem that we scored to obtain indices of self-esteem instability (GSE_Inst) and level (GSE_Level). At the end of the first semester, measures of grade point average (GPA) and depression were gathered for each student.
In accordance with correlations reported in previous studies (Sargent et al. 2006; Stefanone et al. 2011; Zeigler-Hill et al. 2008), we predicted that God's love would correlate significantly and positively with religiosity (Hypothesis 3); academic competence would correlate significantly and positively with GPA (Hypothesis 4) and negatively with emotional stability (Hypothesis 4a); virtue would correlate significantly and positively with conscientiousness (Hypothesis 5) and agreeableness (Hypothesis 5a); appearance would correlate significantly and negatively with emotional stability (Hypothesis 6) and positively with depression (Hypothesis 6a); family support would correlate significantly and positively with agreeableness (Hypothesis 7); competition would correlate significantly and positively with GPA (Hypothesis 8); and approval from others would correlate significantly and negatively with GSE (Hypothesis 9). At the second-order level, we hypothesized that private sphere CSW would correlate significantly and negatively with openness (Hypothesis 10) and positively with conscientiousness (Hypothesis 10a), whereas public sphere CSW would correlate significantly and positively with energy/extraversion (Hypothesis 11) and agreeableness (Hypothesis 11a). We did not specify any hypotheses regarding the relationships between CSW and GSE_Inst, GSE_Level, and ISE because few studies have addressed these relationships (e.g., Maro iu et al. 2016). Furthermore, we expected that the hypothesized correlations would range between a low (.10) and medium (.30) effect size (Cohen 1992) given that previous studies have usually shown moderately low correlations between CSW and external criteria (Sargent et al. 2006; Stefanone et al. 2011; Zeigler-Hill et al. 2008). The only exception is the relationship between God's love and religiosity, for which we expected a high correlation (> .50; Cohen 1992).

Participants and Procedure
This study was based on a convenience sample of 293 Italian undergraduates (77.1% females), all Caucasian, enrolled in two introductory psychology classes and compensated with partial course credit. Their age ranged from 19 to 48 years with a mean of 20.99 years (SD = 3.38). Participation was voluntary and no exclusion criteria were applied (nor was any outlying participant detected or excluded).
Participants completed questionnaires on personality traits, religiosity, general self-esteem, and implicit self-esteem, along with the CSWS, administered through an internal online platform. After two weeks, participants completed the modified version of the Rosenberg General Self-Esteem (RGSE) scale online at 24-h intervals (from 8 p.m. to 12 a.m.) for 5 consecutive days (e.g., from Monday to Friday). To enhance study participation, participants received an e-mail reminder at 7:55 p.m. each day with a link to a website to complete the daily RGSE scale. Approximately two months after the end of the daily surveys, participants received an additional booklet containing the Center for Epidemiologic Studies Depression (CES-D) scale and a question about their GPA.

Measures
Contingencies of Self-Worth Scale (CSWS). Participants filled out the same version of the CSWS as in Study 1. Reliability for each of the seven subscales was acceptable (all Cronbach's alphas and omegas were above .72; see Table 1A in Supplementary Material).
General Self-Esteem (GSE). GSE was measured with the Rosenberg General Self-Esteem (RGSE) scale (Rosenberg 1965). Each item was scored on a 4-point Likert scale ranging from 1 (Strongly disagree) to 4 (Strongly agree). Cronbach's alpha reliability coefficient was .81.
Daily Self-Esteem. Following the general procedure outlined by Kernis and his colleagues for measuring self-esteem instability (e.g., Kernis et al. 1993), participants were asked to complete a modified version of the RGSE scale each day. The RGSE scale was modified so that participants were instructed to give the response that best reflected how they felt at the moment they completed the measure. Responses were given on scales ranging from 1 (Strongly disagree) to 4 (Strongly agree). Reliability coefficients for days 1 through 5 were .86, .88, .89, .89, and .89, respectively. We computed the average self-esteem level (i.e., stable self-esteem; GSE Level) as the mean of each participant's daily scores on this scale and the self-esteem instability (GSE Inst) as the standard deviation of each participant's scores across the 5-day period (see Kernis et al. 1993).
Implicit Self-Esteem (ISE). The Implicit Association Test (IAT; Greenwald and Farnham 2000; Greenwald et al. 1998), implemented online through the INQUISIT software package (Millisecond Software 2000), was used to assess implicit self-esteem. In this test, the stimuli of the target-concept categories (Self vs. Others) were words related to "self" or "me" vs. "others" or "them". The stimulus words for the attribute dimension (Pleasant vs. Unpleasant) were emotionally loaded attributes (e.g., positive/good vs. negative/bad). In the IAT, the participants performed two types of categorization tasks with five stimuli for each category. The words were presented in random order within each block of trials. As described by Greenwald et al. (1998), the entire procedure consisted of seven blocks of trials. Blocks 1 (Self vs. Others), 2 (Pleasant vs. Unpleasant), and 5 (Others vs. Self) were single categorization blocks of 20 trials, whereas Blocks 3-4 and 6-7 were combined blocks (Self or Pleasant vs. Others or Unpleasant) of 20 trials (Blocks 3 and 6) and 40 trials (Blocks 4 and 7). Participants were requested to respond as quickly and accurately as possible to the stimulus words that appeared on the monitor. Following Greenwald et al. (2003), data from Blocks 3-4 and 6-7 were used to compute IAT difference scores according to the built-in error penalty method. Positive scores indicated high implicit self-esteem, and negative scores indicated low implicit self-esteem. The internal consistency of the scale scores was .56. Reliability was estimated by a split-half index based on two partial scores, computed respectively from Blocks 3 and 6 (20 + 20 trials) and from Blocks 4 and 7 (40 + 40 trials), through the Spearman-Brown formula.
Personality Traits. Personality traits were measured through a short version of the Big Five Questionnaire (BFQ; Caprara et al. 1993) containing 60 items (response scale ranged from 1 = Very false for me to 5 = Very true for me) that form five domain scales: energy/extraversion, friendliness/agreeableness, conscientiousness, emotional stability (vs. neuroticism), and openness. Reliability coefficients ranged from .73 (energy/extraversion) to .88 (emotional stability).
Depression. Participants rated their levels of depressive symptoms using the CES-D, a 20-item scale developed by Radloff (1977). This scale measures the symptoms that characterize depression, such as despondency, hopelessness, loss of appetite and of interest in pleasurable activities, sleep disturbance, crying bouts, loss of initiative, and self-deprecation. The items were rated based on frequency of occurrence during the past week using a 4-point Likert-type scale, ranging from 1 = Rarely or none of the time (less than 1 day) to 4 = Most or all of the time (5-7 days). An example item is: "I was bothered by things that usually don't bother me". The reliability coefficient was .88.
Religiosity. Religiosity was measured with a single item ("How religious are you?") rated on a scale ranging from 1 ("Not at all") to 7 ("Extremely").
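Two of the scoring steps described in the measures above can be sketched in a few lines of Python: the self-esteem level and instability derived from the daily scores, and the Spearman-Brown step-up applied to a split-half correlation. This is only an illustration. The function names and the example scores are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def level_and_instability(daily_scores):
    """Return (level, instability) for one participant's daily self-esteem scores."""
    # Level: mean of the daily scores.
    # Instability: within-person standard deviation (sample SD; an assumption).
    return mean(daily_scores), stdev(daily_scores)


def spearman_brown(half_correlation):
    """Step up the correlation between two half scores to a full-length reliability."""
    return 2 * half_correlation / (1 + half_correlation)


# Hypothetical daily scale scores for one participant over the 5-day period.
level, instability = level_and_instability([3.0, 3.2, 2.8, 3.1, 2.9])
```

A participant with the same level but more widely scattered daily scores would receive a higher instability value, which is the property the instability index is meant to capture.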

Descriptive Statistics and Missing Data
The means for the 7-point Likert items ranged from 2.44 (Item 2, God's love) to 6.09 (Item 7, family support), with an overall mean of 4.45 (SD = .93). The item distributions were fairly normal: skewness values ranged from −1.83 to 0.93 and kurtosis values ranged from −1.08 to 4.92. All items were strongly correlated with their respective scale total scores (r_tt: M = .63, SD = .17, minimum = .23, maximum = .92). Full details are presented in Table 1A (see Supplementary Material). For all items, the highest number of missing responses was 2, and the minimum covariance coverage was 98.6%. As in Study 1, missingness was negligible and was therefore handled with the pairwise method (the default in Mplus when using categorical estimators).
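To make the covariance coverage figure concrete, the sketch below computes coverage for a pair of items as the proportion of participants who answered both. The scenario is an assumption chosen to be consistent with the reported numbers (N = 293, at most 2 missing responses per item): two items, each missing for two different participants. The function name and the response values are hypothetical.

```python
def covariance_coverage(x, y):
    """Proportion of cases with both variables observed (None marks a missing response)."""
    both_present = sum(1 for a, b in zip(x, y) if a is not None and b is not None)
    return both_present / len(x)


n = 293  # Study 2 sample size

# Hypothetical worst case: each item has 2 missing responses, on different participants.
item_a = [None, None] + [4] * (n - 2)
item_b = [5, 5, None, None] + [5] * (n - 4)

coverage = covariance_coverage(item_a, item_b)  # 289 of 293 cases contribute
```

Under this assumed pattern, 289 of 293 cases contribute to the pair, which rounds to the 98.6% reported as the minimum coverage.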

First-Order and Second-Order Confirmatory Factor Analyses
The results of the model-fitting analyses fully replicated those obtained in Study 1 (see Table 1, Table 2, and Fig. 2). Model 5 (see Table 1) best fit the data at the first-order level, thus supporting Hypothesis 1. Factor loadings (see Table 2) ranged from .35 to .96 (M = .74, SD = .15) for the seven CSW factors and from .15 to .65 (M = .34, SD = .17) for the method factor associated with negatively worded items. The Average Variance Extracted (AVE) was 0.14 for the method factor, 0.88 for God's love, 0.45 for virtue, 0.58 for academic competence, 0.68 for competition, 0.44 for approval from others, 0.48 for appearance, and 0.49 for family support. Correlations among latent factors (see the bottom part of Table 2) were all positive and significant at p ≤ .01 except (consistent with Study 1) the correlations of competition with God's love (r = −.09, z = −1.573, p = .116), appearance with God's love (r = −.06, z = −0.909, p = .363), and appearance with virtue (r = .06, z = 0.960, p = .337). The remaining correlations ranged from .15 (z = 2.616, p = .009; virtue with competition) to .61 (z = 15.000, p < .001; competition with appearance), with a mean of .36 (SD = .14). At the higher-order level, Model 7 (see Table 1) provided the best fit to the data, thus supporting Hypothesis 2. Loading values (reported in Fig. 2) ranged from .34 (z = 5.442, p < .001; God's love on the private sphere CSW factor) to .79 (z = 11.445, p < .001; family support on the private sphere CSW factor). The two higher-order factors of private sphere CSW and public sphere CSW were moderately correlated (r = .33, z = 4.436, p < .001).
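For readers unfamiliar with the AVE, the following sketch shows how it is commonly obtained from standardized loadings, namely as the mean of the squared loadings of a factor's indicators. The text does not state the formula used, so this definition is an assumption, and the loadings below are hypothetical values rather than those of Table 2.

```python
def average_variance_extracted(std_loadings):
    """AVE as the mean of the squared standardized loadings of a factor's indicators."""
    return sum(loading ** 2 for loading in std_loadings) / len(std_loadings)


# Hypothetical standardized loadings for a five-item factor (not taken from Table 2).
hypothetical_loadings = [0.60, 0.70, 0.80, 0.65, 0.75]
ave = average_variance_extracted(hypothetical_loadings)
```

Because loadings are squared before averaging, a factor with uniformly modest loadings, such as the method factor, yields a low AVE even when every loading is significant.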

Correlations with External Criteria
To investigate the construct validity of the CSWS domains, we computed each individual's score on a first-order CSW dimension as the average of the items composing that dimension. Scores on the two higher-order dimensions were computed as the average of the first-order dimensions composing each second-order dimension, after excluding academic competence from both scores, as suggested by Stefanone et al. (2011).
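The scoring rule just described can be sketched as follows. The assignment of domains to spheres follows the private/public grouping used in this article, with academic competence left out of both composites; the function names and the example scores are hypothetical.

```python
from statistics import mean

# Sphere membership after excluding academic competence from both composites.
PRIVATE_SPHERE = ("family support", "virtue", "God's love")
PUBLIC_SPHERE = ("approval from others", "appearance", "competition")


def domain_score(item_responses):
    """First-order score: average of the items composing one domain."""
    return mean(item_responses)


def sphere_scores(domain_scores):
    """Second-order scores: average of the first-order scores within each sphere."""
    return {
        "private": mean([domain_scores[d] for d in PRIVATE_SPHERE]),
        "public": mean([domain_scores[d] for d in PUBLIC_SPHERE]),
    }


# Hypothetical first-order scores for one participant (7-point response scale).
participant = {
    "family support": 6.0,
    "virtue": 5.0,
    "God's love": 2.5,
    "academic competence": 5.5,  # present in the data, ignored by both composites
    "approval from others": 4.0,
    "appearance": 4.5,
    "competition": 3.5,
}
spheres = sphere_scores(participant)
```

Note that changing the academic competence score leaves both sphere scores untouched, which is exactly the mismatch between the structural model and the observed composites.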
In general, the correlations between measures of CSW and external criteria ranged from low to moderate, as hypothesized (see Table 3). As far as measures of self-esteem are concerned, implicit self-esteem (ISE) failed to show a significant correlation with any CSW dimension, whereas general self-esteem (GSE), level of self-esteem (GSE Level), and instability of self-esteem (GSE Inst) showed a few significant correlations. Indeed, GSE correlated significantly and negatively only with approval from others (r = −.13, p = .03; thus confirming Hypothesis 9); GSE Level was significantly and negatively correlated with appearance (r = −.17, p = .004) and approval from others (r = −.19, p = .001); and GSE Inst was significantly and positively correlated with competition (r = .16, p = .008) and appearance (r = .14, p = .016). Depression (measured 2 months after the end of the study) showed two positive and significant correlations, with appearance (r = .19, p = .001; thus confirming Hypothesis 6a) and academic competence (r = .15, p = .014). Religiosity showed four positive and significant correlations. In particular, it was strongly correlated with God's love (r = .75, p < .001; thus confirming Hypothesis 3) and private sphere CSW (r = .63, p < .001), and it was moderately correlated with family support (r = .20, p = .001) and virtue (r = .20, p = .001). Academic GPA (measured 2 months after the end of the study) showed three positive and significant correlations, with academic competence (thus confirming Hypothesis 4), approval from others, and public sphere CSW (all rs = .14, ps = .02). Contrary to our expectation, however, GPA did not significantly correlate with competition (so Hypothesis 8 was not supported).

Ancillary Analyses
Measurement Invariance across Studies
We conducted a measurement invariance analysis between the samples from Study 1 and Study 2 for the best-fitting second-order model (i.e., Model 7). First, we checked whether the number of observed response categories for each item differed across samples. We found that item 7 and item 16 had a different number of categories; indeed, we found an empty cell for "item 7 category 3" and for "item 16 category 1" in Study 2. Given that the frequency for those cells was negligible in Study 1 (3 and 1, respectively), we collapsed those categories with the subsequent ones (i.e., we recoded category 3 as 4 for item 7 and category 1 as 2 for item 16). Then, we conducted a multiple-group measurement invariance analysis for ordinal data (Bowen and Masa 2015). We started by testing configural invariance, simply assuming that the Model 7 structure holds in both samples. Indices of fit supported this first step (WLSMV-based χ² = 2727.594, df = 1088, p < .001; CFI = .966; TLI = 0.963; RMSEA = 0.064). Then, in order to test metric invariance, we constrained first-order factor loadings (i.e., factor loadings linking observed variables to the seven specific factors and the method factor) to be equal across samples. The diff-test function (implemented in Mplus for comparing models estimated with WLSMV) showed a significant worsening of the fit (Δχ² = 52.978, df = 34, p = .0201). Indeed, we found that the factor loading linking Appearance to item 21 was significantly different across samples. After removing that constraint, the model showed a good fit (WLSMV-based χ² = 2680.341, df = 1121, p < .001; CFI = .967; TLI = 0.965; RMSEA = 0.061) and a non-significant diff-test (Δχ² = 42.503, df = 33, p = .1243). Then, given the second-order structure of Model 7 (Chen et al. 2005), we tested the invariance of the factor loadings linking the higher-order factors to the lower-order factors. Indices of fit (WLSMV-based χ² = 2509.310, df = 1127, p < .001; CFI = .971; TLI = 0.969; RMSEA = 0.057) and the diff-test (Δχ² = 6.070, df = 6, p = .4154) supported metric invariance of the second-order factor loadings. Finally, we tested strong invariance by constraining all thresholds to be equal across samples. Again, indices of fit (WLSMV-based χ² = 2602.568, df = 1334, p < .001; CFI = .973; TLI = 0.976; RMSEA = 0.050) and the diff-test (Δχ² = 193.070, df = 207, p = .7478) supported this last step of the invariance analysis.
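The decision at each invariance step rests on the p-value attached to the difference-test statistic. The sketch below converts the reported Δχ² values and degrees of freedom into upper-tail chi-square probabilities, using a series expansion of the regularized incomplete gamma function so that only the Python standard library is needed. It does not reproduce the WLSMV difference test itself, which requires the model derivatives computed by Mplus; it only illustrates the final step from statistic to p-value. The function name and step labels are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series expansion of the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    p_lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - p_lower


# Difference-test statistics as reported in the text: (delta chi-square, df).
steps = {
    "metric, all first-order loadings": (52.978, 34),
    "metric, item 21 loading freed": (42.503, 33),
    "second-order loadings": (6.070, 6),
    "thresholds": (193.070, 207),
}
p_values = {name: chi2_sf(x, df) for name, (x, df) in steps.items()}
```

With these inputs, only the first step falls below the .05 threshold, in line with the sequence of decisions described in the text (freeing one loading, then retaining the remaining constraints).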

Exploratory Structural Equation Model
As suggested by an anonymous Reviewer, we applied an Exploratory Structural Equation Model (ESEM) to the best-fitting first-order solution (i.e., Model 5) in order to strengthen the validity of this model. Indeed, ESEM allows the estimation of all the cross-loadings that in a common CFA are constrained to be zero (e.g., Morin and Maïano 2011). Hence, ESEM allows a thorough investigation of the relationships between all latent factors and all observed indicators, without zero-loading constraints. We therefore estimated an ESEM starting from the Model 5 structure and specifying a "target" rotation, so that cross-loadings were estimated as close to zero as possible. Results indicated a good fit to the data.

Discussion
In Study 2, we confirmed results from Study 1 suggesting that the model with one method factor along with the seven substantive factors best fit the data. Moreover, we found further evidence that, at the second-order level, the covariance among the seven dimensions was best represented by two general factors grouping them in public sphere and private sphere of CSW.
An important aim of this study was to investigate the external validity of the seven CSW factors and the two higher-order factors. Our results showed a sufficient degree of external validity for each dimension included in the model. Indeed, 12 out of 14 hypotheses regarding the correlations between CSW domains and external criteria were confirmed in both their direction and size. Furthermore, the exploratory analysis of the relationship between CSW and GSE Level, GSE Inst, and ISE highlighted the orthogonal relationship between CSW and these other types of self-esteem, which should be further investigated in future research.

General Discussion
We conducted two studies to provide a psychometric analysis of the internal and external validity of the first- and second-order structure of the Contingencies of Self-Worth Scale (CSWS). To this aim, we submitted the instrument to an in-depth psychometric investigation, comparing alternative measurement models and using several external criteria to assess the external validity of scale scores. Overall, our results suggest the tenability of the seven-factor structure when one considers an additional orthogonal method factor capturing spurious variance introduced by the presence of negatively worded items. Ignoring this additional method factor results in a model with a sub-optimal fit and biased factor loading estimates.
While confirming the first-order structure of the scale, our study also suggested that, at the second-order level, the relationships among the seven factors might differ from what was originally proposed. Across two studies, the original model, proposing a dichotomy between internal/intrapersonal (i.e., God's love and virtue) and external/interpersonal (i.e., academic competence, competition, approval from others, appearance, and family support) contingencies, fared worse in terms of data fit than the revised model proposed by Stefanone et al. (2011), which contrasted private sphere (i.e., family support, virtue, God's love, and academic competence) and public sphere (i.e., approval from others, appearance, competition, and academic competence) of CSW.
This result is not devoid of complications given that, under this model, the academic competence domain of CSW is placed at the intersection between private sphere and public sphere of CSW. From a conceptual point of view, it is conceivable that the academic context represents an intermediate domain reflecting partly what the individual feels or strives to be (i.e., a good student in order to get a rewarding job) and partly what the individual in that specific moment is for society (i.e., a university student). In any case, this seems to be an area in need of further conceptual and theoretical developments from future research.
From a practical point of view, the cross-loading items of this dimension pose challenges to applied researchers. This is likely an area in need of further research and conceptual development before formulating any definitive recommendations. In the present study, we computed the individuals' scores on the private sphere CSW and the public sphere CSW after excluding academic competence from both dimensions. While this seems like a recommended (see Stefanone et al. 2011) and practical solution, it is not psychometrically sound. As a matter of fact, this procedure introduces a discrepancy between the structural higher-order model (in which the dimension exists) and the observed score constructed (in which the dimension is ignored). In sum, future studies should go further in proposing revisions of the CSWS aimed to clarify the nature of the academic competence dimension as belonging to the public sphere or the private sphere of CSW.
Finally, Study 2 provided support for the external validity of the seven first-order dimensions and the two higher-order dimensions of the CSWS. Surprisingly, we found few significant correlations between the CSW dimensions and different measures of self-esteem. This might suggest that CSW are substantially different from general self-esteem (GSE) because, whereas GSE has mostly trait-like characteristics, CSW is conceptualized to be more prone to change according to variation in one's capability of self-regulation (e.g., Crockerb et al. 2006). Future studies should investigate and explain these differences (e.g., through genetic studies).
In general, as previously said, most of our hypothesized correlations were in the suggested direction, therefore supporting a certain degree of external validity; however, the small size of the correlations between some CSWS domains and their hypothesized external criteria is worth noting. Indeed, although this result mimics patterns in previous works, one may speculate about the potential causes of this problem. For example, one may wonder whether it reflects a misalignment between the theoretical status of the CSW and the outcomes considered (which reflect those used in the seminal work on the scale), whether it suggests the need for a deeper analysis of the content of the CSWS items, or, simply, whether it points to the need for a theoretically oriented reconsideration of CSW correlates. In a similar vein, Briganti et al. (2019) recently conducted a network analysis on the CSWS and found that "the seven domains of self-worth form a heterogeneous system in which domains are not uniformly positively connected with each other" (Briganti et al. 2019, p. 255). Considering together Briganti et al.'s (2019) findings and our main results (an ambiguous second-order bifactor structure and the small correlations with external variables), it seems reasonable to ask whether the CSWS dimensions are indeed measures of self-worth. However, given that this is outside the scope of the present contribution, we recommend that this problem become the aim of future dedicated studies on the content validity of the CSWS items. Indeed, the latter is a pivotal point to be addressed in order to correctly operationalize the construct of contingencies of self-worth. Another important point regards the need to manage the presence of negatively worded items. Given that there is no clear procedure for handling these effects when using observed scores, observed scores obtained from the CSWS may be contaminated by method variance.
Finally, whereas our two-study contribution aimed at testing the most commonly used competing ways of modeling the internal structure of the CSWS (hence, we simply looked for the best solution among those selected models), we point out that the best-fitting models (i.e., Model 5 and Model 7) are not without problems. For instance, all WLSMV-based χ² values were significant; while it is common practice to rely on approximate fit indices more than on the χ² test statistic (Ropovik 2015), a significant value of the latter indicates the presence of some model misspecification. Our inspection of residuals and modification indices (Ropovik 2015) showed that some residual covariances among items should be freed, as well as some cross-loadings. In conclusion, while we found that Models 5 and 7 represent the best solutions among the inspected models, we do not underestimate the possibility that new dimensional conceptualizations and/or reevaluations of item content are needed in order to improve the structural validity of the CSWS. However, we believe that our contribution is a good starting point for future refinements of the scale.
Taken together, our findings and those of previous research may provide useful information for a future reconsideration of the structure and the item content of an instrument aiming at correctly assessing contingencies of self-worth.

Limitations
Several limitations should be acknowledged. First, some constraints on the generalizability of our findings deserve to be discussed (Simons et al. 2017). The samples we used were recruited from the university population, so our results cannot be generalized to other populations. Future studies should therefore investigate whether our findings can be replicated in clinical samples or in samples of older people. Moreover, we conducted both studies in Italy; hence, our findings can at best be generalized to other similar Western cultures. Given the cultural differences in contingencies of self-worth, they cannot be generalized to different cultures, such as those of Eastern countries.
Second, the CSWS was analyzed at a cross-sectional level. Future studies should test the longitudinal tenability of the CSWS structure found in our studies in terms of longitudinal measurement invariance.
Third, in this study we did not provide evidence concerning a possible substantive validity of the method factor. This might be an important task for future studies, given that the literature offers similar examples (e.g., Alessandri et al. 2010, 2011).

Conclusion
The CSWS is a widely used self-report measure assessing self-worth across different domains and life situations. In this contribution, through two studies using large samples of university students, we attested (a) the validity of the seven first-order domains, (b) the validity of two higher-order spheres of CSW (public and private) that significantly accounted for the shared variance of the first-order domains, (c) the importance of including one further orthogonal factor composed of negatively worded items, and (d) the external validity of all dimensions. In this way, we hope that our study might advance the literature on CSWS and stimulate future research on this instrument, which constitutes an important tool in personality assessment.

Compliance with Ethical Standards
Conflict of Interest. All authors declare that they have no conflict of interest.
Ethical Approval. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed Consent. Informed consent was obtained from all individual participants included in the study.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.