Introduction

Body image is a broad and complex construct that includes perceptual, cognitive-affective, and behavioral components (Sepúlveda et al., 2002; Slade, 1994). Its perceptual component is called body esteem (BE), defined as the subjective assessment of one’s body (Mendelson et al., 2001). Adolescence is a high-risk period in which body shape and body fat change while adolescents face sociocultural pressure to achieve the aesthetic ideal (Voelker et al., 2015). In addition, low levels of BE are found at this developmental stage (Bucchianeri et al., 2013; Holmqvist et al., 2007; Mak et al., 2012), especially among adolescents with a higher weight status (Moradi et al., 2020; Sagar & Gupta, 2018; Sander et al., 2021; Voelker et al., 2015). As low BE is related to psychological problems such as anxiety, depression, body shame, low self-esteem, and disordered eating (Bornioli et al., 2021; Forbes et al., 2012; Mendelson et al., 2002; Rousseau et al., 2015; Sander et al., 2021), it is essential to assess BE accurately during adolescence with culturally suitable instruments. While several measures have been developed to assess body image in the Spanish population (Botella et al., 2009; Jáuregui & Bolaños, 2011), certain BE scales have not shown adequate psychometric properties in Spanish adolescents (Jorquera et al., 2005).

The Body-Esteem Scale for Adolescents and Adults (BESAA) (Mendelson et al., 2001) is one of the most widely used instruments for assessing BE (Kling et al., 2019). It was introduced by Mendelson et al. (1997) as an adaptation of the existing Body Esteem Scale (BES) for children (Mendelson & White, 1993). The BESAA assesses BE among adolescents and young adults, an age range not covered by other scales commonly used for a similar purpose (Franzoi & Shields, 1984; Mendelson et al., 2001). It explores three domains of BE: (1) BE-Appearance (general feelings about one’s appearance); (2) BE-Weight (satisfaction with one’s weight); and (3) BE-Attribution (evaluations of one’s body and appearance that are attributed to others, that is, what one believes other people think about one’s body and appearance) (Mendelson et al., 1997). In 2001, the BESAA was validated as a 23-item self-report questionnaire in a Canadian sample (12–25 years; M = 16.8 years). Factor analysis with oblique rotation supported a three-factor structure with good psychometric properties (Mendelson et al., 2001). Since then, the BESAA has been validated in several countries, such as Italy (Confalonieri et al., 2008), Iceland (Jónsdóttir et al., 2008), France (Rousseau et al., 2015; Valls et al., 2011), Turkey (Arslan et al., 2020), and India (Garbett et al., 2021). It was also translated into Spanish for use in an Argentinian sample (Forbes et al., 2012), but no psychometric validation was reported.

There is some debate regarding the factor structure of the scale (see Supplementary File 1 for a detailed comparison between validations). For instance, the Italian (Confalonieri et al., 2008) and Turkish (Arslan et al., 2020) validations supported the original three-factor structure after omitting nine and eight items, respectively, because these items cross-loaded on more than one factor. The Icelandic validation also replicated the three-factor structure, although some items loaded on different factors than in the original (Jónsdóttir et al., 2008). The BE-Attribution factor showed the weakest psychometric properties across studies, and the French versions did not identify this latent construct through factor analysis (Rousseau et al., 2015; Valls et al., 2011). For this reason, the English-language Indian validation conducted the factor analysis without the attribution factor (Garbett et al., 2021). Despite not having been validated, the Spanish version by Forbes et al. (2012) has been used with Spanish adolescents (Prieler et al., 2021). Since the influence of culture on BE is well established (Holmqvist et al., 2007; Skorek et al., 2014), a cross-cultural validation process is essential to ensure the validity and reliability of this measure in the Spanish population.

Present Study

In the current study, we aimed to culturally validate the BESAA (Mendelson et al., 2001) in a Spanish adolescent population. First, we sought to examine the cross-cultural adequacy and acceptability of the Argentinian-Spanish version of the BESAA (Forbes et al., 2012) among Spanish adolescents. Second, we aimed to assess the psychometric properties of the BESAA in terms of its internal structure, reliability (internal consistency and test-retest), convergent and discriminant validity, and nomological validity. As nomological validity assesses whether the correlations between the BESAA and other variables reflect the theoretical claims in the literature (Moon, 2013), we investigated the scale’s relationship with related measures, such as body appreciation, general self-esteem, appearance-related sociocultural pressures, and eating psychopathology. Finally, we assessed the scale’s association with BMI z-scores. Guided by prior research, we hypothesized that the Spanish BESAA would exhibit a three-factor structure (BE-Appearance, BE-Weight, BE-Attribution) (Arslan et al., 2020; Confalonieri et al., 2008; Cragun et al., 2013; Jónsdóttir et al., 2012; Mendelson et al., 2001). We expected the BESAA subscales to correlate positively with measures of body appreciation (Halliwell et al., 2017; Swami et al., 2008) and general self-esteem (Confalonieri et al., 2008; Mendelson et al., 2001; Rousseau et al., 2015), and negatively with perceived appearance-related sociocultural pressures and the internalization of appearance ideals (Frisén & Berne, 2019) and with eating psychopathology (Garbett et al., 2021; Kling et al., 2019). Finally, we expected BE to show significant negative correlations with BMI z-scores (Moradi et al., 2021; Sanders et al., 2015), particularly for the BE-Weight subscale (Lieberman et al., 2001).

Hence, this paper aims to help fill a gap in the assessment of BE in Spanish adolescents by validating a widely used scale, and to provide insight into the BE dimensions of the Spanish adolescent population.

Method

Participants

A total of 1,258 adolescents aged 12 to 18 years completed the assessment protocol (M age = 15.56; SD = 1.69; female, N = 623; male, N = 583; non-binary/other gender, N = 52). The full sample was randomly divided into two subsamples. The first subsample (N = 612, M age = 15.02, SD = 1.60, female = 50.6%, male = 46.0%, non-binary/others = 3.4%) was used for the exploratory factor analysis, and the second (N = 646, M age = 15.02, SD = 1.59, female = 49.0%, male = 47.1%, non-binary/others = 3.9%) for the confirmatory factor analysis. Both subsamples had similar age and gender distributions.

Procedure

The cultural validation of the Spanish BESAA (BESAA-S) was based on the Spanish version of the scale published by Forbes et al. (2012) for use in an Argentinian sample, with the authors’ consent. The good practice guidelines of the International Test Commission (Hernández et al., 2020) and the recommendations for the validation of body image instruments (Swami & Barron, 2019) were followed. All procedures related to sample collection were performed in accordance with the ethical standards for scientific research in psychology, and ethical approval was obtained from the Research Ethics Committee of the Autonomous University of Madrid (CEI-98-103). The authors have no conflict of interest to declare.

A pilot study (n = 36) was carried out with volunteer adolescents to examine semantic equivalence and cultural appropriateness. The volunteers completed an online survey with three questions for each BESAA item: a 5-point Likert scale assessed the clarity (1 = not clear, 5 = very clear), adequacy (1 = not adequate, 5 = very adequate), and emotional distress (1 = no discomfort, 5 = high discomfort) generated by each item. In addition, open-ended questions gathered unstructured feedback. For 91% of the participants, all items were very clear, adequate, and generated no discomfort. As this level of comprehension and appropriateness was in line with previous guidelines (Reichenheim & Moraes, 2007; Swami & Barron, 2019), and participants reported no difficulties understanding the instructions or the item content, no changes were made to the items and the validation proceeded.

The validation study used a cross-sectional design. Participants were recruited between October 2020 and May 2021 through five public and private secondary schools in Madrid, selected by convenience sampling. First, the researchers presented the study to the school principals, who gave their authorization. Second, parental consent and students’ informed consent were obtained; participation was voluntary and anonymous. The students then completed the questionnaires during regular school hours on an online survey platform (Qualtrics.com). A subsample of 80 participants drawn from both samples (Sample one, n = 46; Sample two, n = 34) completed the BESAA again one month later. Participants received no financial compensation but were invited to enter a draw for two electronic tablets. The non-participation rate was 0.05%, mainly due to student absence.

Measures

Body-Esteem Scale for Adolescents and Adults (BESAA)

The 23 items on the scale were rated on a 5-point Likert scale (0 = never, 4 = always). BE was measured through three subscales: (1) Appearance (10 items, e.g., “I worry about the way I look”; Cronbach’s alpha (α) = 0.92); (2) Weight (eight items, e.g., “I really like what I weigh”; α = 0.90); and (3) Attribution (five items, e.g., “People my own age like my looks”; α = 0.81) (Mendelson et al., 2001). Item scores within each subscale were averaged. There were no cut-off points; higher scores indicated greater BE.
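A minimal Python sketch of the subscale scoring rule described above (the mean of the 0–4 item ratings within each subscale). The item-to-subscale key and the reverse-keyed item set are hypothetical placeholders, not the published BESAA key; negatively worded items such as “I worry about the way I look” would presumably need reverse-keying so that higher scores indicate greater BE, but the actual key should be taken from Mendelson et al. (2001).

```python
from statistics import mean

# HYPOTHETICAL item-to-subscale key (item numbers are placeholders), for illustration only.
SUBSCALES = {
    "appearance": [1, 2, 3],
    "weight": [4, 5],
    "attribution": [6, 7],
}
# HYPOTHETICAL set of reverse-keyed items, rescored as 4 - x on the 0-4 scale.
REVERSE_KEYED = {2}


def score_besaa(responses, subscales=SUBSCALES, reverse_keyed=REVERSE_KEYED, max_score=4):
    """Return the mean item score per subscale; higher means greater body esteem."""
    scores = {}
    for name, items in subscales.items():
        values = []
        for item in items:
            x = responses[item]
            if not 0 <= x <= max_score:
                raise ValueError(f"item {item} is outside 0-{max_score}: {x}")
            values.append(max_score - x if item in reverse_keyed else x)
        scores[name] = mean(values)
    return scores


example = {1: 4, 2: 0, 3: 3, 4: 2, 5: 3, 6: 1, 7: 2}
print(score_besaa(example))
```

Averaging rather than summing keeps subscale scores on the original 0–4 metric, so subscales with different numbers of items remain directly comparable.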

The following measures were used to test the nomological validity of the BESAA. They have frequently been used in previous validations and are appropriate for the target population.

Body Appreciation Scale (BAS)

The BAS (Avalos et al., 2005; Jáuregui & Bolaños, 2011) addresses a component of body image complementary to BE: acceptance of, respect for, and care of one’s own body regardless of one’s satisfaction with its weight and shape (Jáuregui & Bolaños, 2011). Previous research has shown that body appreciation and body esteem are related constructs (Avalos et al., 2005), and each scale has been used to study the other’s nomological validity (Halliwell et al., 2017; Swami et al., 2008). The Spanish version adapted for adolescents (Jáuregui & Bolaños, 2011) comprises 13 items rated on a 5-point Likert scale (1 = never, 5 = always). Item scores were averaged to obtain an overall body appreciation score, with higher scores indicating more positive body appreciation. The scale showed high internal consistency (α = 0.94).

Rosenberg Self-Esteem Scale (RSE)

The RSE (Martín-Albo et al., 2007; Rosenberg, 1965) assesses general feelings of self-worth and self-acceptance with 10 items rated on a 4-point Likert scale (1 = strongly agree, 4 = strongly disagree). Given the relationship between BE and general self-esteem, previous research has conceptualized BE as a component of general self-esteem (Mendelson et al., 1997). The RSE has been used consistently to assess the nomological validity of the BESAA since its inception (Confalonieri et al., 2008; Mendelson et al., 2001; Rousseau et al., 2015; Valls et al., 2011). Item scores were averaged, with lower scores indicating lower self-esteem. In the Spanish validation, α = 0.85–0.88 (Martín-Albo et al., 2007).

Sociocultural Attitudes Towards Appearance Questionnaire-4 (SATAQ-4)

The SATAQ-4 (Llorente et al., 2015; Schaefer et al., 2015) evaluates the endorsement of Western sociocultural standards of appearance. It comprises 22 items rated on a 5-point Likert-type scale (1 = completely disagree, 5 = completely agree). Three subscales measure perceived social pressure (family, peers, and media), and two measure the internalization of appearance ideals (thinness and muscularity). Higher scores indicate greater acceptance of beauty standards. The Spanish version showed good reliability (α = 0.88–0.97) (Llorente et al., 2015). The SATAQ and the BESAA have each been used to test the nomological validity of versions of the other (Lewis-Smith et al., 2021; Rodgers et al., 2016).

Eating Disorder Examination Questionnaire for Adolescents (EDE-Q-A)

We used the Spanish adolescent version (Sepúlveda et al., 2019) of the EDE-Q-A (Carter et al., 2001), which is derived from the Eating Disorder Examination Questionnaire (EDE-Q) (Fairburn & Beglin, 1994). It measures the frequency or severity of eating psychopathology over the past 14 days on a 7-point Likert scale (0 = nothing, 6 = intensely). The original instrument has four subscales, whereas the Spanish validation supported a two-factor structure: Restraint (α = 0.89) and Eating, Shape, and Weight Concerns (α = 0.98). The EDE-Q has been used in prior BESAA validations (Garbett et al., 2021) and correlates with body image concerns (Lewis-Smith et al., 2020).

Weight Status

Height and weight were self-reported and used to calculate participants’ Body Mass Index (BMI = weight [kg]/height² [m²]). BMI standardized scores (BMI z-scores) were computed by comparing each participant’s BMI with reference values for the general population of the same age and gender (Sobradillo et al., 2004). When treated as a continuous measure, self-reported anthropometric data show only a small margin of error relative to objectively measured BMI and are a valid alternative (Ekström et al., 2015; Galán et al., 2001; Skeie et al., 2015).
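The BMI formula and the standardization step can be sketched as follows, assuming the growth reference supplies an age- and sex-specific mean and standard deviation of BMI. The values in `REF` are invented for illustration and are not taken from the Sobradillo et al. (2004) tables. If a reference is instead published as LMS parameters, the z-score would be ((BMI/M)^L − 1)/(L·S) rather than the simple standardization shown here.

```python
# HYPOTHETICAL reference values (BMI mean, SD) for one age/sex cell; NOT from Sobradillo et al. (2004).
REF = {("female", 15): (20.5, 3.0)}


def bmi(weight_kg, height_m):
    """Body Mass Index: weight in kg divided by the square of height in m."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_z(bmi_value, ref_mean, ref_sd):
    """Standardize a BMI value against an age- and sex-specific reference mean and SD."""
    return (bmi_value - ref_mean) / ref_sd


b = bmi(55.0, 1.62)  # about 20.96 kg/m^2
ref_mean, ref_sd = REF[("female", 15)]
print(round(b, 2), round(bmi_z(b, ref_mean, ref_sd), 2))
```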

Statistical Analyses

The sample size for pre-testing the scale followed the validation guidelines for body image instruments (Swami & Barron, 2019). Likewise, the sample sizes for the Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA) were adequate, exceeding the recommended participant-to-item ratio of 20:1 (Hogarty et al., 2005; Swami & Barron, 2019). Statistical analyses were performed in the following steps.
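As a quick arithmetic check of the ratio claim, dividing each subsample size by the 23 items of the full BESAA gives:

```latex
\frac{612}{23} \approx 26.6 \ \text{(EFA, Sample one)}, \qquad \frac{646}{23} \approx 28.1 \ \text{(CFA, Sample two)}
```

Both ratios exceed 20:1, and they are higher still for any shortened version of the scale.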

Sample Division

The participants were randomly assigned to Sample one or two using the Statistical Package for the Social Sciences (SPSS) software version 25. Sample one was used for the exploratory factor analysis, while Sample two was used for the confirmatory model.
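The random split can be sketched in Python as below. This is an assumed equivalent of the SPSS procedure, not a reproduction of it: each participant is assigned independently to Sample one with probability 0.5, which yields subsamples of only approximately equal size (compatible with an unequal split such as 612/646). The seed value is arbitrary; fixing it makes the split reproducible.

```python
import random


def split_sample(ids, p=0.5, seed=2021):
    """Assign each participant independently to Sample one with probability p.

    Independent assignment gives subsamples of only approximately equal size.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible split
    sample_one, sample_two = [], []
    for pid in ids:
        (sample_one if rng.random() < p else sample_two).append(pid)
    return sample_one, sample_two


sample_one, sample_two = split_sample(range(1, 1259))  # 1,258 participant IDs
print(len(sample_one), len(sample_two))
```

An alternative design choice is to shuffle the IDs and cut the list in half, which forces equal subsample sizes.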

Exploratory Factor Analysis (EFA)

Reliability and normality (Kolmogorov-Smirnov test) were examined for the full scale before running the EFA. Sampling adequacy was tested with the Kaiser-Meyer-Olkin (KMO) measure (Kaiser, 1970, 1974), and Bartlett’s test of sphericity (Bartlett, 1954) was used to test whether the inter-item correlation matrix differed significantly from an identity matrix, that is, whether the items were sufficiently correlated to be factored. KMO values between 0.70 and 0.80 are considered good, values between 0.80 and 0.90 great, and values of 0.90 and above excellent (Hutcheson & Sofroniou, 1999). A significant Bartlett’s test (p < 0.05) is required to run the EFA (Pallant, 2013). We then performed a parallel analysis using an SPSS macro (O’Connor, 2000) to determine the number of factors to retain. Finally, the EFA model was estimated using unweighted least squares (ULS) with oblimin rotation.
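The following NumPy-only sketch shows how the overall KMO measure, Bartlett’s statistic, and a parallel-analysis retention rule can be computed from a respondents-by-items matrix. It is not the SPSS procedure or O’Connor’s macro used in the study: the parallel analysis here compares observed eigenvalues of the correlation matrix with the 95th percentile of eigenvalues from normally distributed random data, which is one common variant, and the simulated three-factor data are purely illustrative. For 23 items, Bartlett’s test has 23 × 22 / 2 = 253 degrees of freedom.

```python
import numpy as np


def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    r = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(r)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                   # partial (anti-image) correlations
    off = ~np.eye(r.shape[0], dtype=bool)    # off-diagonal mask
    r2 = np.sum(r[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)


def bartlett_sphericity(data):
    """Bartlett's chi-square statistic and df for H0: correlation matrix = identity.

    A p-value can be obtained with scipy.stats.chi2.sf(stat, df).
    """
    n, p = data.shape
    _, logdet = np.linalg.slogdet(np.corrcoef(data, rowvar=False))
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    return stat, p * (p - 1) // 2


def parallel_analysis(data, n_iter=100, percentile=95, seed=0):
    """Count leading eigenvalues that exceed those of random normal data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = data.shape

    def eigenvalues(x):
        return np.sort(np.linalg.eigvalsh(np.corrcoef(x, rowvar=False)))[::-1]

    observed = eigenvalues(data)
    random_eigs = np.array([eigenvalues(rng.standard_normal((n, p))) for _ in range(n_iter)])
    threshold = np.percentile(random_eigs, percentile, axis=0)
    n_factors = 0
    for obs, thr in zip(observed, threshold):  # stop at the first non-exceeding eigenvalue
        if obs <= thr:
            break
        n_factors += 1
    return n_factors, observed, threshold


# Simulated (illustrative) data: 600 respondents, 12 items, 3 factors with 4 items each.
rng = np.random.default_rng(1)
factors = rng.standard_normal((600, 3))
loadings = np.kron(np.eye(3), np.full((4, 1), 0.8))  # 12 x 3 block loading matrix
items = factors @ loadings.T + 0.6 * rng.standard_normal((600, 12))

print(round(kmo(items), 3), bartlett_sphericity(items)[1], parallel_analysis(items)[0])
```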

Confirmatory Factor Analysis (CFA)

The CFA was conducted with the lavaan package for R (Rosseel et al., 2017). The EFA results guided the specification of the confirmatory model. Maximum likelihood (MLM) was used as the estimation method for the CFA. Model fit was evaluated with the root mean square error of approximation (RMSEA) (Steiger & Lind, 1980), the comparative fit index (CFI) (Bentler, 1990), the Tucker-Lewis index (TLI) (Bentler & Bonett, 1980), and the standardized root mean square residual (SRMR) (Bentler, 1995). An RMSEA < 0.06 indicates good fit and < 0.08 reasonable fit; CFI and TLI values above 0.95 indicate relatively good fit, and values around 0.90 reasonable fit; and an SRMR below 0.08 indicates reasonable fit (Hu & Bentler, 1999).
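RMSEA, CFI, and TLI can be computed from the model and baseline (independence model) chi-square statistics, and the cut-offs above can then be applied mechanically, as in the sketch below. It uses the conventional N − 1 formula for RMSEA (some software may use N), does not compute SRMR (which requires the residual correlation matrix), and does not reproduce the scaled or robust versions that SEM software may report under robust estimators. Treating “around 0.90” as ≥ 0.90 is an assumption.

```python
import math


def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """RMSEA, CFI and TLI from model (m) and baseline/independence (b) chi-squares."""
    excess_m = max(chi2_m - df_m, 0.0)
    rmsea = math.sqrt(excess_m / (df_m * (n - 1)))
    excess_b = max(chi2_b - df_b, excess_m)
    cfi = 1.0 - excess_m / excess_b if excess_b > 0 else 1.0
    tli = (chi2_b / df_b - chi2_m / df_m) / (chi2_b / df_b - 1.0)
    return {"rmsea": rmsea, "cfi": cfi, "tli": tli}


def classify_fit(rmsea, cfi, tli, srmr):
    """Label fit with the cut-offs in the text (Hu & Bentler, 1999)."""
    def incremental(x):  # CFI/TLI: > 0.95 good; >= 0.90 treated as reasonable (assumption)
        return "good" if x > 0.95 else "reasonable" if x >= 0.90 else "poor"

    return {
        "rmsea": "good" if rmsea < 0.06 else "reasonable" if rmsea < 0.08 else "poor",
        "cfi": incremental(cfi),
        "tli": incremental(tli),
        "srmr": "reasonable" if srmr < 0.08 else "poor",
    }


# Indices reported for the 14-item CFA model in the Results section.
print(classify_fit(rmsea=0.073, cfi=0.936, tli=0.920, srmr=0.067))
# {'rmsea': 'reasonable', 'cfi': 'reasonable', 'tli': 'reasonable', 'srmr': 'reasonable'}
```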

Convergent and Discriminant Validity

Convergent validity was examined through the indicators’ factor loadings, composite reliability (CR), and average variance extracted (AVE) (Hair et al., 2005); adequate convergent validity requires an AVE of 0.50 or higher. Discriminant validity, the extent to which factors are empirically distinct from one another (Hair et al., 2005), was assessed using the cross-loadings of the indicators and the Fornell-Larcker criterion (Fornell & Larcker, 1981), which compares the AVE of each latent variable with its squared correlations with the other latent variables; the AVE should be larger than the squared correlation with any other construct.
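Assuming standardized loadings (so that each indicator’s error variance is 1 − λ²), CR, AVE, and the Fornell-Larcker comparison can be sketched as follows. The loadings, AVEs, and factor correlations in the usage lines are hypothetical and are not the values in Fig. 1; the first example shows how a factor can have acceptable CR while its AVE falls just below 0.50.

```python
def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    s = sum(loadings)
    error = sum(1 - lam ** 2 for lam in loadings)  # standardized loadings assumed
    return s ** 2 / (s ** 2 + error)


def average_variance_extracted(loadings):
    """AVE = mean squared standardized loading."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


def fornell_larcker(ave_by_factor, factor_corr):
    """For each factor pair, True if both AVEs exceed the squared correlation."""
    return {
        pair: min(ave_by_factor[pair[0]], ave_by_factor[pair[1]]) > r ** 2
        for pair, r in factor_corr.items()
    }


# HYPOTHETICAL standardized loadings for a four-item factor.
lam4 = [0.7, 0.7, 0.7, 0.7]
print(round(composite_reliability(lam4), 3), round(average_variance_extracted(lam4), 3))  # 0.794 0.49

# HYPOTHETICAL AVEs and inter-factor correlations.
ave_by_factor = {"appearance": 0.55, "weight": 0.60, "attribution": 0.45}
factor_corr = {("appearance", "weight"): 0.60, ("appearance", "attribution"): 0.35,
               ("weight", "attribution"): 0.40}
print(fornell_larcker(ave_by_factor, factor_corr))
```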

Reliability

Internal consistency and test-retest reliability were assessed for the final BESAA-S subscales. Cronbach’s alpha values > 0.70 were considered acceptable (Nunnally, 1978). Test-retest reliability was assessed with Spearman’s rho, interpreted as moderate between 0.40 and 0.69 and strong between 0.70 and 0.89 (Schober et al., 2018).
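A NumPy-only sketch of Cronbach’s alpha and of Spearman’s rho for test-retest data (Spearman’s rho is Pearson’s r computed on ranks, with tied values given their average rank). In practice `scipy.stats.spearmanr` returns the same coefficient together with a p-value. The interpretation bands follow the text; treating the 0.69/0.70 and 0.89/0.90 boundaries as half-open intervals is an assumption, and the time-1/time-2 scores are hypothetical.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_var / total_var)


def average_ranks(x):
    """Ranks from 1..n, with tied values sharing their mean rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def retest_label(rho):
    """Bands from the text (Schober et al., 2018); half-open boundaries are an assumption."""
    if 0.40 <= rho < 0.70:
        return "moderate"
    if 0.70 <= rho < 0.90:
        return "strong"
    return "outside the moderate-strong range"


# HYPOTHETICAL subscale means at time 1 and one month later.
t1 = [2.1, 3.4, 1.8, 2.9, 3.0, 2.2]
t2 = [2.3, 3.1, 1.5, 3.2, 2.8, 2.0]
rho = spearman_rho(t1, t2)
print(round(rho, 2), retest_label(rho))
```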

Nomological Validity

Because the data did not meet the assumption of normality, non-parametric tests were applied. Spearman’s rho was computed between the final Spanish version of the BESAA and body appreciation (BAS), self-esteem (RSE), sociocultural pressures (SATAQ-4), eating psychopathology (EDE-Q-A), and BMI z-scores. Correlations of 0.10, 0.30, and 0.50 were interpreted as small, medium, and large, respectively (Cohen, 1992).
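Cohen’s benchmarks can be applied to the absolute value of each coefficient, as in this small helper. Treating 0.10, 0.30, and 0.50 as the lower bounds of each band is an assumption, and the example coefficients are hypothetical; the coefficients themselves could come from, for example, pandas’ `DataFrame.corr(method="spearman")`.

```python
def cohen_label(r):
    """Magnitude of a correlation by Cohen's (1992) benchmarks, ignoring its sign."""
    a = abs(r)
    if a >= 0.50:
        return "large"
    if a >= 0.30:
        return "medium"
    if a >= 0.10:
        return "small"
    return "below small"


# HYPOTHETICAL coefficients between one BESAA-S subscale and other measures.
example = {"BAS": 0.62, "RSE": 0.41, "EDE-Q-A Restraint": -0.18, "BMI z": -0.07}
for measure, rho in example.items():
    print(f"{measure}: {rho:+.2f} ({cohen_label(rho)})")
```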

Results

EFA

Internal consistency was good for the full scale (α = 0.937), but the items were not normally distributed (p < 0.001). The KMO and Bartlett’s tests were satisfactory (KMO = 0.944; χ²(253) = 9451.88, p < 0.001). Table 1 presents the results of the parallel analysis: only the first three observed eigenvalues exceeded the corresponding random eigenvalues, so three factors were retained.

Table 1 Actual and Random Eigenvalues

Following these results, three factors were estimated using ULS with oblimin rotation. The first factor (BE-Appearance) explained 44.14% of the variance, the second (BE-Weight) 11.40%, and the third (BE-Attribution) 6.90%. The factors were positively correlated (r(F1,F2) = 0.575, r(F1,F3) = 0.231, and r(F2,F3) = 0.304). Table 2-A shows the loadings of each item on the three estimated factors. Close inspection of the EFA results showed that several items had loadings > 0.30 on two or more factors: items 6, 15, and 23 (BE-Appearance) and items 4, 18, 19, and 22 (BE-Weight). Moreover, item 1 (BE-Appearance) and item 3 (BE-Weight) loaded above 0.30 on factors other than those proposed theoretically. We therefore eliminated these nine problematic items, yielding a 14-item short version of the scale.
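The item-removal rule applied above can be expressed as a short function over an item-by-factor pattern matrix. The loadings in the example are hypothetical, not the values in Table 2-A; the 0.30 cut-off is taken from the text, while comparing absolute loadings is an assumption. Items with no loading above the cut-off are not flagged by this sketch.

```python
def flag_items(loadings, intended, cutoff=0.30):
    """Return (cross_loading, misplaced) item lists from an EFA pattern matrix.

    loadings: dict item -> list of loadings, one per factor
    intended: dict item -> index of the theoretically assigned factor
    """
    cross_loading, misplaced = [], []
    for item, row in loadings.items():
        salient = [f for f, value in enumerate(row) if abs(value) > cutoff]
        if len(salient) >= 2:
            cross_loading.append(item)       # > cutoff on two or more factors
        elif salient and salient[0] != intended[item]:
            misplaced.append(item)           # salient only on an unintended factor
    return cross_loading, misplaced


# HYPOTHETICAL pattern loadings on (Appearance, Weight, Attribution).
loadings = {
    "item_a": [0.72, 0.10, 0.05],
    "item_b": [0.45, 0.38, 0.02],
    "item_c": [0.08, 0.61, 0.11],
}
intended = {"item_a": 0, "item_b": 0, "item_c": 0}
print(flag_items(loadings, intended))  # (['item_b'], ['item_c'])
```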

A new EFA was computed for the short version. The KMO (0.872) was great (0.80–0.90; Hutcheson & Sofroniou, 1999), and Bartlett’s test was significant (χ²(91) = 4644.98, p < 0.001). The three factors explained 40.20% (Appearance), 15.58% (Weight), and 10.19% (Attribution) of the variance. Table 2-B presents the factor loadings for the short version. Again, the factors were positively correlated (r(F1,F2) = 0.483, r(F1,F3) = 0.195, and r(F2,F3) = 0.347).

Table 2 (A) Standardized factor loadings from the EFA of the full BESAA-S (23 items) and (B) of the Short-BESAA-S (14 items)

In conclusion, the BESAA-S (see Supplementary File 2) showed an adequate factorial structure.

CFA

A CFA was conducted on Sample two to test whether the factorial structure replicated. The model showed acceptable fit: RMSEA = 0.073 (90% CI = 0.066–0.081), CFI = 0.936, TLI = 0.920, and SRMR = 0.067. The CFA results are presented in Fig. 1. All factor loadings were statistically significant (p < 0.001), and the factorial structure was similar to that found in the EFA. In addition, the correlations between the factors were moderate (range: 0.354–0.613).

Fig. 1

Factor structure for the Short-BESAA-S model (14 items). (Note: AVE = Average Variance Extracted)

Convergent and Discriminant Validity of the BESAA-S

The latent factors showed adequate composite reliability (CR = 0.885 for Appearance, 0.892 for Weight, and 0.766 for Attribution). AVE values were acceptable (> 0.50) for the Appearance and Weight factors but unsatisfactory (< 0.50) for the Attribution factor. Regarding discriminant validity, each factor’s AVE was greater than the squared correlation between factors in all comparisons, supporting discriminant validity between the BESAA-S subscales.

Internal Consistency and Test-retest Reliability

Table 3 presents the main descriptive statistics and internal consistency of the measures used to assess nomological validity (Sample two).

Table 3 Descriptive statistics and internal consistency of the measures used (Short-BESAA-S)

Nomological Validity

Table 4 shows the Spearman correlations between the 14-item BESAA-S and the BAS, RSE, SATAQ-4, and EDE-Q-A scales, as well as BMI z-scores. All BESAA-S subscales correlated significantly and positively with the BAS and RSE scores, and significantly and negatively with the EDE-Q-A subscales. The BE-Appearance factor correlated significantly and negatively with all SATAQ-4 subscales, as did the BE-Weight factor, except for the Internalization of Muscularity subscale. Similarly, the BE-Attribution factor correlated significantly and negatively with all SATAQ-4 subscales except Internalization of Thinness and Family Pressure. Finally, all BESAA-S subscales correlated negatively with BMI z-scores, indicating that higher weight status was associated with lower BE.

Table 4 Correlations between the Short-BESAA-S and the other related variables

Discussion

The present study aimed to validate the BESAA (Mendelson et al., 2001) in the Spanish adolescent population. To our knowledge, this is the first study to examine the psychometric properties of this body image measure in urban, non-clinical adolescents in Spain. A 14-item, three-factor solution (BE-Appearance, BE-Weight, and BE-Attribution) provided the best fit to the data, resulting in a shortened Spanish version of the scale (BESAA-S).

Our results are consistent with the theoretical delimitations of the scale (Mendelson et al., 2001) and with previous validations in adolescent populations, specifically the 14-item Italian version (Confalonieri et al., 2008) and the 15-item Turkish version (Arslan et al., 2020), which also reported three subscales with fewer items and similar item compositions. As in previous work, items were deleted mainly because they cross-loaded on more than one factor. For example, item 6, “I like what I see when I look in the mirror,” and item 15, “I’m pretty happy about the way I look,” are general statements that do not appear to reflect adolescent concerns (Arslan et al., 2020; Confalonieri et al., 2008). After the problematic items were removed, the short version showed adequate global and local fit indices. Shortening the scale is not a problem in itself, as short scales are advisable for adolescent populations (Gordts et al., 2017; Ziegler et al., 2014).

The internal consistency of all subscales was satisfactory, which corroborates the findings of previous validations (Arslan et al., 2020; Confalonieri et al., 2008; Jónsdóttir et al., 2008; Mak et al., 2012; Mendelson et al., 2001). Moreover, our study supports existing findings on temporal stability (Mendelson et al., 2001). Discriminant validity between the BESAA-S factors was appropriate, which suggests a clear theoretical delimitation among the subscales. Convergent validity was acceptable for the BE-Appearance and BE-Weight factors but unsatisfactory for the BE-Attribution factor. Regarding nomological validity, we expected BE and body appreciation to be positively correlated, as they are two components of the broader construct of body image (Slade, 1994). Similarly, we expected BE to correlate positively with general self-esteem (Mendelson et al., 2001). As hypothesized, the BESAA-S subscales showed strong (BE-Weight and BE-Appearance) to moderate (BE-Attribution) positive correlations with body appreciation and general self-esteem, which reinforces the validity of the measure. Additionally, the BESAA-S subscale scores correlated significantly and negatively with the EDE-Q-A subscales. This is in accordance with our hypothesis, as body image concerns are a known risk factor for eating disorders (Beato-Fernández et al., 2004; Sehm & Warschburger, 2015) and may underlie the relationship between internalization of the ideal body and disordered eating (Flament et al., 2012). Furthermore, the results for the SATAQ-4 subscales were in line with our hypothesis: the higher the perceived sociocultural pressure regarding appearance and the internalization of body ideals, the lower the BE (Valls et al., 2011).
However, it is worth noting that the Internalization of Muscularity subscale showed no association with the BE-Weight factor and, of all the SATAQ-4 subscales, had the weakest association with the BE-Appearance factor. This may be because the BESAA contains no items specifically about muscularity, even though muscularity has become increasingly relevant to the aesthetic ideal. For example, adolescent girls seek slightly toned, slim figures (Gruber, 2007; Mingoia et al., 2017), whereas males especially desire muscular bodies (Calogero & Tylka, 2010; Halliwell & Harvey, 2006). Further research is therefore needed to investigate whether BE incorporates components of muscularity. Finally, consistent with previous studies (Moradi et al., 2020; Sagar & Gupta, 2018; Sander et al., 2021; Voelker et al., 2015), our data indicate that the higher adolescents’ BMI z-scores, the lower their reported BE. Contrary to expectations, however, the correlation between BMI z-scores and BE scores was weak, even for the BE-Weight dimension. Previous studies have found a stronger relationship between high BMI and body image concerns in females than in males (Shriver et al., 2013; Swami et al., 2007; Van den Berg et al., 2010). One possible explanation is that, as mentioned, BE is related to social ideals of body weight, and men’s physical attractiveness is more strongly related to muscularity than to BMI (Calogero & Tylka, 2010). Furthermore, a recent study (Lucibello et al., 2021) showed that internalized weight stigma mediated the associations between weight perceptions and body-related shame and guilt in women, but not in men.
Because they go beyond the scope of the present study, we plan to explore in future work BE scores among groups with higher weight status and other characteristics associated with a potential risk of eating disorders, such as gender differences (Goldhammer et al., 2019; Rancourt & McCullough, 2015).

Although the BESAA-S proved to be a reliable and stable measure, the BE-Attribution factor was psychometrically weaker, as in previous studies (Arslan et al., 2020; Confalonieri et al., 2008; Jónsdóttir et al., 2008; Olenik-Shemesh & Heiman, 2017). In particular, our study points to poor convergent validity for this factor. Because BE-Attribution refers to others’ perceived evaluations of one’s weight and figure rather than to self-evaluation, it has been argued that it might not strictly be a BE dimension (Garbett et al., 2021). However, it is unclear to what extent this matters. Previous research highlights that the BE-Attribution dimension captures valuable information about the aspect of BE that depends on an external point of view (Mendelson et al., 2001) and is related to relevant outcomes, such as eating disorders (Ferrand et al., 2009), sexting (Bianchi et al., 2017), and attitudes toward menstruation (Lawal et al., 2020). In our view, the psychometric properties of this subscale might be strengthened by adding new items, such as BE-Attribution items about weight perception (e.g., “Other people consider I have a proper weight”), which are lacking in the current scale.

Finally, inconsistencies in the internal structure of the scale across validations might reflect cultural variation. Jung and Forbes (2007) provide evidence that cultural differences in BE may exist even among geographically proximate populations with shared values. Cross-cultural differences might also account for the equivocal nature of the BE-Attribution dimension, which is likely to be strongly influenced by the sociocultural context (Miller, 1984). It is also possible that the variation between BESAA validations is age-dependent. For instance, the French version by Valls et al. (2011) is an exception among previous works: it did not identify a three-factor structure, and it was the only validation conducted with an adult population (18–30 years old). Therefore, body image-related scales should be psychometrically validated in each new target population before use, a need to which the present study contributes.

The current study is not without limitations. The validation was based on the Argentinian-Spanish version of the BESAA (Forbes et al., 2012). Although semantics and syntax do not differ substantially across Spanish-speaking countries, we recognize that culture can play an influential role in the adaptation of assessment instruments (Lira & Caballero, 2020). To address this limitation, we pilot-tested the scale to assess item clarity, appropriateness, and emotional discomfort. Moreover, the current study only included non-clinical participants aged 12 to 18 years. Further research is therefore needed to validate the BESAA-S in Spanish young adults and to explore its value in clinical practice.

Despite these limitations, this study has several strengths. First, a rigorous validation procedure was followed. The initial analyses were exploratory and did not assume an a priori factor structure. Two different samples of adequate size were used for the exploratory and confirmatory analyses to reduce the risk of overfitting. Moreover, test-retest reliability was assessed, which we consider a contribution to the literature, as most existing validations have not assessed temporal stability (see Supplementary File 1). Second, this work contributes to the understanding of the psychometric properties of this scale by drawing comparisons with previous validations published in languages other than English, which have generally been omitted in previous works (e.g., Garbett et al., 2021). Finally, this study has clear implications. The BESAA-S has proven to be a valid measure of BE among Spanish adolescents, supporting its use in future adolescent research. Having a validated instrument for the target population lends credibility to scientific research and, in this case, makes it possible to estimate BE levels in Spanish adolescents. Moreover, given that BE is closely linked to general and eating psychopathology, our study is a first step toward building the evidence base for related prevention and treatment interventions.

Overall, BE is a nuanced construct that goes beyond a simple distinction between low and high BE. This paper contributes to knowledge of this construct by reporting the psychometric properties of the Spanish version of a widely used body esteem scale (BESAA-S) among Spanish adolescents.