Background

Eating disorders (ED), such as Anorexia Nervosa (AN), Bulimia Nervosa (BN), and Binge-Eating Disorder (BED), are characterized by body image disturbances, abnormal eating patterns, and weight-control behaviors [1]. With an estimated lifetime prevalence of 2.6–8.4% in women and 0.7–2.2% in men, ED are of growing public health relevance [2,3,4]. Their consequences include one of the highest mortality risks among mental disorders [5] and enormous health and economic costs [6].

Although the perception that ED primarily affect women is still widely prevalent [7, 8], ED are also a significant health risk for men [9,10,11]. Since 1990, despite lower rates in general, the prevalence in men has increased faster than in women (in 2019 by 22% and 12% to 117.9 and 231.5 per 100 000 men and women, respectively; [12]). Similarly, disability-adjusted life years (DALYs) increased by 0.7% each year for men, compared to 0.6% for women [13]. Nevertheless, men’s representation in ED research remains lower than expected based on prevalence estimates [14], while experts agree that the available data may still underestimate the prevalence of men with ED, as men may be more reluctant to disclose their condition and seek treatment [15, 16].

Gender differences in ED pathology may play a central role in lowered recognition and underrepresentation of men in ED research, as standard assessments were developed based on ED presentation in adolescents and young women [17, 18]. Consequently, there have already been initial attempts to develop men-specific assessment instruments [19,20,21] as well as to validate existing ED assessments in men [22,23,24]. For example, the widely used Eating Disorder Examination-Questionnaire (EDE-Q; [25]), designed to capture the variation and severity in eating-related psychopathology [26], is based on common cognitive behavioral models of ED. Its items thus reflect the theory that the pursuit of thinness underlies ED [27]. Consistent with its conceptualization, the EDE-Q groups ED-associated pathology along four subscales labeled “Restraint”, “Eating Concern”, “Shape Concern”, and “Weight Concern” [25]. While the EDE-Q shows high levels of convergent validity, the validity of its factor structure remains an issue of ongoing discussion [28,29,30]. The original four-factor structure appears particularly problematic for men [23], for whom the concept of body image appears to be relevant, but a focus on the underlying “thinness ideal” is too limiting and may underestimate the role of concerns about musculature [27], which, as we must note, are also increasingly relevant among women [31, 32]. These and similar findings question the validity of using the EDE-Q subscales for measuring ED pathology in men. Indeed, several studies have shown difficulties reproducing the original EDE-Q factor structure, including in cohorts of men [24, 33]. This indicates that the original factor model may not be sensitive to different ED manifestations in adult men [23, 24]. For example, Klimek et al. [34] showed that the original four-factor solution could not be applied in men who identify as gay, bisexual, or otherwise non-heterosexual (n = 479), and the same was found in a male Argentine sample of university students and athletes (n = 986; [35]).

Thus far, however, investigations of EDE-Q factor structures in men have been limited to non-clinical settings [36, 37], severely limiting conclusions about the EDE-Q’s factorial validity among men with diagnosed ED. For example, it is currently unknown whether and which subscale scores should be reported for men with ED, or which subscale could be used to compare men with ED to other samples or to track treatment response [24]. Hence, analysis and, if necessary, adaptation of the factor structure to specific patient groups are essential for valid ED assessment. Therefore, the purpose of the current study was to explore the factor structure of the German EDE-Q in a clinical group of adult men with ED.

Methods

Participants and measures

The participants in this study were adult men with ED admitted for inpatient treatment (N = 188), a subsample of previously described patients [10]. Data were therefore analyzed retrospectively. The patients had been registered for inpatient ED-specific treatment at the Klinik am Korso, Bad Oeynhausen, Germany, between January 2018 and December 2021, and received consecutive treatment at least once. Diagnoses according to ICD-10 criteria [38] were based on the clinical judgment of experts with long-standing experience in the diagnosis and treatment of ED and were validated by means of peer review (group meetings). For one man with AN treated more than once at the clinic, only the data from the first admission were selected for analysis. Exclusion criteria were absence of an ED and gender other than male. In addition, only adult patients were included. The study had been approved under application AZ 2021–849 by the Ethics Committee of the Medical Faculty of the Ruhr-University Bochum at the East Westphalia Campus. The study was also registered prospectively in the German Clinical Trials Registry as part of application DRKS00028441. Data sets and analysis scripts are available from the authors upon request.

Basic data (demographics, height, and weight) of patients were recorded at admission. Height and body weight were used to compute body mass index (BMI). General psychopathology was assessed with the Symptom Checklist-27-plus (SCL-27-plus; [39]), a modification of the Symptom Checklist-90-R (SCL-90-R; [40]) with 28 items in five subscales (“Depressive Symptoms”, “Vegetative Symptoms”, “Agoraphobic Symptoms”, “Sociophobic Symptoms” and “Pain”). Items were answered on a five-point scale from 0 (never) to 4 (very often). Additionally, the widely used Beck Depression Inventory (BDI-I; [41]) was used to assess depression symptom severity with 21 items. Each item was answered on a four-point scale from 0 to 3, so that the sum score ranged between 0 and 63. Scores up to 9 indicate minimal depression, scores of 10–18 mild depression, scores of 19–29 moderate depression, and scores of 30 and above severe depression.
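The severity bands for the depression sum score can be written as a small lookup. This is a minimal sketch that uses only the cut-offs stated above; the function name `bdi_severity` is hypothetical and not part of the study's analysis scripts.

```python
def bdi_severity(sum_score: int) -> str:
    """Map a BDI sum score (0-63) to the severity band given in the text."""
    if not 0 <= sum_score <= 63:
        raise ValueError("BDI sum score must lie between 0 and 63")
    if sum_score <= 9:
        return "minimal"
    if sum_score <= 18:
        return "mild"
    if sum_score <= 29:
        return "moderate"
    return "severe"


# Hypothetical scores at the band boundaries.
for score in (9, 10, 19, 30):
    print(score, bdi_severity(score))
```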

The cognitive and behavioral symptoms of ED were assessed using the EDE-Q. Due to the retrospective nature of the analyses, no other men-specific aspects (e.g., measures of muscularity) were available. The EDE-Q [25] is a self-report questionnaire assessing core ED symptoms over the past 28 days. It is based on the Eating Disorder Examination Interview [42], includes definitions and time frames for major symptoms, and can usually be completed in minutes. Here, the validated German translation [43, 44], which contains 28 items, was used. Each item was rated on a seven-point scale ranging from 1 (never) to 7 (every day). For comparability with the original coding, all data were recoded to range from 0 to 6 for analysis. The EDE-Q is evaluated using sum values: a total sum value is usually determined alongside sum values for four subscales (“Restraint”, “Eating Concern”, “Shape Concern”, and “Weight Concern”; [45]). Six further open questions (items nos. 13 to 18) assess the frequencies of compensatory behaviors and objective binge-eating. These are generally not included in the evaluation and calculation of the subscales and were also excluded from the present analysis. The internal consistency of the EDE-Q subscales was examined in four studies and showed acceptable values ranging from 0.70 to 0.93 (examined in female students and treatment-seeking men and women with BED). Retest reliability ranged from r = 0.66 to r = 0.94 (examined in female students, in female community samples, and in women with BN [26]). Individuals are usually classified into the clinical range based on a cut-off score of ≥ 4 (on one of the four subscales and/or the total score) of the EDE-Q. The EDE-Q was administered in a standardized manner at the beginning and end of treatment. 
Only admission data were used for the following analyses; we did not consider using the EDE-Q data collected post-treatment for factorial validation given that treatment effects may interfere with the applicability of many of the items.
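The item preparation described above (recoding the 1 to 7 ratings to 0 to 6 and leaving out the frequency items nos. 13 to 18) can be sketched as follows. The function name `prepare_responses` and the example respondent are hypothetical; this is an illustration of the two steps, not the authors' script.

```python
FREQUENCY_ITEMS = set(range(13, 19))  # items 13 to 18, excluded from the analysis


def prepare_responses(raw: dict[int, int]) -> dict[int, int]:
    """Drop the frequency items and recode the remaining 1-7 ratings to 0-6."""
    prepared = {}
    for item, rating in raw.items():
        if item in FREQUENCY_ITEMS:
            continue  # open frequency questions are not part of the factor analysis
        if not 1 <= rating <= 7:
            raise ValueError(f"item {item}: rating {rating} outside 1-7")
        prepared[item] = rating - 1
    return prepared


# Hypothetical respondent who entered "4" for every one of the 28 items.
raw = {item: 4 for item in range(1, 29)}
prepared = prepare_responses(raw)
print(len(prepared))  # number of attitudinal items entering the analysis
```

With 28 items and six frequency items removed, 22 attitudinal items remain, which matches the item count entering the initial EFA.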

Statistical analyses

Because of the inconsistency of the internal structure of the EDE-Q across different samples as described above, it did not seem reasonable to assume that the structure of the EDE-Q found in other studies is the same in adult inpatient men with ED. Therefore, an exploratory rather than a confirmatory approach was chosen to analyze the factor structure of the German EDE-Q. All available data were included (missing data < 5%, data were missing completely at random according to Little's missing completely at random [MCAR] test, χ2 (df = 120) = 120, p = 0.48). The Kaiser–Meyer–Olkin measure of sampling adequacy (KMO) was used to establish the feasibility of the exploratory factor analysis (EFA), as recommended by Dziuban and Shirkey [46]. KMO values near 1.0 indicate that a large share of the variance in the variables can be explained by underlying factors (a minimum value of 0.5 is required; [46]). Because polychoric correlations are more robust against violations of normality, the EFA was based on polychoric correlations and used principal-axis factoring [47]. Varimax rotation with Kaiser normalization was chosen for factor rotation to maximize the variance of the squared loadings across a factor, rather than the variance of the squared loadings for the variables [48]. Horn’s parallel analysis is considered among the best empirical methods for factor retention in factor analysis [49, 50] and was used to determine the number of factors to retain. Based on the idea that real data with an underlying factor structure should produce factors with eigenvalues larger than those derived from random data sets with identical numbers of cases and variables [51], parallel analysis compares, from first to last extracted, eigenvalues from real data with average eigenvalues from a large number of constructed random data sets (5 000, in our case). 
A factor is retained if its eigenvalue is larger than the parallel average eigenvalue derived from the random data sets (i.e., the adjusted difference between real and random data eigenvalues > 0). Item retention was based on the communalities after extraction in an initial EFA, which needed to be higher than 0.3. Items with high factor loadings defined each factor, and each item was assigned to the factor on which it had the highest loading. To be a clinically meaningful item on one of these factors, each item had to have an at least moderate loading above 0.40, based on standard criteria [52]. If this value was not met, the affected items were excluded and the analysis was repeated. In addition, we considered item cross-loadings, with relative loading differences below 0.20 defined as the threshold [48, 53], although we ultimately based dropping or retaining cross-loading items on their theoretical plausibility and their implications for factor structure.
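The two decision rules in this paragraph, factor retention by parallel analysis and item screening by communality and loading, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the eigenvalues are taken as already computed (the random-data simulation itself is not shown), the "relative loading difference" is read as the gap between an item's two largest absolute loadings, and the function names and example numbers are hypothetical.

```python
def n_factors_to_retain(observed, random_mean):
    """Count leading factors whose observed eigenvalue exceeds the random-data mean."""
    retained = 0
    for obs, rnd in zip(observed, random_mean):
        if obs - rnd <= 0:  # adjusted difference not above 0: stop at this factor
            break
        retained += 1
    return retained


def screen_item(communality, loadings,
                min_communality=0.3, min_loading=0.40, min_gap=0.20):
    """Return 'drop', 'cross-loading', or 'keep' for a single item."""
    ranked = sorted((abs(value) for value in loadings), reverse=True)
    if communality <= min_communality or ranked[0] <= min_loading:
        return "drop"
    if len(ranked) > 1 and ranked[0] - ranked[1] < min_gap:
        return "cross-loading"  # flagged for a decision on theoretical grounds
    return "keep"


# Hypothetical eigenvalues: observed data vs. mean of random data sets.
print(n_factors_to_retain([4.1, 2.3, 1.2, 0.4], [1.5, 1.3, 1.1, 1.0]))

# Hypothetical items: (communality, loadings on two factors).
print(screen_item(0.60, [0.72, 0.10]))
print(screen_item(0.60, [0.55, 0.45]))
print(screen_item(0.25, [0.50, 0.10]))
```

Flagged cross-loading items are returned separately rather than dropped, because the text bases that decision on theoretical plausibility rather than on the threshold alone.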

For data analyses, we used SPSS Statistics version 28 for Windows [54] and R version 4.2.2 [55], with the packages naniar 1.0.0 for the MCAR test [56], EFA.dimensions 0.1.7.6 [57] for calculating polychoric correlations (using method “Fox”), and psych 2.2.9 [58] for the feasibility test, parallel analysis, and the EFA. Figure 2 is based on R package DandEFA 1.6 [59]. Descriptive statistics (means and SD) were computed for demographic characteristics and factors. In addition, Spearman correlations were calculated among the factors, and between the factors and the BMI, the BDI-I, and the SCL-27-plus subscales (“Depressive Symptoms”, “Vegetative Symptoms”, “Agoraphobic Symptoms”, “Sociophobic Symptoms” and “Pain”). For reliability analysis, Cronbach’s alpha was calculated to assess the internal consistency of the factors; values of 0.8 and above were considered good. As this is a retrospective analysis, no power planning had been carried out in advance.
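For readers unfamiliar with the reliability coefficient, Cronbach's alpha for the items of one factor can be computed from the item variances and the variance of the sum score. The sketch below uses the standard formula with Python's standard library only; the function name and the example responses are hypothetical, and the study itself used SPSS and R.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five patients to a two-item factor (0-6 coding).
responses = [[0, 1], [2, 2], [4, 3], [6, 6], [3, 4]]
alpha = cronbach_alpha(responses)
print(round(alpha, 2), "good" if alpha >= 0.8 else "below 0.8")
```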

Results

Sample characteristics

The characteristics of the sample are shown in Table 1.

Table 1 Sample Descriptive Statistics

Feasibility tests for exploratory factor analysis

As described above, the EFA was based on polychoric correlations and used principal-axis factoring because of the increased robustness of polychoric correlations against normality violations [47, 60]. The overall KMO (0.80) confirmed that the polychoric correlation matrix of items was factorable.

Results of the exploratory factor analysis

Horn’s parallel analysis [49, 50, 61] showed five factors with adjusted eigenvalues greater than 0, suggesting that five factors should be retained (see Fig. 1). The initial EFA revealed four items with communality values below 0.3 after extraction (nos. 2, 19, 21, 24), and one item with loadings below 0.4 on all factors (no. 9; range −0.05 to 0.37). These items were therefore excluded. The second and final EFA included 17 of the original 22 items, which all met minimal communality and loading criteria. Two items (nos. 10 and 20) presented with cross-loadings based on relative loading difference criteria. However, because the cross-loadings were theoretically plausible (see below) and excluding the items neither reduced the number of suggested factors in parallel analysis nor affected the other items’ loading structure, we retained both items.

Fig. 1 Horn’s parallel analysis

Table 2 presents the communality and loading of the items across factors. Items 1 and 3–6 primarily loaded on factor 1 (eigenvalue [EV] = 3.03, variance explained [VE] = 0.26), items 25–28 on factor 2 (EV = 2.80, VE = 0.24), items 10–12 and 20 on factor 3 (EV = 2.06, VE = 0.18), items 7 and 8 on factor 4 (EV = 1.90, VE = 0.17), and items 22 and 23 on factor 5 (EV = 1.69, VE = 0.15). All loadings ranged from 0.45 to 0.86. Thirteen of the factor loadings were above 0.70, which corresponds to an ideal loading [52], and all exceeded 0.40, which, according to Swami and Barron [47], suggests that the sample size may be considered suitable.

Table 2 Item loadings on factors

The factors explained 68% of the total variance, which is to be considered a high percentage. Cronbach’s α indicated high internal consistencies across factors 1 to 5 (i.e., 0.85, 0.90, 0.83, 0.89, and 0.95, respectively), suggesting adequacy of the five-factor solution.
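The reported figures are arithmetically consistent, as the following check shows. It assumes that each item contributes one unit of variance (so the total variance equals the 17 retained items) and that the VE values express each factor's share of the variance captured by the five factors together; both readings are assumptions of this sketch, not statements from the analysis output.

```python
eigenvalues = [3.03, 2.80, 2.06, 1.90, 1.69]  # factors 1 to 5, as reported
n_items = 17                                   # items in the final EFA

explained = sum(eigenvalues)
shares = [round(ev / explained, 2) for ev in eigenvalues]
total_share = round(explained / n_items, 2)

print(shares)       # [0.26, 0.24, 0.18, 0.17, 0.15]
print(total_share)  # 0.68
```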

Spearman correlations among the factors, and between the factors and the BMI, the BDI, and the SCL-27-plus subscales (“Depressive Symptoms”, “Vegetative Symptoms”, “Agoraphobic Symptoms”, “Sociophobic Symptoms” and “Pain”), are given in Table 3.

Table 3 Spearman-Correlations of the EDE-Q subscale means, the SCL-27-plus, the BDI, and the BMI

Content evaluation of the calculated factor structure

Labeling of the factors was based on a content assessment of the items with primary loadings, which were assigned to different constructs in their entirety. Compared to the item assignment of the original subscales, the “Restraint” subscale was not substantially different in the present study (factor 1). The differences were that one item from the original subscale was removed (item no. 2) and that item no. 6 (“Flat stomach”) was added. As shown in Fig. 2, factor 2, labeled “Body Dissatisfaction”, included dissatisfaction with weight and figure, and discomfort as well as avoidance of exposure, representing a subset of the original “Weight Concern” and “Shape Concern” items. Factor 3, labeled “Weight Concern”, contained four “Eating, Weight, and Shape Concern” items related to weight loss. Finally, factors 4 and 5 emerged as two-item solutions, independently representing “Preoccupation” (with food/calories, shape/weight) items and “Importance” (of shape and weight) items across the original “Concerns” subscales. Similar to the original EDE-Q, two items, both from the “Weight Concern” scale, presented with cross-loadings: item 10 (“Fear of weight gain”) also loaded on “Restraint”, and item 20 (“Guilt about eating”) also loaded on “Preoccupation”. However, because the fear of gaining weight could be reasonably construed as motivating “Restraint”, and “Preoccupation” might co-occur with the feeling of guilt, we decided to retain both items, although future reformulation may be advised.

Fig. 2 Dandelion plot of rotated factor solution [59]. Note. Central lines represent different factors. Star graphs visualize item factor loadings for each factor, with negative and positive loadings indicated by red and green, respectively. The size of each star graph and angles between solid lines represent the proportion of variance explained by each factor. The dashed line (starting from the solid line at 12 o'clock) indicates the cumulative percent of variance explained by the factor solution. Communalities and uniquenesses are shown on the right-hand side along with a bar chart of cumulative explanation ratios of factors (with individual factor variances [EV] on top).

For further exploration, we examined patient group profiles across EDE-Q factors (Fig. 3). “Restraint” (factor 1) appears to separate patients with AN or BN from patients with BED or an eating disorder not otherwise specified (EDNOS). “Weight Concern” (factor 3) was most pronounced for patients with BN, followed by patients with BED, AN, and EDNOS. “Preoccupation” (factor 4) was most pronounced for patients with BN, with the remaining groups scoring similarly to one another but lower than patients with BN. All patient groups exhibited high and similar scores on “Body Dissatisfaction” (factor 2) and “Importance” (factor 5).

Fig. 3 Patient group profiles across proposed EDE-Q factors

Discussion

ED are increasingly recognized as a public health risk for men [12], but the inclusion of men in ED research remains low [14]. To address this issue, initial efforts are underway to validate existing ED assessments in men, such as with the EDE-Q [22, 23]. However, previous research on the factor structure of the EDE-Q in men has been limited to non-clinical settings, which severely limits conclusions about the factorial validity of the EDE-Q in adult men diagnosed with ED. In the current study, the factor structure of the German EDE-Q was investigated in a clinical group of adult men with ED.

For the present sample, we propose a five-factor solution with 17 items, after excluding several items due to low communality or insufficient factor loadings in the EFA (see also [62, 63]). Although the EDE-Q’s primary purpose lies in capturing variation in ED psychopathology rather than in distinguishing diagnostic groups, we obtained a first factor that represents “Restraint”, similar to the original EDE-Q and other studies with men [28, 37, 62,63,64,65], and that appears to separate patients with AN or BN from patients with BED or EDNOS. “Weight Concern” (factor 3) appeared to primarily distinguish patients with binge-eating behavior (i.e., patients with BN or BED) from the other diagnostic groups, and “Preoccupation” (factor 4), describing subjective impairments due to weight and shape concerns, appeared to further separate patients with BN from other groups. Across diagnostic groups, however, all patients reported high levels on two emerging factors, “Body Dissatisfaction” (factor 2) and “Importance” (factor 5), suggesting these factors might mark core ED symptoms that may distinguish patients from non-patient populations.

Is a five-factor model of the EDE-Q justified? Other studies with non-clinical samples of men proposed four, three, two, or even one-factor solutions [37], raising the question whether more parsimonious proposals could be appropriate. However, we believe that maintaining a distinction between “Weight Concern”, “Body Dissatisfaction”, and “Importance” is justified in a clinical sample of men, given that the EDE-Q was primarily designed to assess body concerns in women. The EDE-Q includes items that assess general (e.g., “Discomfort seeing body”) or more thinness-related dissatisfaction (e.g., “Discomfort with shape”). One might therefore reasonably argue that the independent emergence of “Body Dissatisfaction”, “Weight Concern”, and (likely thinness-related) “Importance” might mirror the fact that a “thinness ideal” is too limiting to capture ED-related psychopathology in men, and that a sizeable group of men in our sample may have been dissatisfied with their body, not due to the fear of weight gain or the self-concept centrality of thinness, but, for example, due to dissatisfaction with their body mass and musculature [27]. Of course, many women may be similarly dissatisfied with their body mass and musculature [31, 32], highlighting the general need to consider a broad range of body image phenotypes in ED psychopathology assessments. Whether or not the distinction between “Body Dissatisfaction”, “Weight Concern”, and “Importance” can be maintained while including muscularity-focused items is to be examined in future investigations; for the time being, the suggested distinction may provide a useful heuristic for exploring additional men-related body image concerns in clinical samples. 
Future studies may also use weighting algorithms (i.e., the weighting of specific items depending on the cohort under investigation) and/or specific norms of the EDE-Q to capture different ED phenotypes among men, although we should ultimately strive towards the development of gender-inclusive diagnostic tools.

Can we generalize the proposed five-factor model beyond German men with ED? Our findings reinforce the apparent lack of invariance of the EDE-Q factor structure across different samples, which has remained a source of ongoing controversy and investigation [24, 36]. The EDE-Q items appear to group differently across different populations, and we further observed a tendency for several items to cross-load on multiple factors, which may hint at an instability of factor structures even within defined populations. In fact, we retained items with cross-loadings rather liberally out of clinical considerations, which is not ideal in psychometric evaluation. We speculate that the presence of cross-loadings may reflect an issue (or, depending on one’s perspective, a feature) of the EDE-Q’s item construction, which often seems to query a theoretical conjunction of ED psychopathology and behaviors. However, there may be mutually non-exclusive causes of restrained eating or body image dissatisfaction other than thinness concerns, and the lack of population invariance of the EDE-Q and the cross-loading of items might thus simply reflect that individuals interpret and answer these conjoined items differently. In this sense, the current model, while not assuming universal validity, might help to inform the construction and extension of revised measurement items.

In discussing the present findings, it is important to note several potential limitations. Although we are not aware of any systematic comparisons of the EDE-Q factor structure based on sample characteristics (like gender, age, or treatment setting), we cannot exclude that specific sociodemographic variables might, at least in part, account for the observed structural disparities. In a similar vein, we must note that factor analysis maximizes variance explanation based on sample properties, which would lead to a lower variance explanation of the same factor solution given a different sample (i.e., “shrinkage”, see [66]). Similarly, we must caution against a definitive interpretation of the “Preoccupation” and “Importance” factors, since the obtained two-item solutions are under-identified (i.e., there are fewer unique variance and co-variance estimates than parameters in the measurement models) and would thus be difficult to validate [67]. Ideally, at least three items per factor should be included for factorial validation, suggesting a need for extending the range of items in similar future investigations. Further validation of the identified structure is therefore needed.
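The identification problem of two-item factors can be made concrete by counting. The sketch below considers a single factor in isolation with its variance fixed to 1, which is an assumption of this illustration: each item then contributes one loading and one residual variance as free parameters, while the data supply one variance per item plus one covariance per item pair. The function name is hypothetical.

```python
def one_factor_degrees_of_freedom(n_items: int) -> int:
    """Unique variances/covariances minus free parameters for a one-factor model."""
    unique_moments = n_items * (n_items + 1) // 2  # variances and covariances
    free_parameters = 2 * n_items                  # loading + residual variance per item
    return unique_moments - free_parameters


for n in (2, 3, 4):
    print(n, one_factor_degrees_of_freedom(n))
```

A negative value (two items) means under-identification, zero (three items) means the model is just identified, and positive values (four or more items) leave room for testing model fit, which is why at least three items per factor are recommended.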

We further note that our sample may be considered of marginal size for conducting an EFA. Moreover, as this was a retrospective analysis, there were no further questionnaire data available (e.g., measures of muscularity), which should be included in future research to test for convergent validity. In a similar vein, we were unable to compare the patients’ responses and evaluate their ED severity using reference data, as norms for the German EDE-Q are thus far only available for general population samples [68]. The data were collected exclusively at an ED-specific clinic, which has the benefit of reducing variation among different treatment settings and other ecological determinants, but also limits generalizability and transferability to other samples because it promotes symptom homogeneity. Another limitation is that although the diagnoses were made based on an expert clinical interview, they were not based on a standardized structured interview (such as the EDE interview). We also presented data aggregated across different ED diagnoses, limiting conclusions about the presence or absence of differences between factor structures among different diagnostic groups. Exploring these differences and commonalities ideally requires larger patient samples (> 150 per diagnostic group) and should be pursued to determine the EDE-Q factorial validity among different diagnostic groups. Further explorations may also consider including EDE-Q frequency items in addition to attitudinal items. Similar to other investigations [24, 33], we only included attitudinal items for use in EFA while excluding open-ended frequency items, although reformulation and mutual integration are viable [69]. Last but not least, because of the displayed instability in the factor structure in specific cohorts, it may be worth investigating the diagnosis-specific EDE-Q factor structure in women (AN vs. BN vs. BED).

Conclusions

In summary, we analyzed and presented a proposal for a factor structure of the German EDE-Q in adult inpatient men with ED. The proposed structure comprises 17 items of the original version and five subscales: “Restraint”, “Body Dissatisfaction”, “Weight Concern”, “Preoccupation”, and “Importance”. The pursuit of a muscular body ideal in men may not be adequately assessed by the EDE-Q in its present form. These results add to the overall limited evidence regarding diagnostic tools available for men and may promote the development of adequate measures in the future.