Plain English summary

Disordered Eating Attitudes and Behaviors (DEAB) are a global phenomenon with high prevalence among young female adults in English- and non-English-speaking cultures. Using exploratory and confirmatory analytical approaches in randomly half-split samples, we evaluated the theoretical structure of DEAB measured by the EAT-26 and whether the same structure holds across two languages (Arabic and English) and three BMI-based groups (underweight, normal weight, and overweight/obese) in a large representative sample of undergraduate female students of predominantly Arab ethnicity. A theoretical five-factor structure was supported in both samples. Although the resulting five subscales of the final EAT-19 demonstrated good internal consistency overall, other problematic measurement properties were identified for language and BMI. These properties pose serious measurement challenges for use of the EAT-26 or shorter versions for disordered eating screening purposes in young Arabic-speaking females of varying body weight. Our study highlights important implications for cross-cultural research and measurement of disordered eating in non-clinical populations.

Background

Disordered Eating Attitudes and Behaviors (DEAB) are a global phenomenon with high prevalence among young adults in English- and non-English-speaking cultures. Examples of DEAB include dieting, fasting, abusing laxatives or diuretics, self-induced vomiting, and binge eating. These behaviors are associated with increased risk of eating disorders and obesity, and are a serious public health concern [1, 2]. Early identification of DEAB may be a cost-effective public health policy, especially in educational settings, where intervention and follow-up are feasible and inexpensive [3, 4].

In a Muslim and Arabic-speaking country like Qatar, as in many of the Gulf countries, young women constitute a high-risk population for obesity and DEAB [5,6,7]. Rapid urbanization and economic growth have led to high rates of obesity, a shift towards fast and processed foods, a sedentary lifestyle, and greater exposure to Western ideals of thinness through the media [8, 9]. However, there are currently limited screening tools for identifying young women who are at high risk of DEAB and eating disorders in this unique cultural setting.

The Eating Attitudes Test (EAT) is one of the most widely used measures of DEAB [10]. Originally comprising 40 items (EAT-40) and tested in patients with anorexia nervosa and community-based controls, it was later shortened to 26 items (EAT-26) and psychometrically tested and validated in a mixed clinical and non-clinical English-speaking sample [10, 11]. It has since been translated and adapted to multiple languages and contexts [12].

A major challenge with the EAT-26 is its elusive factorial structure. Garner et al. (1982) proposed a three-factor model based on a principal component analysis (PCA): a dieting factor related to avoidance of fattening foods and preoccupation with being thinner, a bulimia and food preoccupation factor, and an oral control factor [10]. Efforts to replicate this factor structure in non-clinical populations have not been widely successful. Many studies in English-speaking countries reported four and five factors instead of three, with the number of items ranging from 16 to 25 [13,14,15].

In non-English-speaking community samples, four to six factors have been reported [16,17,18,19,20,21,22,23,24,25]. In a series of studies, Maïano and colleagues (2013) conducted a thorough investigation of the factor structure of the EAT-26 in one of the largest ethnically diverse samples to date (n = 1779), comprising participants of European and African origin. The sample consisted of French-speaking adolescent boys and girls, 11 to 18 years of age, in France [26]. Using exploratory structural equation modeling (ESEM) and confirmatory factor analysis (CFA), these authors arrived at and replicated the best fitting six-factor model with 18 items of the EAT-26. These factors included Fear of Getting Fat, Eating-Related Control, Eating-Related Guilt, Food Preoccupation, Vomiting-Purging Behavior, and Social Pressure to Gain Weight.

In the Middle East, although the EAT-26 has been widely used in Arabic-speaking countries [6, 7, 27, 28], fewer studies have reported on its psychometric properties [29, 30]. Nasser studied the factor structure of the Arabic version of the EAT-26 in a sample of secondary school girls in Egypt using exploratory factor analysis (EFA) to confirm the original three-factor model [29]. Although Nasser demonstrated a similar three-factor solution with 16 items, the findings were inconclusive, with high internal consistency for only one factor, the dieting subscale [29]. Nasser concluded that the scale should only be used as a screening tool for dieting and weight-related concerns and not for bulimic behaviors [29]. Although a similar three-factor structure was reported for 23 items of the EAT-26 in a recent replication in Jordan, a different pattern of item-factor loadings was reported in this study of adolescent school girls [30].

Discrepant and inconclusive factor-analytic findings have sparked debate about the factorial validity of the EAT-26 as well as its overall structure and utility as a screening tool in non-clinical samples. Differences in factor structure between English and non-English speaking countries have been largely attributed to cultural differences in eating attitudes and body-figure norms [31,32,33].

Another less studied but important aspect is the demonstration of measurement invariance, or equivalence, across relevant subgroups, i.e., that the same construct is measured in every subgroup. While a few studies demonstrated measurement invariance across cultural groups for the English version of the EAT-26, this has rarely been assessed for translations [17, 24]. Demonstrating linguistic measurement invariance in the mother tongue should be a prerequisite to demonstrating cultural group differences and similarities. Without it, it is not possible to rule out the possibility that observed group differences are not generalizable across languages or cultures.

Different interpretations of the EAT may also occur if body-related variables influence the meaning and interpretations of the constructs underlying the EAT. If the EAT is supposed to screen for undifferentiated eating disorder [34], then it is supposed to measure DEAB in the same way across weight status groups. If it measures different constructs across different weight categories, this may result in erroneous screening interpretations and clinical interventions based on the same EAT score. Thus, it is important to demonstrate measurement invariance across different Body Mass Index (BMI) categories. To our knowledge, measurement invariance for BMI status has not been thoroughly examined for English and other linguistic versions of the EAT. This may account for discrepancies in the factor structure of the EAT-26 across studies [35]. We found one study that assessed the measurement invariance of the EAT across BMI categories in French-speaking populations [26]. The study reported measurement invariance across underweight, normal, and overweight categories including evidence of metric and scalar invariance [26].

The main objective of this study is to re-evaluate the factor structure of the EAT-26, while examining its measurement invariance properties across two linguistic versions (English and Arabic) and BMI categories, using large probability samples of university students, predominantly of Arab ethnicity. This allows us to explore sources of measurement variability that may influence the factor analytic findings and affect interpretations concerning the factorial structure of the EAT-26.

Method

Procedures

Translation: Since there is no standard, authorized Arabic translation, the EAT-26 was translated from English to Arabic by the first author and back-translated to English by another member of the research team. Minor discrepancies in translation arose and were resolved by consensus among bilingual team members. Further conceptual validation of the Arabic translation was obtained through cognitive interviews with 20 female university students. These face-to-face interviews tested the students’ understanding of the EAT-26 statements; in particular, their conceptual, not just literal, understanding of the statements was probed and verified. Building on findings from the cognitive interviews, the questions were also piloted as part of a survey (n = 120), in which alternative interpretations of these statements were probed in a semi-structured manner (using a list of close-ended statements based on findings from the interviews, followed by open-ended explanations).

Questionnaire, Survey Mode and Administration: The EAT-26 was programmed and administered in a panel study as part of a thirty-minute online questionnaire in Qualtrics [36]. Questions were included about general health, dietary habits, weight perception, and weight-related concerns and behaviors.

Measures

Participants had a choice to complete the survey in English or Arabic, with language subgroups based on their choice. Weight status was measured using BMI (kg/m2) based on self-reported weight and height and categorized into three groups: underweight (< 18.5), normal weight (18.5 to 24.9), and overweight or obese (25.0 or more) [37].
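The BMI grouping described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names are hypothetical, and treating values between 24.9 and 25.0 as normal weight is an assumption, since the text gives the category bounds to one decimal place.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2 from self-reported weight and height."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(bmi_value):
    """Map a BMI value to one of the three groups used in the analysis.

    Assumption: the cut-points are applied as < 18.5, 18.5 to < 25.0, >= 25.0.
    """
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25.0:
        return "normal weight"
    return "overweight or obese"


# Hypothetical respondent: 60 kg and 1.60 m
example_bmi = bmi(60.0, 1.60)
example_group = bmi_category(example_bmi)
```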

The EAT-26 items were measured and scored on the original six-point scale: “Never” = 1, “Rarely” = 2, “Sometimes” = 3, “Often” = 4, “Very Often” = 5, and “Always” = 6. The subscales based on the factor analysis were scored as the sum of the items constituting the subscales. We did not transform the items from the original six-point scale to a four-point scale (range from 0 to 3), in line with a recent French study [26] and the recommendation of other authors to use the original scale to preserve DEAB’s severity in non-clinical populations [34].
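As an illustration of this scoring rule, the sketch below maps response labels to the six-point scores and sums them over a subscale. The item identifiers and the example responses are hypothetical placeholders, not the actual EAT-26 item-to-subscale assignment.

```python
# Six-point scoring as described in the text
RESPONSE_SCORES = {
    "Never": 1,
    "Rarely": 2,
    "Sometimes": 3,
    "Often": 4,
    "Very Often": 5,
    "Always": 6,
}


def score_subscale(responses, item_ids):
    """Sum the six-point scores of the items that make up one subscale.

    responses: dict mapping item id -> response label
    item_ids:  list of item ids belonging to the subscale
    """
    return sum(RESPONSE_SCORES[responses[item]] for item in item_ids)


# Hypothetical respondent and hypothetical two-item subscale
example_responses = {"item_a": "Never", "item_b": "Always", "item_c": "Sometimes"}
example_score = score_subscale(example_responses, ["item_a", "item_b"])
```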

Sample design

The present study is based on a two-wave panel survey of female University students. Data collection for the first wave occurred between April and May of 2016 and the second wave occurred between November 2016 and February 2017. Both waves had similar sample design and response rate (52.0 and 51.7%) [38]. A total of 3138 questionnaires were completed across the two waves (n = 1793 and n = 1345, respectively). After removing participants who completed both waves of the survey (n = 446), the remaining students constituted the total number of observations with complete responses to all EAT items, which were used in the present analysis (n = 2692).

Pretest and Fielding: The University’s Institutional Review Board approved an ethics compliance application for the study. Each survey data collection wave was preceded by a pre-test (n = 120) to check the questionnaire’s content and skip-logic and test administration logistics.

Participants

Participants were predominantly Qataris (64.1%); the majority completed the Arabic version of the questionnaire (73.1%). Other nationalities included Egypt (5.2%), Yemen (5.0%), Palestine (4.1%), Jordan (3.8%), other Gulf countries (4.3%), other Arab countries (6.6%), Pakistan/India/Bangladesh (2.6%), Iran (2.0%), Europe/North-America (0.5%), and other Asian/Euro-Asian/African countries (1.8%). The mean age was 21.4 (standard deviation = 3.58) years (range 17–40). The mean BMI was 24.3 kg/m2 (standard deviation = 5.85) (range 11.6–72.1). The proportions of BMI categories were: 12.0% underweight, 50.9% normal weight, and 37.1% overweight or obese.

Statistical analysis

The total number of observations from both data collection waves (n = 2692) were randomly split into two samples, hereafter referred to as Sample 1 and Sample 2. All analytical procedures were conducted using STATA 14 [39] to investigate the factor structure and measurement invariance of the EAT-26. Briefly, the factorial structure analysis was carried out on the first random sample (Sample 1) and was based on the six-factor EFA solution for all items of the EAT-26. We also conducted ESEM to evaluate goodness of fit for the resulting factor structure model from the EFA stage and to compare fit with other alternative models. Based on findings from Sample 1, we conducted ESEM and a series of CFA within the Structural Equation Modelling framework in the replication sample (Sample 2). The CFAs were carried out for all the observations in Sample 2 and by language and BMI. Finally, we conducted measurement invariance tests of the final factorial model on the entire sample, after combining Sample 1 and Sample 2. Below is a step-by-step account of our analytical procedures.
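The random half-split can be illustrated as follows. The analyses reported here were run in STATA, so this Python sketch is only an analogy; the seed value and the use of integer positions as respondent identifiers are assumptions made for the example.

```python
import random


def half_split(ids, seed=2016):
    """Randomly split a collection of ids into two halves.

    The seed is a hypothetical value, fixed only to make the split reproducible.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


# 2692 observations, identified here by position only
sample_1, sample_2 = half_split(range(2692))
```

With 2692 observations, each half contains 1346 respondents and no respondent appears in both halves.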

Factor Structure Assessment using EFA in Sample 1 (Step 1): To determine the appropriate number of factors for the 26 EAT items, EFA was conducted using principal-factor extraction, with the resulting eigenvalues (> 1) and scree plot inspected. In addition, EFA with oblique (quartimin criterion) rotation was conducted using a pre-determined number of factors from the first step to examine each item’s factor loadings and uniqueness. Loadings on each of the factors were considered low if their value was less than 0.40. Uniqueness values equal to or greater than 0.70 and cross-loadings equal to or greater than 0.40 were also considered evidence of poor loading.
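The item-screening criteria in Step 1 can be expressed as a small check. The sketch below is an assumption-laden illustration: the function name, the use of absolute loadings, and the example loadings and uniqueness values are all hypothetical and are not taken from Table 1.

```python
def flag_poor_items(loadings, uniqueness,
                    min_loading=0.40, max_uniqueness=0.70, cross_loading=0.40):
    """Return a dict of item -> list of reasons the item loads poorly.

    loadings:   dict mapping item -> list of rotated loadings, one per factor
    uniqueness: dict mapping item -> uniqueness value
    Assumption: loadings are compared by absolute value.
    """
    flagged = {}
    for item, row in loadings.items():
        magnitudes = sorted((abs(value) for value in row), reverse=True)
        reasons = []
        if magnitudes[0] < min_loading:
            reasons.append("low loading")          # no factor reaches 0.40
        if uniqueness[item] >= max_uniqueness:
            reasons.append("high uniqueness")      # uniqueness of 0.70 or more
        if len(magnitudes) > 1 and magnitudes[1] >= cross_loading:
            reasons.append("cross-loading")        # second factor reaches 0.40
        if reasons:
            flagged[item] = reasons
    return flagged


# Hypothetical values for three items on three factors
example_loadings = {
    "item_a": [0.72, 0.05, 0.10],
    "item_b": [0.31, 0.22, 0.08],
    "item_c": [0.55, 0.45, 0.02],
}
example_uniqueness = {"item_a": 0.45, "item_b": 0.82, "item_c": 0.50}
flagged = flag_poor_items(example_loadings, example_uniqueness)
```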

Factor Structure Assessment using ESEM in Sample 1 (Step 2): Since the scree plot tends to overestimate the optimal number of factors, ESEM was used to assess alternative models with fewer factors [40]. ESEM is considered better suited for examining measurement properties of psychological instruments, often providing more robust estimates than CFA [41, 42]. ESEM analyses with three, four, and five correlated factors were tested using the maximum likelihood robust estimator and an oblique (quartimin) rotation. Four fit indices were selected a priori to assess model fit: comparative fit index (CFI), Tucker-Lewis index (TLI), Standardized Root Mean Square Residual (SRMR), and Root Mean Square Error of Approximation (RMSEA). Acceptable model fit was defined by a CFI ≥0.90, TLI ≥0.95, and SRMR or RMSEA values ≤0.08 [43, 44]. Based on these criteria, the best fitting final model was selected.
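A minimal sketch of these a priori fit criteria is given below. It assumes that whichever of SRMR and RMSEA is available must be at most 0.08; the function name is hypothetical. The first example uses the fit statistics reported for the final five-factor ESEM model in the Results; the second uses made-up values for a poorly fitting model.

```python
def acceptable_fit(cfi, tli, rmsea=None, srmr=None):
    """Check fit indices against the thresholds stated in the text.

    Assumption: every residual-based index supplied (RMSEA and/or SRMR)
    must be <= 0.08, and at least one of them must be supplied.
    """
    residual_indices = [value for value in (rmsea, srmr) if value is not None]
    if not residual_indices:
        raise ValueError("provide at least one of rmsea or srmr")
    residuals_ok = all(value <= 0.08 for value in residual_indices)
    return cfi >= 0.90 and tli >= 0.95 and residuals_ok


# Values reported for the final five-factor ESEM model in Sample 1
esem_ok = acceptable_fit(cfi=0.976, tli=0.952, rmsea=0.045)

# Hypothetical poorly fitting model
poor_ok = acceptable_fit(cfi=0.85, tli=0.80, rmsea=0.10)
```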

Factor Structure Assessment using ESEM and CFA in Sample 2 (Step 3): ESEM analyses with three, four, and five correlated factors were re-run in Sample 2. The final five-factor ESEM model from Sample 1 was replicated and further re-examined by CFA with and without stratification by language and BMI subgroups in Sample 2. Factor loadings, intercepts, variances, and residual variances were inspected for direction of association and magnitude, and the fit statistics were evaluated using the fit indices and criteria described above.

Measurement Invariance Testing in entire sample (Step 4): Measurement invariance of the final factorial model was examined after recombining Sample 1 and Sample 2 by fitting and comparing sequentially nested and increasingly constrained CFA models across language and BMI subgroups. First, metric invariance was examined by fitting and comparing a model imposing equality in the item-factor loadings (Model 2) relative to an equal form measurement model (Model 1) [45]. This comparison tests the assumption that the items have the same meaning (slopes) across subgroups [45]. To determine whether there were important differences in the item means across the different subgroups, scalar invariance (Model 3) was subsequently tested by fitting and comparing a model imposing equality in item-factor loadings and in item means (intercepts) across groups relative to a model that only imposed equality in item-factor loadings (Model 2) [45].

The tenability of invariance at each level of model constraint was determined using the following changes (∆) in fit indices criteria between a more restricted model and the preceding one: ∆CFI ≤ 0.01 or ΔRMSEA ≤0.015 or ∆TLI ≤ 0.01 [46,47,48]. Due to sample size-independence and correction for lack of parsimony, we considered the above criteria for ∆s in fit indices superior to ∆χ2 and the primary indicators of measurement invariance in this study [42, 46, 47].

Multiple-group Comparisons of Latent Factor Means (population heterogeneity comparisons) in Entire Sample (Step 5): While still imposing equal loadings and intercepts across subgroups, we fitted SEMs that fixed the latent factor mean for a reference group at zero then estimated the means for the other groups relative to this reference group. The equal intercepts constraint was retained to ensure that any difference in item means was reflected in the means of the latent factors [45]. Latent factor means were compared and corresponding Cohen’s d effect sizes (EFs) for their differences were estimated [49]. The EFs were considered small, moderate, and large using the following respective thresholds of 0.20, 0.50, and 0.80 [49]. EFs of value less than 0.20 were considered very small or negligible even if statistically significant at alpha of 0.05.
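The effect-size thresholds above can be sketched as follows. The text does not spell out how Cohen's d was computed from the latent means, so the pooled-standard-deviation formula used here is the conventional definition and should be read as an assumption, as should the function names and the example numbers.

```python
import math


def cohens_d(mean_1, mean_2, sd_1, sd_2, n_1, n_2):
    """Conventional Cohen's d using a pooled standard deviation (assumed form)."""
    pooled_variance = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_variance)


def effect_size_label(d):
    """Label an effect size using the 0.20, 0.50 and 0.80 thresholds.

    Assumption: each threshold is the lower bound of its band.
    """
    size = abs(d)
    if size < 0.20:
        return "negligible"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "moderate"
    return "large"


# Hypothetical groups: means 0.5 and 0.0, both SDs 1.0, 100 respondents each
example_d = cohens_d(0.5, 0.0, 1.0, 1.0, 100, 100)
example_label = effect_size_label(example_d)
```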

Correlations between the latent factors from Sample 1, Sample 2, and entire sample were assessed using Pearson product-moment correlation coefficients. To check the internal consistency in the subscales, Cronbach’s alpha coefficients were computed for Sample 1, Sample 2, and for the entire sample.
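For readers unfamiliar with the internal consistency statistic, Cronbach's alpha for a subscale of k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below implements that standard formula with the Python standard library; the function name and the tiny data set are hypothetical and unrelated to the study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one subscale.

    rows: list of respondents, each a list of item scores of equal length.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: three respondents answering three perfectly consistent items
consistent_alpha = cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]])

# Hypothetical data with some disagreement between items
mixed_alpha = cronbach_alpha([[1, 2, 1], [3, 2, 4], [5, 6, 4], [2, 1, 3]])
```

Perfectly consistent items give an alpha of 1, and disagreement between items lowers the value.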

Results

The numbering index and description in English and corresponding translation in Arabic for all the EAT-26 items appear in Additional file 1: Table S1.

Factor Structure Assessment using EFA in Sample 1: The scree plot (Fig. 1) suggests that factors beyond the first six account for little variability in the 26 items; thus, the factor structure was fixed to a maximum of six factors in a second EFA (Table 1). In this analysis, seven items had unacceptably low loadings on each of the factors and/or high uniqueness values. As a result, “I avoid eating when hungry”, “I feel a need to cut my food into small pieces”, “I take longer than others to eat my meals”, “I display self-control around food”, “I feel uncomfortable after eating sweets”, “I like my stomach to be empty”, and “I enjoy trying new nutritionally rich foods” were dropped, leaving 19 items for the subsequent analyses.

Fig. 1 Scree plot of EAT-26 items in Sample 1

Table 1 Characteristics of Items, the total scale and six-factor solution based on exploratory factor analysis of the EAT-26 in Sample 1

Factor Structure Assessment using ESEM in Sample 1: Three-, four- and five-factor ESEM models were run on the 19 items (Table 2). Three items had poor loadings in the three-factor model, and all fit statistics improved in the four-factor model. While the four-factor model had no items with poor loadings, all fit statistics improved in the five-factor model. In particular, the RMSEA dropped from 0.076 to 0.045, leading us to select the five-factor model for subsequent analyses.

Table 2 Goodness of fit indices from 3-, 4-, and 5-factor exploratory structural equation models (ESEM) of the EAT-19 in Sample 1

In the five-factor ESEM (Table 3), no items had high cross-loadings on more than one factor and the majority had loadings on a primary factor ≥ 0.60. These findings were similar to those from the EFA and were replicated when we re-ran the ESEM in Sample 2 (not shown), with one exception. The item “I am aware of the calorie content of foods I eat” had a loading of 0.37 on factor 2 in Sample 1 (Table 3), but 0.49 on the same factor in Sample 2. Therefore, a decision was made to retain this item.

Table 3 Five-factor model ESEM solution for the 19-item version of the EAT in Sample 1

The five factors obtained from ESEM in Sample 1 (Table 3), which were also replicated using ESEM in Sample 2 (not shown), represent the constructs of Fear-of-Getting-Fat, Eating-Related-Control, Food-Preoccupation, Vomiting-Purging-Behavior, and Social-Pressure-to-Gain-Weight. These factors explained 47.0% of the variation in the 19 items. Acceptable fit statistics were obtained for the final five-factor ESEM model (CFI = 0.976, TLI = 0.952, RMSEA = 0.045). With the exception of the correlations between Fear-of-Getting-Fat and Eating-Related-Control (r = 0.662) and Fear-of-Getting-Fat and Food-Preoccupation (r = 0.571), most factors had small or minimal correlations (seven of the remaining eight r ≤ 0.35) (Table 4). Similar correlations between latent factors were also obtained in Sample 2 (not shown) and in the entire sample (not shown).

Table 4 Correlations between Latent Factors in Sample 1

Factor Structure Assessment using CFA in Sample 2: The five-factor CFAs had acceptable fit statistics for the full Sample 2 (CFI = 0.913, TLI = 0.895; RMSEA = 0.067) with all 19 items having loadings ≥0.40 (Table 5). Acceptable fit statistics were obtained for language-based CFAs for English (CFI = 0.900, TLI = 0.879; RMSEA = 0.074) and Arabic (CFI = 0.907, TLI = 0.887; RMSEA = 0.070). Similarly, acceptable fit statistics were found for underweight (CFI = 0.851, TLI = 0.821; RMSEA = 0.075); normal weight (CFI = 0.928, TLI = 0.913, RMSEA = 0.056); and overweight or obese (CFI = 0.858, TLI = 0.829; RMSEA = 0.079).

Table 5 Goodness-of-fit for the EAT-19 5-factor CFA models for entire sample and across different groups of language and BMI categories in Sample 2

Measurement Invariance in the Entire Sample: For language, the results supported metric invariance only, based on the change criteria in fit statistics specified a priori (Table 6): Model 2 vs. Model 1 [∆CFI = − 0.001, ∆TLI = + 0.003, and ∆RMSEA = − 0.001]. In contrast, scalar invariance was not supported for language, with the decrease in CFI exceeding the allowed 0.01: Model 3 vs. Model 2 [∆CFI = − 0.014, ∆TLI = − 0.009, and ∆RMSEA = + 0.003]. The tests for measurement invariance based on the three BMI categories likewise supported metric invariance only (Table 6): Model 2 vs. Model 1 [∆CFI = − 0.009, ∆TLI = − 0.002, and ∆RMSEA = 0.000]. Scalar invariance was not supported for BMI-based categories (Table 6), with the decreases in both CFI and TLI exceeding 0.01 and the increase in RMSEA exceeding 0.015: Model 3 vs. Model 2 [∆CFI = − 0.114, ∆TLI = − 0.111, and ∆RMSEA = + 0.026].
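The decision rule applied to these change statistics can be sketched as follows. The Statistical analysis section joins the three criteria with "or", whereas the results above treat a violation of any single criterion as a failure, so this sketch assumes that all three criteria must hold. The function name is hypothetical; the input values are the changes reported above.

```python
def invariance_supported(d_cfi, d_tli, d_rmsea,
                         cfi_tol=0.01, tli_tol=0.01, rmsea_tol=0.015):
    """Decide whether a more constrained model is tenable.

    Changes are (constrained model - preceding model). A drop in CFI or TLI
    larger than 0.01, or a rise in RMSEA larger than 0.015, counts against
    invariance. Assumption: all three criteria must be met.
    """
    return (-d_cfi <= cfi_tol) and (-d_tli <= tli_tol) and (d_rmsea <= rmsea_tol)


# Reported changes in fit statistics
language_metric = invariance_supported(-0.001, 0.003, -0.001)  # Model 2 vs. Model 1
language_scalar = invariance_supported(-0.014, -0.009, 0.003)  # Model 3 vs. Model 2
bmi_metric = invariance_supported(-0.009, -0.002, 0.000)       # Model 2 vs. Model 1
bmi_scalar = invariance_supported(-0.114, -0.111, 0.026)       # Model 3 vs. Model 2
```

Under this reading, the metric models pass and the scalar models fail for both language and BMI, matching the conclusions stated above.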

Table 6 Measurement invariance for the EAT-19 5-factor model by language and BMI categories in combined sample (Sample 1 and Sample 2)

Multiple-group Comparisons of Latent Factor Means (population heterogeneity comparisons) in the Entire Sample: For the language comparison, the differences in latent factor means are presented in Table 7. Although four of the five factors (the exception being Social-Pressure-to-Gain-Weight) demonstrated statistically significant mean differences, the corresponding EFs were considered negligible for all five factors. For the BMI comparisons, all factors except Vomiting-Purging-Behavior had statistically significant mean differences and small to moderate EFs, especially for the overweight or obese BMI category relative to the underweight reference category.

Table 7 Multiple-group comparisons of latent factor means and their corresponding effect sizes in combined sample

Descriptive statistics and Cronbach alphas of the 19-item five-factor subscales of the original EAT-26 are shown for the entire sample and by language and BMI sub-groups (see Table 8). The internal consistency was reasonably good for all subscales (ranging from 0.725 to 0.845). The mean scores for all subscales except Vomiting-Purging-Behavior were generally lower for respondents who completed the questionnaire in English than Arabic. Those who reported being overweight or obese had generally higher mean scores across Fear-of-Getting-Fat, Eating-Related-Control, and Food-Preoccupation subscales than those in the normal and underweight categories. For Social-Pressure-to-Gain-Weight, the opposite pattern of decreasing mean score with increasing BMI was observed, while for Vomiting-Purging-Behavior, all three BMI categories had similar mean scores (see Table 8).

Table 8 Means, standard deviations and Cronbach’s alpha for 5-factor EAT-19 subscales in the total sample and across languages and BMI status in combined sample

Discussion

The main aim of this study was to examine the factor structure of the EAT-26 in a non-clinical probability sample of young females of predominantly Arab ethnicity. The optimal factor structure was derived based on results from EFA and ESEM in the first half of the randomly split sample. This structure was further replicated in the second half of the sample and its fit was evaluated with and without further sub-grouping by language and BMI. Additionally, measurement invariance tests were conducted in the entire sample assessing equivalence across language and BMI. Successive multi-group comparisons using SEM within the final five-factor CFA model tested for metric and scalar invariance and for population heterogeneity through comparison of latent factor means across these groups [50].

The resulting five-factor structure was similar to the six-factor structure reported by Maïano et al. (2013) in one of the largest factorial validation studies conducted to date on this topic, using ethnically diverse French adolescents, with four of the five factors comprising the same items [26]. The main observed difference in the factor structure (six versus five factors) between the abovementioned study and ours was caused by three items: “I feel extremely guilty after eating”, “I feel uncomfortable after eating sweets”, and “I like my stomach to be empty”. While all these items loaded satisfactorily on a sixth factor, Eating-Related-Guilt, in the other study, this was not the case in our study. In particular, the item “I feel extremely guilty after eating” had a significant loading on the Fear-of-Getting-Fat factor, while the other two items had no loadings above 0.40 on any factor in our study. Thus, the latter two items were dropped from our analysis at the EFA stage along with other poor-loading items. Maïano et al. (2013) eliminated the same other items that we also dropped from our study (“I avoid eating when hungry”, “I feel a need to cut my food into small pieces”, “I take longer than others to eat my meals”, “I display self-control around food”, and “I enjoy trying new nutritionally rich foods”) with the exception of “I cut my food into small pieces”. Poor loadings were also reported for all these items in the first replication of Garner’s findings in a sample of Arab girls in Egypt [29].

Despite some overlap, our results differed from the three-factor solutions proposed by two previous studies of the Arabic EAT-26 [29, 30]. Two of our factors, Fear-of-Getting-Fat and Eating-Related-Control, were part of the dieting factor in Nasser’s study [29] and of the dieting and awareness of food content factors in the second study [30]. The emergence of Fear-of-Getting-Fat as a unique factor in our study is consistent with the theoretical notion that distortion in body shape perception is a construct central to eating-related psychopathology and is distinct from eating-related restriction behaviors (as measured by Eating-Related-Control) [51, 52]. In this regard, our results are also consistent with CFA findings from a recent study treating perception of body shape as a separate latent factor from dieting [15].

Another important finding we share with Maïano and colleagues [26], is the support of Vomiting-Purging-Behavior as a unique factor independent of Food-Preoccupation or binge eating tendencies. While Vomiting-Purging-Behavior is an independent latent factor in our study, both “I vomit after I have eaten” and “I have impulse to vomit after meals” had poor loadings (− 0.01 and 0.12) in Nasser’s study [29]. In a more recent replication of that study, the latter was dropped, while the former item was retained within the Food-Preoccupation factor [30]. Future studies should investigate the relation between these two factors and what determines their co-occurrence (as in Bulimia Nervosa) or independence.

Our results supported weak factorial invariance in the measurement of DEAB across the two languages, a minimum requirement to meet before carrying out other theoretically important between-group comparisons [50]. However, evidence emerged against equal item means (intercepts); thus, we failed to provide support for scalar invariance in the Arabic and English versions of the 19-item EAT.

Similar to our findings with respect to language, evidence for metric invariance was observed across BMI groups, supporting equal meaning ascribed to the same items by underweight, normal weight, overweight, and obese participants. However, our results did not support scalar invariance of the EAT-19 leading to the conclusion of weak factorial invariance of the EAT-19 for BMI-based categories [35]. This finding is inconsistent with reported strict invariance of an 18-item French version of the EAT-26 [26].

When further inspecting latent factor means for population heterogeneity, we found that the EFs for the differences in all the latent means for English versus Arabic were negligible. However, unlike for language, when comparing the overweight and obese categories with the underweight category, the latent factor means, except for the Vomiting-Purging-Behavior factor, were substantially different, indicating that the levels of the latent factors vary across groups. Specifically, individuals who are overweight and obese scored significantly higher than individuals who are underweight on Fear-of-Getting-Fat, Eating-Related-Control, and Food-Preoccupation, but significantly lower on Social-Pressure-to-Gain-Weight. This could be due to several reasons including variability in the magnitude of correlations between the latent factors across BMI categories, other psychometric properties of the EAT, and the self-report nature of our assessment method. Alternatively, it is also possible that the latent constructs that the EAT taps, as well as the five-factor model tested here, may differ across BMI categories. Future studies should clarify the present findings, especially in light of previous findings indicating that the EAT can be used as a screening tool in non-clinical populations for undifferentiated eating disorders [34].

Limitations, strengths, and future directions

One major limitation of this study is that we could not establish invariance of the EAT-19 using second-order sub-grouping, such as measurement equivalence between Arabic and English within the four BMI-based categories due to sample size limitation. Future studies should endeavor to replicate or disconfirm our findings using these finer sub-groupings. A second limitation is our reliance on self-reported weight and height for BMI. The web-based administration of the EAT could reduce the generalizability of our findings to interviewer-administered questionnaires. While our sample had a good representation of female students from all over the Arab world, it is unclear whether our findings would generalize to males, females of different ages, or Arab females of lower educational status.

Conclusion

Our findings supported the five-factor solution for 19 EAT items with largely satisfactory consistency values for the resulting five subscales. Additionally, we found evidence of weak invariance across BMI-based categories as well as Arabic and English versions of the EAT-19. However, our study found a lack of scalar invariance across both language and BMI categories, posing challenges for use of this scale for screening purposes in young Arab females. This finding is problematic for clinical screening purposes because it would mean that even when levels on the DEAB construct are identical, young Arab females belonging to different BMI groups would still score higher or lower on the different items, giving the false impression of higher or lower levels of DEAB. Further research into the measurement invariance of the EAT by BMI and cultural group is needed.

Specifically, the current threshold values for delineating potential cases for clinical follow-up should be adapted to reflect ethnicity and BMI-based status. In addition, separate population-based norms for the EAT score should be established for different BMI categories and for Arabic-speaking populations. In light of our findings and the current established utility of the EAT in screening for DEAB, we recommend that future studies develop culture- and BMI-specific cut-offs when using the EAT as a screening instrument for DEAB and risk of eating disorder.