Introduction

Food addiction (FA) has received considerable attention in both laboratory and clinical research [1]. This concept refers to the idea that some foods, especially those with dense calories, heavy processing or high palatability, may promote addictive consumption. FA may be considered as a kind of behavioral addiction (related to eating) or an eating problem which may not constitute a psychiatric disorder. FA may overlap with binge-eating disorder, night-eating syndrome, bulimia nervosa or other conditions [2, 3]. A hypothesis that an addictive process related to neural features may contribute to excessive eating is an underlying conceptual feature of FA [4]. However, FA is not classified as a formal diagnosis in the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5), although it has been discussed as a possible psychiatric disorder [5]. People with FA often express symptoms such as considerable distress in relation to specific foods, eating more food than planned, eating more than needed to relieve hunger, feelings of lost control over food intake, unsuccessful attempts to reduce eating particular foods, and diminished interests in participating in some experiences due to fear of overeating [6, 7]. FA has been associated with multiple mental disorders including anxiety, depressive, attention-deficit/hyperactivity, post-traumatic stress and binge-eating disorders [7].

To date, there are limited data on the prevalence of FA globally. However, some general-population studies suggest that between 4 and 10% of people may experience FA, and it is more prevalent in females than males [2, 3]. Its prevalence among people with overweight or those seeking weight loss may range between 16 and 30%, higher than in general populations [8]. A study conducted in college students in China revealed that nearly 7% of participants may have experienced mild to severe FA [9]. Because nearly one-third of the world population has overweight/obesity, the role of FA as a contributor should be investigated and addressed [10]. The prevalence of obesity in China has risen from 3 to 8% during the last decade, and currently more than 90 million people live with obesity in this country [11].

Having a psychometrically sound instrument to measure FA is important for detection and timely intervention [12]. Historically, few instruments have been available to assess addictive eating behaviors, and most were not comprehensive for evaluating different aspects of addictive eating tendencies [13]. Consequently, the Yale Food Addiction Scale (YFAS) was developed to address this concern and showed acceptable psychometric properties across translations into languages including French, Italian, Persian, Chinese and Turkish [13,14,15,16,17]. An updated version of the YFAS (i.e., YFAS 2.0) was published in 2016. The YFAS 2.0 included four essential criteria to diagnose FA based on DSM-5-related criteria for substance-use disorders [18]. These included craving, consumption despite negative social/interpersonal consequences, failure to perform role obligations, and consumption in physically dangerous settings [5, 18].

The YFAS 2.0 is a commonly employed measure of FA and has 35 items that assess 11 indicators of addictive behaviors, distress, and related clinical impairment [3]. An abbreviated version of this measure, the modified YFAS 2.0 (mYFAS 2.0), includes 13 items and is available for use as a short screening measure to assess FA [19]. Although, perhaps due to fewer questions, the mYFAS 2.0 is a less sensitive instrument than the YFAS 2.0 to measure addictive eating, it has demonstrated appropriate validity and reliability in several studies [9, 19, 20]. Having a valid and reliable instrument with fewer questions may help reduce burden on respondents during the screening process and can be time-saving for both participants and researchers [14]. Nevertheless, both scales currently serve as standard measures of FA with relatively similar results.

Although these instruments have been validated in many languages with acceptable psychometric properties, the validation process was mostly done based on traditional approaches using classical test theory (CTT) methods. In this approach, the quality of an item is assessed by the degree of the association between participants’ response pattern for that item and their scores for all items [21]. However, there are some shortcomings to this approach including test and sample size dependence, considering equal weights for all items while there may be differences in the difficulty levels between items, and using a constant standard error of measurement and ordinal values to compute total scores [22, 23]. These factors may influence accurate measurements. In contrast, the Rasch model applies a modern item response theory and has been recognized as a gold standard in validation processes. Thus, it may resolve many CTT-related shortcomings [21].

The Rasch model allows researchers to critically evaluate scales using parametric tests by transforming categorical data into quantitative data [24]. In the Rasch model, a scale is examined against a mathematical measurement model that specifies what the item responses should look like using interval-based measures. Interval data, as opposed to ordinal values, provide more robust and accurate findings on the structural validity and objectivity of the scale [21]. The model contains more quantitative information and a continuous scale of measurement compared to a CTT approach, and it assumes that each individual has a fixed latent tendency and each item has a fixed difficulty [25]. The Rasch model can also help assess the unidimensionality of both the YFAS 2.0 and mYFAS 2.0, and both instruments have been found to be unidimensional [20, 26,27,28,29,30,31,32,33,34,35,36,37,38]. Moreover, both the YFAS 2.0 and mYFAS 2.0 were expected to create a single factor structure of FA to differentiate between those with or without FA [18, 19]. Therefore, the Rasch model may confirm this feature, indicating the appropriateness of the scales for such measurements, and examination of the factor structure using traditional methods like exploratory factor analysis may not be particularly helpful.
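As an illustration of the idea that each person has a latent tendency and each item a difficulty, the dichotomous form of the Rasch model can be sketched as follows. This is a minimal sketch for orientation only; the function and variable names are hypothetical and are not part of the study's analysis.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of endorsing a dichotomous item under the Rasch model.

    theta      -- person's latent tendency, in logits (hypothetical name)
    difficulty -- item difficulty, in logits (hypothetical name)
    """
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# When the person's tendency equals the item difficulty, endorsement is 50/50.
print(rasch_probability(0.0, 0.0))  # 0.5
# A person located above the item difficulty is more likely to endorse it.
print(rasch_probability(1.0, 0.0) > 0.5)  # True
```

Because the probability depends only on the difference between the two parameters, persons and items are placed on the same interval (logit) scale.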

Differential item functioning (DIF), or item bias in subsamples, may also be assessed using Rasch analysis. The presence of DIF suggests that the likelihood of a correct response differs among people who are assumed to have equal abilities but who belong to different subgroups based on gender, race/ethnicity, income and other variables. Thus, DIF provides negative evidence for the validity of a scale across groups [24, 39].

A psychometrically sound scale should ideally be examined using various statistical techniques to provide greater empirical evidence supporting its validity [12]. To the best of our knowledge, Rasch analysis has not been previously used to assess the YFAS 2.0 and mYFAS 2.0. Thus, the current study aimed to assess the psychometric properties of the scales using this modern approach. Further, differential responses among groups based on gender and body mass index (BMI) were investigated with the hypotheses that both scales would demonstrate validity across groups.

Methods

Recruitment procedure for the online survey

The corresponding author (C-YL) sought assistance from his university students and faculty members to spread information about this online survey. The university students and faculty members were instructed to send the online survey information via multiple forums (e.g., LINE, Facebook, email, or online posts), and the faculty members were informed that they themselves were not the target population to participate in the survey. The online survey was designed in Google Forms and all survey items were set to be compulsory to avoid missing answers. Participants were informed that if they completed the survey and provided contact information, each participant could receive 100 New Taiwan Dollars (around 3.3 USD) as an incentive. Before initiating data collection via the online survey, the study was approved by the Human Research Ethics Committee in the National Cheng Kung University (Approval No.: NCKU HREC-E-109-551-2) and the Institutional Review Board in the Chi Mei Medical Center (IRB Serial No.: 11007-006).

Participants

Target participants in the current study were university students who were enrolled in an undergraduate (including bachelor’s degrees) or a postgraduate program (including master’s and doctoral degrees) in Taiwan when they completed the online survey. The inclusion criteria were (i) having the ability to read and understand online questionnaires written in traditional Chinese characters; (ii) being aged 20 years or above with the ability to provide consent for participation; (iii) not having any psychiatric disorder based on their self-report. Every respondent was requested to provide an e-form informed consent to indicate his or her willingness to participate in the current study. Information regarding the current study and participation rights was described in the first page of the online survey before the participants could click an icon (agree or disagree) to indicate their willingness to participate.

Instruments

Both the Yale Food Addiction Scale 2.0 (YFAS 2.0) and modified YFAS 2.0 (mYFAS 2.0) were used, in their Chinese versions. Each item in the self-reported YFAS 2.0 and mYFAS 2.0 asks participants about their eating behaviors in the past year, using an 8-point Likert scale (0 = Never; 1 = Less than monthly; 2 = Once a month; 3 = 2–3 times a month; 4 = Once a week; 5 = 2–3 times a week; 6 = 4–6 times a week; 7 = Every Day). The YFAS 2.0 includes 35 symptom items which can be categorized into 11 symptom criteria (33 items) and 1 clinical impairment criterion (2 items). Except for the two clinical significance items, scores can be summed to calculate the 11 symptom criteria. In the mYFAS 2.0, only one item was selected from each of the 11 symptom criteria; along with the 2 clinical impairment items, the mYFAS 2.0 has 13 items. To determine diagnostic thresholds, a threshold for each item was established to calculate each participant’s severity level. Participants can be categorized into no (1 or fewer symptoms or does not meet criteria for clinical impairment), mild (2 or 3 symptoms and clinical impairment), moderate (4 or 5 symptoms and clinical impairment) or severe (6 or more symptoms and clinical impairment) FA.
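The severity categories described above can be expressed as a small classification rule. The following is a minimal sketch, assuming that the number of symptom criteria met (0 to 11) and the clinical impairment status have already been derived from the item-level thresholds; the function name and labels are hypothetical.

```python
def fa_severity(criteria_met: int, clinically_impaired: bool) -> str:
    """Map symptom criteria and impairment status to an FA severity label.

    criteria_met        -- number of the 11 symptom criteria met (assumed input)
    clinically_impaired -- whether the clinical impairment criterion is met
    """
    if not 0 <= criteria_met <= 11:
        raise ValueError("criteria_met must be between 0 and 11")
    # No FA: 1 or fewer symptoms, or clinical impairment not met.
    if not clinically_impaired or criteria_met <= 1:
        return "no FA"
    if criteria_met <= 3:
        return "mild FA"
    if criteria_met <= 5:
        return "moderate FA"
    return "severe FA"


print(fa_severity(4, True))   # moderate FA
print(fa_severity(7, False))  # no FA
```

Note that a respondent with many symptoms but without clinical impairment is still classified as having no FA under this rule.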

Demographic data including age, gender, level of education, marital status, monthly income, tobacco smoking, alcohol use and information to compute BMI (i.e., height and weight) were collected from participants. BMI was calculated using SPSS software.

Data analysis

In the current study, we used Rasch analysis to examine the psychometric properties of both the YFAS 2.0 (35 items) and mYFAS 2.0 (13 items). Specifically, we used Facets software to evaluate item fit to investigate test unidimensionality. DIF across subgroups (i.e., gender and BMI status) was examined.

Previous studies have shown that the one-factor model outperformed other models for both the YFAS 2.0 [18] and mYFAS 2.0 [19]. Therefore, all items were examined together assuming one underlying trait in Rasch analyses. Additionally, as the YFAS 2.0 and mYFAS 2.0 can be scored in two manners (i.e., symptom counts and diagnostic criteria), we analyzed their psychometric properties using both options. First, each symptom item can be summed to calculate the symptom counts (ranging from 0–35 and 0–13 in the YFAS 2.0 and mYFAS 2.0, respectively). Second, diagnostic criteria were evaluated according to pre-determined thresholds of the sum of the corresponding symptoms to determine FA severity (i.e., no, mild, moderate, and severe FA). Symptom counts were evaluated with a rating scale Rasch model (rating 0–7) while the diagnostic criteria were evaluated with the dichotomous Rasch model (rating 0 or 1). The test unidimensionality of the YFAS 2.0 and mYFAS 2.0 was examined with goodness-of-fit statistics: items with an infit mean square (MnSq) above 1.5 associated with a standardized score (Zstd) above 2 were considered misfit [40]. If mean square statistics are acceptable, the Zstd can be ignored. However, if an item has an infit MnSq > 1.5, it indicates a deviation from unidimensionality and should be revised or removed from the scale [41]. We expected that fewer than 5% of the items in both the YFAS 2.0 and mYFAS 2.0 would fail to meet the criterion [42]. Person separations were also evaluated to determine whether items in the YFAS 2.0 and mYFAS 2.0 distinguished enrolled participants with different levels of FA. A minimum person separation of 2 was expected [43]. Furthermore, Rasch analysis was also used to investigate the hierarchy of items in the YFAS 2.0 and mYFAS 2.0.
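To make the fit criterion concrete, the sketch below computes an infit mean square for one dichotomous item as the sum of squared residuals divided by the sum of model variances, and then applies the misfit rule (MnSq above 1.5 together with Zstd above 2). It is an illustration only: it assumes person measures and the item difficulty are already known, whereas Rasch software estimates them, and all names are hypothetical.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Expected score on a dichotomous item under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def infit_mnsq(responses, thetas, difficulty):
    """Information-weighted (infit) mean square for a single item.

    responses  -- observed 0/1 responses, one per person
    thetas     -- person measures in logits (assumed known here)
    difficulty -- item difficulty in logits (assumed known here)
    """
    squared_residuals = 0.0
    model_variance = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, difficulty)
        squared_residuals += (x - p) ** 2
        model_variance += p * (1.0 - p)
    return squared_residuals / model_variance


def is_misfit(mnsq: float, zstd: float) -> bool:
    """Misfit rule used in this section: MnSq > 1.5 together with Zstd > 2."""
    return mnsq > 1.5 and zstd > 2.0


# Four persons located exactly at the item difficulty: observed and expected
# variation match, so the infit mean square equals 1.
print(infit_mnsq([0, 1, 1, 0], [0.0, 0.0, 0.0, 0.0], 0.0))  # 1.0
```

Values near 1 indicate responses about as variable as the model predicts; larger values indicate more unexpected responses than the model allows.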

We examined the DIF (the extent to which the item hierarchies were inconsistent across groups) defined by gender (female vs. male) and BMI status (BMI ≥ 24 vs. BMI < 24 based on Taiwan norms) [44] to detect potential interactions between subgroups that might generate underlying bias in both the YFAS 2.0 and mYFAS 2.0. We computed the Rasch-Welch t-statistics to identify items that exhibited statistically significant DIF (p < .05). DIF contrasts less than 0.5 logit were considered negligible; contrasts between 0.5 and 1 logit were considered moderate; and contrasts over 1 logit were defined as substantial [45]. In order to support unidimensionality, we expected to have no more than 5% of the diagnostic criteria demonstrate substantial DIF [46]. Specifically, when detecting DIF within an assessment, a common practice is to use the item measure calibration derived from the Rasch analysis to yield a reference composite of the potential underlying secondary dimension [47]. When an item measures at least one secondary dimension (in addition to the main latent construct that the assessment was intended to measure) and two groups of participants differ in their underlying ability distribution of the secondary dimensions, then DIF occurs [48]. Therefore, ensuring that there were no or few DIF items within the YFAS 2.0 and mYFAS 2.0 is important to confirm the measurement unidimensionality. In this case, we expected that there would be fewer than 2 items with DIF on the YFAS 2.0 and zero to one item on the mYFAS 2.0 across the subgroups. Facets Version 3.84.0 was used to perform the Rasch analysis. Other descriptive statistics were conducted using IBM SPSS 28.0.
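The DIF size categories above can be sketched as a simple rule on the contrast in logits. This is a minimal sketch under two stated assumptions: the sign of the contrast only indicates which group finds the item easier to endorse, so the absolute value is classified, and statistical significance (p < .05) is checked separately. The function name is hypothetical.

```python
def classify_dif(contrast_logits: float) -> str:
    """Label a DIF contrast by size, ignoring its direction (assumption)."""
    size = abs(contrast_logits)
    if size < 0.5:
        return "negligible"
    if size <= 1.0:
        return "moderate"
    return "substantial"


for contrast in (0.39, -0.70, 1.12):
    print(contrast, classify_dif(contrast))
```

Under this rule, only contrasts larger than 1 logit in either direction count toward the 5% tolerance for substantial DIF.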

Results

Demographics

There were 974 participants enrolled. Among them, 578 (59.3%) were female. The mean age was 23.7 years (S.D. = 4.3), and the average BMI was 22.4 (S.D. = 3.6). Most participants were college students (69.7%; the remaining 30.3% were in other types of higher education programs such as master’s or doctoral programs), single (92.7%), non-smoking (69.6%) and non-drinking (65.3%). Detailed demographics can be found in Table 1.

Table 1 Participant demographics (N = 974)

YFAS 2.0

When examining the 35 YFAS 2.0 FA items, Rasch results showed that 3 of the 35 items (3/35 = 8.6%) misfit the Rasch expected values of MnSq and Zstd (Table 2). Two items (#1, #2) from the “Substance taken in larger amount and for longer period than intended” category and one item (#7) from the “Much time/activity to obtain, use, recover” category misfit, which exceeded the 5% threshold. The three most adopted items among the 35 symptoms were: item #2 “I continued to eat certain foods even though I was no longer hungry,” item #1 “When I started to eat certain foods, I ate much more than planned,” and item #15 “When I cut down or stopped eating certain foods, I had strong cravings for them.” The three least adopted items were: item #21 “I avoided social situations because people wouldn’t approve of how much I ate,” item #34 “I was so distracted by thinking about food that I could have been hurt (e.g., when driving a car, crossing the street, operating machinery),” and item #33 “I was so distracted by eating that I could have been hurt (e.g., when driving a car, crossing the street, operating machinery).” The measure logits ranged from −0.95 to 0.42. Table 2 offers additional details.

Table 2 Rasch analyses of the 35 YFAS2 symptoms

When examining the goodness-of-fit of the diagnostic criteria from the YFAS 2.0, all 11 met the Rasch expectation with no misfit items (Table 3). These findings confirmed the unidimensionality and supported the construct validity of the YFAS 2.0 when applying the diagnostic criteria. Given these results, we suggest using the diagnostic criteria instead of the raw symptom counts. The most adopted criterion was “Continued use despite social or interpersonal problems” while the least adopted criterion was “Much time/activity to obtain, use, recover.” The measure logits ranged from −2.46 to 2.29. Please refer to Table 3 for additional details.

Table 3 Rasch analyses of the 11 YFAS2 diagnostic criteria

The person separation for the YFAS 2.0 was 3.14, which is associated with a person reliability of 0.91. These values indicated that the YFAS 2.0 items could distinguish enrolled participants with different levels of FA and exceeded the minimum person separation value expectation of 2, which resulted in 4.52 strata of respondents.
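The reported reliability and strata follow from the person separation index through two standard Rasch relationships: reliability = G² / (1 + G²) and strata = (4G + 1) / 3, where G is the separation. The source does not state these formulas, so the sketch below is offered as an assumption that happens to reproduce the reported values; the function names are hypothetical.

```python
def separation_to_reliability(separation: float) -> float:
    """Person reliability implied by a separation index G: G^2 / (1 + G^2)."""
    return separation ** 2 / (1.0 + separation ** 2)


def separation_to_strata(separation: float) -> float:
    """Number of statistically distinct strata: (4G + 1) / 3."""
    return (4.0 * separation + 1.0) / 3.0


# YFAS 2.0: person separation of 3.14
print(round(separation_to_reliability(3.14), 2))  # 0.91
print(round(separation_to_strata(3.14), 2))       # 4.52
```

A separation of 2, the minimum expected in this study, corresponds to a reliability of 0.80 and 3 strata.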

Substantial gender-related DIF was found for one of the 11 diagnostic criteria: “Failure to fulfill major role obligation (e.g., work, school, home).” The probability of item endorsement for this particular criterion was higher for male participants than female participants. The contrasts for criteria by gender (i.e., male vs. female) ranged from −0.70 to 1.12. No BMI-related DIF was detected in the YFAS 2.0. The contrasts for criteria by BMI status (i.e., BMI ≥ 24 vs. BMI < 24) ranged from −0.45 to 0.39. Please refer to Table 4 for more information.

Table 4 DIF for the 11 YFAS2 diagnostic criteria

mYFAS 2.0

Rasch analyses were also conducted on the mYFAS 2.0. We first examined the 13 mYFAS 2.0 FA symptoms. All items demonstrated good fit to the Rasch model, with MnSq and Zstd values in the proper range (Table 5), which met the criterion that fewer than 5% of the mYFAS 2.0 items would misfit. The most adopted item among the 13 symptoms was item #13 “If I had emotional problems because I hadn’t eaten certain foods, I would eat those foods to feel better,” while the least adopted items were tied between item #19 “My overeating got in the way of me taking care of my family or doing household chores” and item #33 “I was so distracted by eating that I could have been hurt (e.g., when driving a car, crossing the street, operating machinery).” The measure logits ranged from −0.41 to 0.32. Table 5 offers additional details.

Table 5 Rasch analyses of the 13 modified YFAS2 (mYFAS2) symptoms

When considering the goodness-of-fit of the diagnostic criteria from the mYFAS 2.0, all 11 met the Rasch expectation (Table 6). The most adopted criterion was “Important social, occupational, or recreational activities given up or reduced,” which was the second most adopted in the YFAS 2.0. The least adopted criterion was “Much time/activity to obtain, use, recover,” which was the same as the result in the YFAS 2.0. The measure logits ranged from −2.26 to 2.16. Please refer to Table 6 for additional details.

Table 6 Rasch analyses of the 11 modified YFAS2 (mYFAS2) diagnostic criteria

The person separation for the mYFAS 2.0 was 2.17 with an acceptable person reliability of 0.82. These values indicated that the mYFAS 2.0 items could distinguish enrolled participants into 3.23 strata of respondents.

No substantial gender- or BMI-related DIF was detected for the mYFAS 2.0. The contrasts for gender (i.e., male vs. female) ranged from −0.73 to 0.94. The contrasts for BMI status (i.e., BMI ≥ 24 vs. BMI < 24) ranged from −0.66 to 0.42. Please refer to Table 7 for further details.

Table 7 DIF for the 11 mYFAS2 diagnostic criteria

Discussion

This study was designed for the psychometric assessment of the YFAS 2.0 and mYFAS 2.0 using Rasch analysis in terms of item fit and DIF. The current findings indicated that both scales have an acceptable structural validity without considerable DIF, making them appropriate measurements of FA. Also, we found both the diagnostic criteria and raw symptom counts included in the measures would be useful to indicate FA. Comparison between the two scales in terms of item fit indicated that the mYFAS 2.0, compared to its full version (YFAS 2.0), had fewer misfit items and no significant DIF for gender. Additionally, neither scale showed significant DIF related to BMI groupings. Implications are discussed below.

Several studies have investigated the construct validity of the YFAS measures using CTT methods [13, 14, 16, 17]. Koball et al. [12] assessed the dimensionality of the YFAS 2.0 using convergent and discriminant validity methods in patients seeking bariatric surgery and found that the scale appropriately measured FA as a unique construct. Consistent with our findings, they suggested that diagnostic criteria are appropriate for evaluating information obtained from the scale. In another attempt to validate the original YFAS scale that was developed based on DSM-IV criteria, Manzoni and colleagues reported a single-factor model in Italian university students based on results from confirmatory factor analysis [13]. As we found, they also suggested revising several items due to low factor loadings and unusually high correlations between a few items. These findings reveal that although the original YFAS scale developed by Gearhardt and colleagues in 2009 seemed to have acceptable psychometric properties [49], the newer versions (i.e., YFAS 2.0 and mYFAS 2.0) apparently have more appropriate validity and reliability, suggesting a formative evolution of the scales over time.

Other validations of the YFAS 2.0, including those conducted in college students in China and among primary-care clinic clients in Malaysia [9, 50], likewise confirmed the unidimensionality of the scale, congruent with our findings. However, our analysis has been done with interval values instead of ordinal ones, providing additional confidence in the results. However, Nantha and colleagues, when assessing the internal consistency of the scale using Kuder-Richardson α, found that the YFAS 2.0 had good reliability for both diagnostic-criteria and symptom-count versions [50]. We believe the seemingly conflicting results with ours may be attributed to different kinds of statistical analysis, recruitment of different target groups or other factors. A single factor solution for the mYFAS 2.0 has also been observed in a large sample of Brazilian people and a sample of college students in China using both exploratory and confirmatory factor analyses [9, 20].

Carr et al. [26] performed a study on a sample of university students in the United States to estimate measurement invariance of the YFAS 2.0 based on gender and race/ethnicity variables and found promising results supporting the absence of significant non-invariance. However, they found that a single diagnostic indicator related to “efforts to cut down on tasty food” varied between gender groups. Similarly, we assessed the invariance using DIF and found the diagnostic criterion of the “failure to fulfill major role obligation” may differentiate responses from females and males. The diagnostic-criteria results in that study were different from our findings. This suggests that some diagnostic criteria assessed by the measure may exhibit different patterns based on gender in people from different cultures who speak different languages. Therefore, given the sensitivity of the gender variable, we suggest that further evaluation of the criteria in other settings with different statistical methods be planned to detect any further potential DIF in other criteria or among different target groups.

In developing the modified version of the YFAS 2.0, Schulte and Gearhardt reported that both scales demonstrated relatively equivalent psychometric properties when assessing FA in different populations [19]. However, we found that the mYFAS 2.0 may have fewer misfit items and less DIF compared to the full version as captured by the Rasch analysis. Nevertheless, because the full version of the scale with more items may provide more detailed information on the 11 diagnostic criteria of FA, we recommend using the YFAS 2.0 when wishing to obtain thorough information. In contrast, the mYFAS 2.0 could be used in busy clinical settings. That is, we believe both scales are appropriate instruments, and their use should be based on the desired purposes of reliably gathering more information with greater respondent burden (using the YFAS 2.0) or less information with less respondent burden (using the mYFAS 2.0). Thus, multiple options exist given that both instruments are psychometrically sound.

Although we believe using a modern statistical method (namely Rasch modeling) and computing DIF may distinguish our study from former assessments of the psychometric properties of the scales, there are several study limitations that should be mentioned. First, we conducted the study in university students without any diagnosed nutritional or psychological disorders. Because FA has been linked to other disorders, assessing the properties among other participants, particularly those with eating disorders or addictive behaviors, may generate different findings, and this needs further study. Second, social desirability among educated people participating in the study may have influenced the gathered data (for example, with respect to response or desirability biases). Therefore, replication of the study among other people, including those with lower levels of education, is suggested. Third, we used the Chinese versions of the scales. Given potential cultural influences, further investigation of the scales in other countries and languages is needed to evaluate whether the scales may produce similar findings in other populations. Fourth, we chose an online platform to collect data. Although such platforms may help investigators collect data in a timely fashion, individuals who are not members of social networks or who lack access to the internet and online services may not be represented. Therefore, the findings may not be generalizable to these groups. Using multiple methods, such as face-to-face data collection and telephone interviews along with online surveys, may address this limitation. Another limitation may be related to sampling bias, as some individuals, particularly those who are accessible during sampling, may systematically have a greater chance of being included through online recruitment. Finally, the security of online data acquisition, as was done here using Google Forms, requires further assessment. Based on the available information, Google uses strict security protocols to protect information. Nonetheless, this approach to data collection is a potential concern and limitation.

Conclusion

In conclusion, our study showed both the YFAS 2.0 and its modified version with single factor structures may be recommended as valid and reliable instruments to collect data on FA. Diagnostic criteria included in these scales are suitable indicators to capture information on various aspects of FA, and these criteria may provide more accurate data than symptom counts. Further investigation of the scales in people with food-related disorders or other psychiatric conditions or from other settings and cultures will contribute to a better understanding of how these measures may function in different settings to collect valid information on FA. Healthcare providers, including nutritionists, may benefit from using these scales to assess responses to interventions to treat addictive-like food consumption and investigate the predictive validity of the measures in clinical practice.