Background

Anorexia nervosa (AN) is a devastating illness with a high mortality rate. The standardized mortality ratio (SMR) calculates whether those in a given study population are equally, more or less likely to die compared to a reference population [1]. With an estimated SMR between 5.9 and 15.9 (i.e., 6–16 times excess mortality), AN is considered one of the deadliest mental disorders [2, 3].
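As a minimal illustration of the ratio described above, the SMR can be written as the number of deaths observed in the study population divided by the number expected if that population had the mortality rates of the reference population. The function name and the counts below are hypothetical and are not taken from the cited studies.

```python
def standardized_mortality_ratio(observed_deaths: float, expected_deaths: float) -> float:
    """Return observed / expected deaths; 1.0 means mortality equal to the reference."""
    if expected_deaths <= 0:
        raise ValueError("expected_deaths must be positive")
    return observed_deaths / expected_deaths


# Hypothetical cohort: 59 deaths observed where 10 would be expected.
example_smr = standardized_mortality_ratio(observed_deaths=59, expected_deaths=10)
print(f"SMR = {example_smr:.1f}")
```

A value above 1 indicates excess mortality relative to the reference population, and a value below 1 indicates lower mortality.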

Studies indicate that the overall incidence rate for AN has remained relatively stable since the 1970s (lifetime estimates of 4% in females and 0.3% in males) [2, 4]. The symptomology and presentation of AN have evolved along cultural lines; however, it is not simply a manifestation of modern cultural and social pressures. Accounts of deliberate self-starvation date back to the beginning of written history [5].

Although the exact etiology of AN is still unclear, a substantial body of evidence indicates that genetics plays a considerable role [6, 7]. Genetic studies dating from the late 20th century have shown that AN is highly familial. The lifetime risk of developing AN for female relatives of individuals with AN is 11 times greater than that for female relatives of individuals without AN [8]. Twin-based heritability (h²) estimates range from ∼48–74% [9,10,11,12,13,14,15,16]. The large range in estimates may be due to the use of broader participant inclusion criteria in AN studies to increase study group size. Broadening the inclusion criteria results in a more heterogeneous sample and decreased heritability estimates, while narrowing the definition of AN yields higher and more consistent estimates [17].

Although recovery from AN is possible, for approximately 20% of affected individuals the condition takes on a more intractable phenotype [18, 19]. While AN symptoms vary from person to person, it has been suggested that a unique severe and enduring anorexia nervosa (SE-AN) subtype exists; however, aligning on clear defining criteria has proved challenging [20].

Since the 1980s, a small number of literature reviews of varying breadth and depth have been conducted in attempts to better define SE-AN. The most comprehensive to date, a 2017 review by Broomfield and colleagues, identified illness duration and previous unsuccessful treatment as the criteria most often used in the literature to define AN severity [21]. A 2018 editorial by Hay and Touyz, which referenced the Broomfield review, expanded the suggested criteria to include significantly diminished quality of life and narrowed the duration criterion to a minimum of three years and the therapeutic intervention exposure criterion to at least two previous evidence-based treatments [22]. In a 2021 follow-up review, with the aim of defining a neuropsychological profile for SE-AN, Broomfield et al. identified intelligence, set-shifting and decision-making as features warranting further attention and noted that additional data are needed to align on defining severity criteria [23]. In short, there continues to be a lack of consensus on how to best define SE-AN.

Psychiatric illness is often diagnosed in a binary manner; an individual is assessed as either having the illness or not. In reality, due to their complex nature, psychiatric illnesses are better defined on a continuum [24, 25]. Genome-wide association studies (GWAS) often use a binary case-control design. However, as Yang et al. [26] noted, with an equal population sample size, a quantitative trait (for example, symptom severity) association study will have greater power than a case-control association study. The difference is because in a case-control study, an individual with mild symptoms is not differentiated from one with severe symptoms. Relating this to AN, there would be no differentiation between an individual who met the DSM-5 criteria for mild illness, of short duration and who was responsive to first-line treatment, and an individual who met the extreme illness criteria, with a duration of over a decade and lack of positive response to multiple treatment modalities. Delineating participants based on illness severity when performing genetic data analysis of those with AN may improve the chances of identifying significant variants.
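The loss of information from collapsing a severity measure into two groups can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the method of Yang et al. [26]: one simulated variant, an additive effect on a normally distributed severity score, and a binary label formed by splitting that same score at its median, which is a simplification of a real case-control design. All parameter values and function names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_significant(x, y, z_crit=1.96):
    """Correlation test using a normal approximation to the t statistic."""
    r = pearson_r(x, y)
    t = r * math.sqrt((len(x) - 2) / (1 - r * r))
    return abs(t) > z_crit


def estimate_power(n=400, maf=0.3, beta=0.2, reps=300, seed=1):
    """Share of simulated studies detecting the variant, for each trait coding."""
    rng = random.Random(seed)
    hits_quantitative = hits_binary = 0
    for _ in range(reps):
        # Genotype = number of risk alleles (0, 1 or 2) at one variant.
        genotype = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
        severity = [beta * g + rng.gauss(0, 1) for g in genotype]
        # Binary coding: upper half of severity scores labelled 1, the rest 0.
        cutoff = sorted(severity)[n // 2]
        label = [1 if s >= cutoff else 0 for s in severity]
        hits_quantitative += is_significant(genotype, severity)
        hits_binary += is_significant(genotype, label)
    return hits_quantitative / reps, hits_binary / reps


power_quantitative, power_binary = estimate_power()
print(f"quantitative: {power_quantitative:.2f}, binary: {power_binary:.2f}")
```

With the same number of simulated participants, the analysis of the continuous score is expected to detect the variant more often than the analysis of the two-group label, because the label discards the differences in severity within each group.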

The potential value of defining more phenotypically similar groups based on quantitative phenotypes and comorbidities in genetic studies of psychiatric illness has been demonstrated in major depressive disorder (MDD), schizophrenia, autism spectrum disorder (ASD), and obsessive-compulsive disorder (OCD) [27,28,29,30]. Individuals with more severe MDD symptoms have been found to have increased genetic risk for other psychiatric disorders [29], and polygenic risk scores (PRS) for schizophrenia correlate with symptom severity [28]. Genetic risk score (GRS), PRS and polygenic score (PGS) are the terms most often used in the literature when referring to values estimating an individual’s lifetime risk of developing a phenotype (disorder) based only on their genetics [31]. The scores are generated by combining the number of risk alleles at all the risk variants in an individual’s genome. Disease-associated risk variants are based on the latest and most comprehensive GWAS for the disorder at the time of the analysis.
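The scoring step described above can be sketched as a sum over risk variants of the number of risk alleles carried (0, 1 or 2). The optional per-variant weights reflect the common practice of weighting by GWAS effect sizes; the text does not spell this out, so it is an assumption here, and the variant identifiers and weights are hypothetical.

```python
def polygenic_score(risk_allele_counts, weights=None):
    """Sum risk-allele counts across variants, optionally weighted per variant.

    risk_allele_counts: dict mapping variant ID -> 0, 1 or 2 risk alleles.
    weights: optional dict mapping variant ID -> effect-size weight (assumed).
    """
    score = 0.0
    for variant, count in risk_allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{variant}: allele count must be 0, 1 or 2")
        weight = 1.0 if weights is None else weights[variant]
        score += weight * count
    return score


# Hypothetical individual genotyped at three hypothetical risk variants.
counts = {"variant_a": 2, "variant_b": 0, "variant_c": 1}
effect_sizes = {"variant_a": 0.10, "variant_b": 0.05, "variant_c": 0.20}

print(polygenic_score(counts))                # unweighted count of risk alleles
print(polygenic_score(counts, effect_sizes))  # weighted by assumed effect sizes
```

In practice the set of variants and any weights would come from the most recent GWAS for the disorder, as noted above.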

Studies delineating and comparing subgroups of individuals with AN based on defined quantitative criteria may result in the discovery of rare variants associated with symptom severity. Individuals manifesting a more severe phenotype may also be more likely to show higher heritability estimates and thus represent a subgroup of patients for which genetic findings may be beneficial. However, this hypothesis cannot be tested to the rigorous standards required without a more precise definition of what constitutes a severe and enduring phenotype and greater attention to specifically identifying and including this group in genetic studies [32].

The aim of this review is to first, as an extension of the Broomfield et al. review [21], identify the criteria most widely used to describe the phenotypic severity of AN by including articles published since 2017 and, second, evaluate the genetics literature for inclusion of individuals meeting these criteria.

Methods

Delineating criteria for the severe and enduring anorexia nervosa phenotype

To better identify and delineate research participants manifesting a severe and enduring phenotype in the genetics literature, it was necessary to discern the most often used defining criteria for this subgroup of AN. The terms Anorexia Nervosa AND severe AND (Enduring OR Chronic) were used, with no year limit, to search titles and abstracts in PubMed, PsycINFO, and Web of Science. Articles were also limited to human subjects.

One of the articles identified was an extensive review by Broomfield et al. of how the literature labeled and defined AN severity up to 2017 [21]. The current search was limited to articles published after the Broomfield 2017 review to focus on the most recent literature. The references were not required to be attempting to empirically define a severe or enduring anorexia nervosa phenotype. The goal was to determine how those with a longer lasting and more severe clinical presentation are currently referred to in the literature. After removing commentaries on other references, clarifications, and updates from previous studies with the same authors and criteria, redundant references, and those not referring to a severe or enduring anorexia nervosa phenotype, 37 publications remained. Of these 37 publications, there were 22 research papers (6 clinical trials, 16 studies), 4 case reports, 6 expert panel, position, opinion, or editorial papers, 2 literature reviews and 3 general reviews. These references are listed in Table 1, along with a book chapter [33] that was identified through reviewing the references of the selected papers and was not included in the Broomfield 2017 review, bringing the total publications included to 38. The mean age, mean BMI, duration of illness in years, and history of previous treatment, as well as any other measures of illness severity, were extracted from the articles and are shown in Table 1. A second reviewer, using the RANDBETWEEN function in Microsoft Excel, selected 10% of the articles at random from Table 1 to review for meeting inclusion criteria and accuracy of the data extracted.
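The random 10% check described above was done in Microsoft Excel. An equivalent selection could be sketched in Python as follows. The rounding rule (rounding up, with at least one article) and the use of row numbers as article identifiers are assumptions, since the text does not state how the 10% was rounded.

```python
import math
import random


def pick_audit_sample(article_ids, fraction=0.10, seed=None):
    """Draw a random subset of articles, without replacement, for second review."""
    ids = list(article_ids)
    sample_size = max(1, math.ceil(len(ids) * fraction))  # assumed rounding rule
    rng = random.Random(seed)
    return sorted(rng.sample(ids, sample_size))


# Hypothetical identifiers: row numbers 1..38 for the publications in Table 1.
table_1_rows = range(1, 39)
print(pick_audit_sample(table_1_rows, seed=2023))
```

Fixing the seed makes the selection reproducible, which a spreadsheet random function does not provide by default.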

Table 1 Articles for Assessment of Severe Enduring Anorexia Nervosa (SE-AN) criteria

Articles were reviewed to determine which criteria are used most often in the literature in regard to the severe enduring phenotype. Specifically, articles with a central purpose of better defining a severe and/or enduring/chronic AN phenotype or the need for better treatment options (for example [34, 35]), and articles including case studies or participants in one or more study groups defined as having a severe and/or enduring/chronic AN phenotype (for example [36, 37]) were included. The tabulation from the Broomfield review was combined with the current total. Given that the four Dalton articles referenced the same data, they were counted as only one reference. The results are outlined in Fig. 1.

Fig. 1

Number of references from Table 1 representing the specific duration of illness, number of previous unsuccessful treatments and body mass index (BMI) subgroups indicated either in defining severe and enduring anorexia nervosa or as inclusion criteria for participants. The totals indicated include both the references from the 2017 Broomfield review [21] and the current work

Literature review: inclusion of participants meeting the severe and enduring AN phenotype in genetics research

The search outlined in this section followed the process depicted in the PRISMA flow diagram [38] in Fig. 2, which captures the literature selection flow. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) Checklist was utilized [39]. The goal was to assess whether participants meeting the criteria identified as the most widely used to define a severe and enduring phenotype are being included in genetics research, and, if included, whether these participants were assessed as an independent group.

Fig. 2

PRISMA flow diagram for the literature search

The terms Anorexia Nervosa AND (genetic OR gene OR hereditary) in titles and abstracts were used for the following searches. Articles were limited to human subjects, and review articles were excluded. The goal was to be as inclusive as possible in the initial searches of each database. The search was limited to the last decade of published literature to assess current practices in genetics research. This span of time encompasses the five years leading up to and following the identification of the first genome wide significant locus for AN [40] and the publication of Broomfield et al., both of which were published in 2017. The inclusion dates were as follows: PubMed, 1-Jan-2012 to 6-Oct-2023 (date of search); PsycINFO, 1-Jan-2012 to 10-Oct-2023 (date of search); and Web of Science, 1-Jan-2012 to 12-Oct-2023 (date of search).

Searches of PubMed, PsycINFO and Web of Science conducted with the search criteria resulted in 240, 206 and 235 hits, respectively. Titles and keywords were reviewed, and 277 articles were eliminated for redundancy (see “identification” in Fig. 2). During the first screening, the abstracts for the remaining 404 were reviewed, and 211 were eliminated for the reasons depicted in the PRISMA diagram (“Records selected for Review 1”). The remaining 193 publications progressed to the second screening.
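The counts reported for the identification and first screening stages can be checked with simple arithmetic. The sketch below uses only the figures given in the paragraph above; the variable names are illustrative.

```python
# Hits per database, as reported for the genetics literature search.
hits = {"PubMed": 240, "PsycINFO": 206, "Web of Science": 235}

identified = sum(hits.values())                  # all records retrieved
after_deduplication = identified - 277           # redundant records removed
after_first_screening = after_deduplication - 211  # removed on abstract review

print(identified, after_deduplication, after_first_screening)
```

The three values correspond to the 681 records retrieved in total, the 404 abstracts reviewed, and the 193 publications that progressed to the second screening.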

In the second screening, noted as “Records selected for Review 2” in the PRISMA diagram, the methods sections of the remaining 193 articles were reviewed for details on age, psychological assessments, anorexia subtype, duration of illness, prior treatment history, and other indications of disease severity. Studies did not need to specifically call out a subgroup of participants as being severe and/or enduring. However, studies that did not include participant data for at least three of the following four criteria were eliminated because they did not provide adequate information for the assessment of participant phenotype severity and intractability: (1) duration of illness; (2) body mass index (BMI); (3) prior treatment history; and (4) severity as measured by one or more clinical, social, or psychological scales. This resulted in the elimination of an additional 115 articles. A total of 78 articles were ultimately included in the information extraction process; the results are presented in Table 2.
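The three-of-four rule applied in the second screening can be sketched as a simple predicate. The field names and the example records are hypothetical; the review itself was carried out by reading the methods section of each article, not by running code.

```python
# The four criteria named in the second screening (field names are assumed).
SEVERITY_FIELDS = (
    "duration_of_illness",
    "bmi",
    "prior_treatment_history",
    "severity_scale",
)


def reports_enough_criteria(study, minimum=3):
    """True if the study reports data for at least `minimum` of the four criteria."""
    reported = sum(1 for field in SEVERITY_FIELDS if study.get(field) is not None)
    return reported >= minimum


# Hypothetical extraction records for two studies.
study_kept = {"duration_of_illness": 6.2, "bmi": 15.1, "severity_scale": "EDE-Q"}
study_dropped = {"bmi": 16.4, "severity_scale": "BDI"}

print(reports_enough_criteria(study_kept))     # three criteria reported
print(reports_enough_criteria(study_dropped))  # only two criteria reported

# Count check from the text: 193 reviewed, 115 eliminated.
remaining_after_second_screening = 193 - 115
```

Applying this rule to the 193 articles removed 115 of them, leaving the 78 articles carried into data extraction.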

Table 2 Evaluation of the inclusion of SE-AN criteria in genetic studies of anorexia nervosa. The type of AN refers to the anorexia grouping utilized by the authors, for example restricting, binge-purge, weight restored. Other EDs refer to other types of eating disorders included in the study, for example, bulimia nervosa, eating disorders not otherwise specified and binge eating disorders. Age and BMI refer to the mean age or BMI of the study participants, respectively, unless otherwise indicated. The abbreviations are detailed and annotated in the legend

The data were extracted by reviewing both the methods and results sections of each paper for the following participant data: (1) mean duration of illness in years; (2) mean BMI in kg/m²; (3) prior treatment history; and (4) severity as measured by one or more clinical, social, or psychological scales. Participant gender, mean age, and groups of eating disorders included in the studies (i.e., AN-restricting, AN-binge purge, bulimia, binge eating) were also extracted. A second reviewer, using the RANDBETWEEN function in Microsoft Excel, selected 10% of the articles at random from Table 2 to review for meeting inclusion criteria and accuracy of the data extracted.

Results

Defining severe enduring anorexia nervosa in the research literature

A review of the literature revealed that the terms severe, chronic, and enduring identified by Broomfield et al. in 2017 [21] continue to be widely used to label the more intractable AN phenotype. How these labels are defined in the literature, when they are defined at all, continues to vary greatly. The age of study participants, BMI, duration of illness, and previous treatment history were extracted from each reference and are recorded in Table 1.

The primary inclusion criteria presented in the articles reviewed were as follows:

  1. Duration:

The Broomfield review [21] identified duration as the primary criterion used to define the severe and enduring AN phenotype, and this continues to be true. Several articles reviewed included duration of illness as a criterion for inclusion in their study or clearly delineated a subgroup using duration as one criterion. The stringency of how duration was measured varied.

In their audit of care received by patients with “early stage” versus “severe and enduring” AN, Ambwani et al. [36] defined a duration of < 3 years for early stage and ≥ 7 years for severe and enduring AN, as recommended by Robinson et al. and Touyz et al. [41, 42]. This was also the case for Calugi et al. [43], who used ≥ 7 years in their study of cognitive behavioral therapy effectiveness. The patient described in the case study by Voderholzer et al. [44] had AN for seven years. In the four papers by Dalton et al. studying the impact of transcranial magnetic stimulation on severe and enduring AN, the duration inclusion criterion for study participation was ≥ 3 years of AN symptoms [45,46,47,48]. In contrast, Knyahnytska et al. [49] included a duration of > 5 years as a criterion for treatment resistance in their insula H-coil transcranial stimulation therapy study. In the selection of a subset of participants from the Anorexia Nervosa Genetics Initiative (ANGI) to include in their assessment of the polygenic association of severity and long-term outcome in AN, Johansson et al. [50] included in their criteria for the severe enduring subtype a follow-up time of ≥ 5 years, defined by the authors as years between initial registration and ANGI recruitment. Finally, in two of the three studies evaluating the effectiveness of deep brain stimulation, an illness duration of ≥ 10 years was required for participant inclusion [51, 52], with the third requiring > 7 years [53]. Case study, clinical trial and study participants included in groups indicated as manifesting a severe and enduring phenotype tended to have illness of longer duration. For example, participants in the Calugi et al. [43] study had a mean duration of 12.3 (4.7 SD) years, and the four case study subjects had illness durations of 7 [44], 11 [54], 25 [55], and 26 [37] years.

Position papers, commentaries, and reviews also varied greatly in defining duration requirements. For example, in their German-language case study on palliative care for severe AN, Westermair et al. [56] proposed a long duration of illness, e.g., 10 years, as a criterion, whereas Hay and Touyz [22] and Herpertz-Dahlmann [57] used a duration of > 3 years. Other authors fell between the two extremes; Bianchi et al. [58] defined severe and enduring AN participants as those who had the disorder for six years or more, and Marzola et al. [59] used a seven-year demarcation. However, these two papers also proposed that duration should not be used alone when defining AN severity. The usefulness of duration as a criterion was also questioned by Wildes et al. [60]. In an attempt to define the severe and enduring phenotype empirically, Wildes et al. found no evidence for a chronic subgroup of AN, instead proposing that this group may be better classified on the basis of impact on quality of life and severity of injurious behaviors. As indicated in Fig. 1, a duration of 7 or more years was used most frequently, followed by 10 years.

  2. Severity:

Body mass index (BMI):

The DSM-5 defines four levels of AN severity: mild, BMI of 17 kg/m² or greater; moderate, BMI of 16–16.99 kg/m²; severe, BMI of 15–15.99 kg/m²; and extreme, BMI of less than 15 kg/m² [61]. Once again, the literature indicates a wide range of BMIs in articles attempting to define severe and enduring AN and/or for participation in studies targeting this group of individuals. The two studies of deep brain stimulation with duration criteria of ≥ 10 years for participation also had BMI requirements falling into the DSM extreme category [51, 52]. Deep brain stimulation involves a high degree of risk, and the authors specified that only individuals with the most severe cases should be included. As with duration of illness, participants included in groups indicated as manifesting a severe and enduring phenotype in case studies, clinical trials and studies tended to have substantially lower BMIs than required by the inclusion criteria. For example, participants in the Bemer et al. bone mineral density (BMD) study had a mean BMI of 12.60 ± 1.60 kg/m², which was well below the < 16 kg/m² criterion [62].
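The four DSM-5 BMI bands listed above can be expressed as a small lookup. This is a sketch that assumes cut-points at 17, 16 and 15 kg/m², treats a BMI of exactly 17 as mild, and applies only to someone who already has an AN diagnosis; the function name is hypothetical.

```python
def dsm5_bmi_severity(bmi):
    """Map a BMI in kg/m^2 to the DSM-5 AN severity band (assumed boundaries)."""
    if bmi <= 0:
        raise ValueError("BMI must be positive")
    if bmi >= 17:
        return "mild"
    if bmi >= 16:
        return "moderate"
    if bmi >= 15:
        return "severe"
    return "extreme"


# Mean BMI reported for the Bemer et al. participants and the < 16 cutoff region.
print(dsm5_bmi_severity(12.60))
print(dsm5_bmi_severity(15.99))
```

Applied to a group mean, as in the example, the band describes the average participant only; individuals in the same group may fall into other bands.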

Notably, several studies included a low-weight cutoff for participation. For example, in their transcranial magnetic stimulation studies, Dalton et al. [45,46,47,48] required a BMI > 14 kg/m² for participation. The reason provided in the study protocol for the low-weight cutoff was “safety precaution” [63]. The deep brain stimulation studies conducted by Park et al. [64] required that participants be severely underweight but with a low-weight BMI criterion of > 13 kg/m². Although reasons were not given for the low-weight cutoff, the authors stated that participants needed to have a BMI > 13 kg/m² for surgery, which is understandable given its invasive nature.

Again, as with duration of illness, the literature suggests that BMI should not be used as the sole determinant of severity in AN. In their editorial on the challenges of defining severe and enduring AN, Hay and Touyz [22] recognized the utility of the DSM-5 BMI severity categories but also noted that for those with unremitting AN for a decade or more, having a BMI above the DSM severe range is still associated with marked morbidity.

Psychological assessment:

All the studies reviewed included an assessment of symptoms such as psychological stress, disordered eating, depression, anxiety, obsessiveness, and quality of life. For example, Wildes et al. [60] used the Research and Development Corporation (RAND) 36-Item Health Survey 1.0 (SF-36) to measure health-related quality of life and found that these scores better classified AN subgroups than BMI and duration of illness. A score of ≤ 45 on the Global Assessment of Functioning (GAF) found in the DSM-IV, which assesses the severity of mental illness [65], was used by Oudijn et al. [51] for inclusion in their deep brain stimulation studies. A plethora of tools was used in assessing eating disorder pathology, with the Eating Disorder Examination Questionnaire (EDE-Q) [66] and/or various iterations of the EDE-Q being the most prevalent.

  3. Treatment response:

Lack of positive response to prior treatment, variously described as treatment resistance, treatment refractoriness, and failure to respond, was also included in assessing AN severity in several of the articles. The number and type of previous treatments required for inclusion in studies varied. For inclusion in deep brain stimulation studies, Park et al. [67] required a lack of positive response to ≥2 “typical modes” of treatment, as did Oudijn et al. [51]. The participant inclusion criteria used by Dalton et al. [48] for transcranial stimulation studies included the need to have completed at least one “previous course of National Institute for Health and Care Excellence” recommended “specialist psychotherapy or specialist day-patient or inpatient treatment”. The clearest classification criterion for treatment resistance was proposed by Hay and Touyz et al. [68]: “exposure to at least two evidence-based treatments delivered by an appropriate clinician or treatment facility together with a diagnostic assessment and formulation that incorporates an assessment of the person’s eating disorder health literacy with an assessment of the person’s stage of change”, which was referenced in the reviews of treatment options for those with severe enduring AN by Zhu et al. and Wonderlich et al. [20, 69]. In contrast, Smith and Woodside [70] defined treatment resistance as “patients with two or more incomplete inpatient admissions and no complete admissions”. Emphasis was placed on patients failing to complete treatment rather than the treatment failing to help patients, although the authors did note that approximately 10% of patients treated at their inpatient facility were “unable to benefit”. As indicated in Fig. 1, the criterion of two or more treatment attempts was most frequently used.

In summary, the literature indicates that a combination of assessments and criteria, including an illness duration of ≥ 7 years, lack of positive response to at least two previous evidence-based treatments, a BMI meeting the DSM-5 criterion for extreme AN, and an assessment of psychological and/or behavioral severity indicating a significant impact on quality of life, were the most prevalent means of defining the severe and enduring AN phenotype. As the DSM-5 includes clear definitions of severe and extreme BMI (15–15.99 kg/m² and < 15 kg/m², respectively), the criteria for severe BMI were also used in assessing the genetics literature in the following section.
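The combination of criteria summarized above can be sketched as a per-criterion check. This is an illustration of how the most prevalent criteria fit together, not a diagnostic tool and not an algorithm proposed in the reviewed literature. The record fields, the boolean quality-of-life flag, and the default BMI cutoff of 15 kg/m² (the extreme band) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ParticipantSummary:
    """Hypothetical per-participant record; field names are illustrative."""
    duration_years: float
    failed_evidence_based_treatments: int
    bmi: float
    marked_quality_of_life_impact: bool


def se_an_criteria_met(participant, bmi_cutoff=15.0):
    """Report which of the most prevalent SE-AN criteria a participant meets."""
    return {
        "duration_7_years_or_more": participant.duration_years >= 7,
        "two_or_more_failed_treatments": participant.failed_evidence_based_treatments >= 2,
        "bmi_below_cutoff": participant.bmi < bmi_cutoff,
        "quality_of_life_impact": participant.marked_quality_of_life_impact,
    }


example = ParticipantSummary(
    duration_years=12.3,
    failed_evidence_based_treatments=2,
    bmi=14.2,
    marked_quality_of_life_impact=True,
)
print(se_an_criteria_met(example))
```

Returning each criterion separately, rather than a single yes/no label, keeps visible which criteria a study or participant does and does not report. Passing `bmi_cutoff=16.0` would also admit the DSM-5 severe band.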

Inclusion of participants meeting severe enduring anorexia nervosa-defining criteria in studies of anorexia nervosa genetics

The 78 articles identified as meeting the search criteria defined in the methods section were assessed for whether the following inclusion criteria were used and how they were defined:

  1. Duration of illness,

  2. Prior treatment history,

  3. BMI, and

  4. Severity as measured by one or more clinical, social, or psychological scales.

As mentioned previously, neither the statistical strength of the studies nor the study outcomes were assessed, as the purpose was to determine whether genetic studies included those meeting the severe and enduring phenotype criteria defined in the first aim through assessing prevalence of use in the literature. The studies consisted of Genome-Wide Association Studies (GWAS) as well as analyses of polymorphisms, expression, and gene methylation, including but not limited to the leptin (LEP) and the leptin receptor (LEPR) genes, the fat mass and obesity-associated gene (FTO), and the oxytocin receptor (OXTR) gene [16, 71,72,73]. The gender of the study participants was also recorded where reported (Table 2).

Most of the 78 articles, including those specifically stating that the study was of severe AN, did not include criteria defined in the first aim. Most notably, only one article specifically stated that participants included had at least one prior treatment attempt [50].

Of the 71 studies reporting mean BMI, the mean BMI for all groups was 15.73 kg/m² (SD 1.48). For 15 studies (21%), the mean BMI was > 17 kg/m² (mild DSM-5). Sixteen studies (22%) had a mean BMI of 16–16.99 kg/m² (moderate DSM-5). Twenty-three studies (32%) had a mean BMI of ≤ 15.99 kg/m² (severe DSM-5), and 17 studies (21.8%) included at least one group with a mean BMI of ≤ 15 kg/m², the level required to meet the DSM-5 definition of extreme AN. Only one study included a lifetime minimum BMI of ≤ 15 kg/m² as an inclusion criterion [74].

The duration of illness and/or minimum duration required for inclusion in studies were reported for 23 (29%) of the 78 articles. Of those 23 studies, 3 (13%) had participants with a mean duration of illness ≤ 3 years, 12 (52%) had a mean of 3.1–6.99 years, and 6 (26%) had a mean of ≥ 7 years. Five of the 23 studies required a duration of illness ≥ 3 years as a participant inclusion criterion. None of the articles identified required duration of illness ≥ 7 years as an inclusion criterion.

Assessment of psychological stress, disordered eating, depression, anxiety, obsessiveness, and quality of life was another facet of defining the severity of AN in the studies evaluated. Across the 54 studies identifying defined assessment modalities, 38 different tools, checklists and guidelines were used in various combinations, including the following: anxiety: Hamilton Anxiety Rating Scale (HARS), Clinical Global Impression anxiety scale (CGI), State-Trait Anxiety Inventory (STAI); depression: Beck Depression Inventory (BDI), Children’s Depression Inventory (CDI), Montgomery-Åsberg Depression Rating Scale (MADRS); alexithymia: Toronto Alexithymia Scale (TAS); obsessive-compulsive and impulsive symptoms: Yale-Brown-Cornell Eating Disorder Scale (YBC-EDS), Leyton Obsessional Inventory-Child Version (LOI-CV), Barratt Impulsiveness Scale (BIS); and perfectionism: Child and Adolescent Perfectionism Scale (CAPS). Numerous eating disorder assessment tools, including the Eating Disorders Inventory (EDI), Eating Disorder Examination Questionnaire (EDE-Q), Eating Attitudes Test (EAT), and the Structured Interview for Anorexia and Bulimia Nervosa (SIAB), were also used. Table 3 shows a list of tools and how often they were used.
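A ranking of tools by use, of the kind shown in Table 3, can be produced by counting each tool once per study. The sketch below uses made-up study labels and tool assignments purely for illustration; it does not reproduce the counts in Table 3.

```python
from collections import Counter


def rank_tools(tools_by_study):
    """Return (tool, number of studies) pairs, most used first, ties by name."""
    counts = Counter()
    for tools in tools_by_study.values():
        counts.update(set(tools))  # count a tool once per study, even if repeated
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical extraction: which assessment tools each study reported using.
hypothetical_studies = {
    "study_a": ["EDE-Q", "BDI"],
    "study_b": ["EDE-Q", "STAI", "EDE-Q"],
    "study_c": ["EDI", "BDI", "EDE-Q"],
}

for tool, n_studies in rank_tools(hypothetical_studies):
    print(f"{tool}: {n_studies}")
```

Counting per study rather than per mention avoids inflating the rank of a tool that one paper happens to name several times.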

Table 3 List of tools ranked by use

Historically, the focus of AN research has been on teens and young adults. The current assessment found that, of the 71 studies in which the mean age was reported or could be calculated, the mean of the mean ages reported for study participants was 20.9 (4.26 SD) years. Furthermore, the reported mean age of study participants in 36 (51%) of the 71 studies was ≤ 19.9 years, 21 (30%) had a mean age of 20–24.9 years, 14 (20%) had a mean age of 25–29.9 years, and only one study had an overall group mean age of ≥ 30 years, although eight studies included individual groups with means ≥ 30 years. Figure 3 provides a summary of the BMI, age and duration findings discussed above.

Fig. 3

Number of articles in Table 2 representing the body mass index (BMI), age and duration subgroups indicated. NR = Not reported. A. BMI: 71 of the 78 articles reported BMI (kg/m²), 17 of those 71 had participant mean BMI ≤ 15; Age: 72 of the 78 articles reported age, of those 72, one had a mean participant age over 30 years; Duration: 23 of the 78 articles included duration, of those 23, 6 had participant mean illness duration of ≥ 7 years

Incidence rates for AN are reported to be ten times lower in males, although this is considered an underestimation due to underreporting and underdetection [2]. Only 16 (20%) of the 78 studies included male participants.

Based on the min/max and standard deviations of the mean provided for duration of illness and BMI, it was clear that many of the articles included subsets of individuals meeting the criteria noted herein for severe and enduring AN. However, as data for those specific individuals were often not delineated, it was not possible to determine how the study conclusions may have differed for said subgroups. For example, the mean duration of illness reported by Hernández et al. [75] for the AN restricting type (AN-R) subgroup was 4.03 (4.44 SD) years, indicating that at least some of the participants met the duration criteria.

Nevertheless, there were examples of results being assessed against some measures of severity, including duration. The Booij et al. study [76] AN-R group participant duration of illness was 54.9 (30 SD) months; range: 12–84. They specifically assessed methylation against the cumulative duration of illness and observed associations between duration and methylation levels at 142 probes. The mean duration of illness in the AN-R group in the Steiger et al. study [77] was 96.00 ± 98.91 (12–456) months. They also assessed duration and found an association between chronicity of illness and methylation status at 64 probes mapping to 55 genes.

Other authors evaluated genetic correlation with the severity of various psychological assessments including quality of life, depression, food behaviors, anxiety, and obsessiveness [75, 77,78,79,80,81,82,83,84,85,86,87,88,89,90]. For example, Acevedo and colleagues found a correlation between specific single nucleotide polymorphisms (SNPs) of the oxytocin receptor gene (OXTR) and increased severity of eating disorder symptoms in those with AN [78]. A polymorphism in the promoter region of the serotonin transporter gene (5-HTTLPR), previously associated with stress and depression [91], may impact depression and long-term outcomes in those with AN [79]. Research also suggests a possible correlation between specific haplotypes of the DHEA-producing enzyme cytochrome P450 CYP17A [81] and the C861 allele of the serotonin receptor 1Dβ gene (HTR1B) and severity of anxiety in those with AN.

An example of potential utility in assessing the severe and enduring AN phenotype and the need for larger studies and more funding is the 2022 study by Johansson et al. [50] evaluating polygenic association with AN severity and long-term outcomes. Here, the authors delineated severe and enduring AN criteria, including duration of illness, clinical impairment, BMI, and having undergone at least one previous treatment attempt. They also specified requirements for the AN subtype, thereby narrowing the population. The study, which included 2843 participants followed for up to 16 years (mean: 5.3 years), provided evidence supporting the possible clinical utility of PGSs for assessing eating disorder risk but also noted the need for larger studies and sample sizes to increase statistical power.

In summary, based on the literature reviewed, genetic studies of AN continue to focus largely, but not exclusively, on younger female participants with shorter durations of illness. These findings are not surprising given that the majority of those diagnosed with AN are female, the lack of clearly defined criteria for severe and enduring AN, and the need for large numbers of participants to assess significance in genetics research.

Discussion

Attempts to provide criteria for labeling those with severe mental illness as chronic or treatment-resistant need to be executed with care, as has been critically reviewed for illnesses such as schizophrenia and depression [92, 93]. Care should also be taken when defining criteria for severity of AN, which has a higher mortality rate than depression or schizophrenia [94]. However, not defining AN severity more clearly and not focusing on a more severe and enduring phenotype in research may decrease the likelihood of identifying the possible underlying biological etiology of AN. As noted by Wonderlich et al. [20] and in responding commentaries by Dalle Grave [95], Wildes [96], and McIntosh [97], the lack of consensus and of studies specifically targeting those with severe and enduring AN has resulted in patients being repeatedly subjected to largely ineffective treatment strategies, leading to a sense of hopelessness and shame and increasing the risk of suicide [98]. This review of the literature found that a duration of illness ≥ 7 years and an unsuccessful response to previous evidence-based treatment were the most common inclusion criteria employed, along with various measures of psychological and physical severity.

AN was once thought to be primarily caused by dysfunctional family dynamics and social and cultural pressures [99]. We now have evidence that genetics plays a significant role in its etiology. In recent years, there has been an evidence-based push to reconceptualize AN as a metabopsychiatric disorder [7]. Functional magnetic resonance imaging (fMRI) continues to provide data on brain function in those with AN [100]. Large-scale GWAS and genome-wide methylation studies have been gradually revealing the interplay between genetics and environment in AN etiology and persistence, as well as genetic correlations with other psychiatric disorders [16, 101, 102]. These are all positive advances; however, as evidenced by the individuals included in these studies, female teens and young adults with shorter durations of illness appear to be the primary participants.

Historically, males have been underrepresented in AN research [103]. Until 2013, the DSM listed amenorrhea as a criterion for AN, thereby reinforcing the notion that AN affects only females [61]. According to the literature reviewed, males continue to be underrepresented in AN research.

The challenge of recruiting participants for inclusion in large-scale genetic studies of AN is significant. Of the indicated criteria, the most challenging for researchers to assess is the lack of response to prior evidence-based treatment. Most of the treatments described as evidence-based are not administered according to a defined protocol, making retrospective assessment nearly impossible. Furthermore, those with more severe symptoms of longer duration are often treated in a plethora of settings over many years.

For many of the publications, the data indicate that some participants met the criteria defined in the first aim. However, because these individuals were not assessed as a group, it was not possible to determine whether outcomes for this subset differed from those of participants with a less severe presentation. Delineating this level of detail was not the purpose of these publications, so the absence of such assessments is understandable. One reason may be the small number of individuals meeting the criteria for severe and enduring AN, coupled with the need for a sufficiently large “n” for meaningful statistical assessment, which in turn points back to the need for larger studies and additional funding.

Nevertheless, several studies made concerted efforts to focus on a defined severe and enduring phenotype. For example, Kushima et al. [74] limited their study cohort to those reporting a lifetime lowest BMI < 15 kg/m², with the median for included participants reported as 11.3 kg/m², and a mean age of 37.9 years. The authors specifically stated that they focused on the “severe subgroup of patients because patients with severe symptoms or treatment-resistance are more likely to carry rare deleterious variants of large effect”, citing a schizophrenia study [104] as support.

The ultimate goal of AN research is to identify contributing factors to the manifestation and intractability of the disease and, in turn, develop superior evidence-based treatments tailored to the patient. Will next-generation sequencing gene panels help in the diagnosis of AN [105]? Kushima et al. [74] suggested that rare copy number variants associated with neurodevelopmental disorders may correlate with more severe eating disorder subtypes. Is it possible to identify those at higher risk of developing severe and enduring illness earlier and, in turn, treat those patients based on their specific genetic and environmental circumstances instead of employing generic therapy that may work for most patients with eating disorders but is less effective for those in this cohort? Can artificial intelligence be employed to better identify risk in individuals with AN [106]? Will we one day regularly employ genetic testing and pharmacogenetics in treating mental illness, including AN [107, 108]? Several international projects, including ANGI and the Comprehensive Risk Evaluation for Anorexia Nervosa in Twins (CREAT), are attempting to answer these questions and many more [109, 110]. Although these projects do not focus specifically on the severe and enduring phenotype, the availability of in-depth participant health and demographic information paired with genetic analysis should allow for studies of these subsets.

The criteria for evaluating the severity and intractability of AN are evolving, as is the understanding of the disorder. The purpose of a scoping review is to map the literature on an evolving topic and to identify gaps. As such, unlike a systematic review, this review does not attempt to assess the quality of the research conducted, but rather the inclusiveness of study participants. We do not attempt to define the severe and enduring phenotype or suggest how the research community should create consensus on the definition. However, by assessing the current literature, we highlight the gaps between the intent to focus on those with severe and enduring AN and the inclusion of this group in published research.

Conclusion and future directions

In conclusion, this review provides an overview of the criteria currently employed by the research community to define the severity of AN and assesses the last decade of genetics research for the inclusion of study participants meeting these criteria. We found that the following combination of assessments and criteria was used most often in the literature to define AN severity and intractability:

  1. Illness duration of ≥ 7 years.
  2. Lack of positive response to at least two previous evidence-based treatments.
  3. A BMI meeting the DSM-5 criteria for extreme AN.
  4. An assessment of psychological and/or behavioral severity indicating a significant impact on quality of life.

We also found that, especially in recent years, there has been an attempt to better define severe and enduring AN in hopes of identifying patients, tailoring treatment, and improving outcomes. However, although a small subset of the genetic studies reviewed specifically attempted to focus on a severe and enduring phenotype, they did not share aligned defining criteria. Furthermore, there is a continued focus on younger females with shorter disease durations.

Those with AN are often stigmatized, and their shame is amplified by the perception that AN is voluntary or even a lifestyle choice [111,112,113]. Those with severe and long-lasting illness are less likely to respond to currently available treatment modalities and have higher mortality [20]. However, they also represent a subgroup of individuals for whom genetic findings may be especially helpful [74]. Therefore, it is suggested that future genetics studies make a concerted effort to include older participants, those with longer illness durations, and those whose quality of life is most significantly impacted. It is also critically important that more objective, empirically based techniques, such as biomarker and brain structure and function analysis, be developed to more definitively classify the severe and enduring phenotype, which to this point has primarily been categorized through subjective means [32, 60, 96, 114]. There has been considerable effort in recent years to expand the definition of AN in hopes of being more inclusive and identifying those who may benefit from treatment. However, although this expansion has increased the sample size for genetic studies, focusing on those with longer-lasting and more severe symptomology, even though they form a much smaller group of those with AN, may provide a better chance of identifying the genetic etiology of the disorder. Recent advances have left us far better equipped to make significant progress in developing evidence-based treatments for those with severe and enduring AN. However, realizing this progress requires the inclusion of this subgroup in both research and practice.

Limitations

One limitation of the current review is that, because of the wide range of similar terminology used to refer to a severe and enduring AN phenotype in the published literature, the searches performed may have missed pertinent articles and viewpoints. Furthermore, although the literature search was comprehensive for the three electronic databases, it did not include gray literature; thus, information from sources such as dissertations may have been missed.