Key Points

Inflammatory bowel disease has a strong genetic background with more than 240 identified susceptibility loci.

Despite the identification of various risk genes, a substantial impact of genetic risk assessment on therapeutic strategies and disease outcome is still missing.

Precision medicine might become a guidance tool for inflammatory bowel disease drug dosing or drug sparing if patients with a low risk of relapse could be identified adequately.

1 Introduction

Inflammatory bowel disease (IBD) refers to a chronic inflammation of the gastrointestinal tract with its main representatives Crohn’s disease (CD) and ulcerative colitis (UC). While UC is limited to the mucosal layer of the colon, CD is a transmural inflammation that may affect any part of the gastrointestinal tract between the mouth and anus [1]. Both forms present with a chronic relapsing pattern of abdominal pain and diarrhea. Diagnosis is made based on clinical, endoscopic, histological, laboratory, and, where needed, radiologic findings [1]. Both sexes appear to be equally affected [2]. Inflammatory bowel disease can develop at any age, with a clustering between the ages of 15 and 30 years and a second peak in the elderly population [2]. Historically, IBD was considered a disease of the industrialized Western world, including North America, Europe, and Oceania [2]. However, while prevalence remains highest in Western civilizations, with over 3 million people with a diagnosis of IBD in the USA and Europe, recent analyses show increasing incidence in South America, Asia, and Africa [3, 4]. In the twenty-first century, IBD can therefore be considered a global epidemic. Despite tremendous progress in the understanding of IBD, its exact pathogenesis remains at best partially explained. Inflammatory bowel disease with its subtypes UC and CD is likely driven by a complex interaction of environmental factors with a dysregulation of the immune system in response to an altered intestinal flora in genetically predisposed individuals [5]. Since the first introduction of an anti-tumor necrosis factor (TNF) agent in 1998, the armamentarium of IBD therapeutics has dramatically expanded in the last 20 years, with the recent approval of the first IBD-specific small molecules, the sphingosine-1-phosphate receptor modulator ozanimod and the Janus kinase inhibitors tofacitinib, filgotinib, and upadacitinib [6,7,8].

With more therapeutic options available, treatment decisions become more complex, and many patients still experience a debilitating disease course and a loss of response to treatment over time. With a better understanding of the disease, more effective personalized treatment strategies are on the horizon. Genotyping has long been considered a strategy for treatment decisions, such as the detection of thiopurine S-methyltransferase and nudix hydrolase 15 polymorphisms before the initiation of azathioprine [9, 10]. However, although many risk genes have been identified in IBD, a substantial impact of a genetic risk assessment on therapeutic strategies and disease outcome is still missing [11]. In this review, we discuss the genetic background of IBD, with a particular focus on the latest advances in the field and their potential impact on management decisions.

2 Genetic Background of IBD

Analyzing the Swedish twin registry, Tysk and colleagues set a milestone with regard to the genetic basis of IBD [12]. This landmark study was followed by various twin and family studies further supporting the genetic background of both UC and CD. However, despite the numerous studies performed, an accurate genetic risk assessment is still difficult because rates vary widely in the literature, with a positive family history being reported in between 1.5 and 28% of patients with CD and between 1.5 and 24% of patients with UC [12,13,14]. Put simply, first-degree relatives of patients with CD have an eight-fold higher risk of developing IBD, while first-degree relatives of patients with UC have an approximately four-fold higher risk, making a positive family history of IBD the strongest risk factor for IBD [15]. Concordance rates in monozygotic and dizygotic twins reach up to 50% and 10%, respectively, in CD compared with up to 15% and 4% in UC, suggesting a lower heritability in UC than CD [16, 17]. Interestingly, a nationwide and population-based study from Denmark demonstrated a relevant impact of non-genetic exposures, such as exposure to antibiotics and prior ankylosing spondylitis, on the development of new-onset IBD in families with pre-existing IBD [18]. Wide variations between genetic studies might be explained by their heterogeneity with regard to study design and population selection. Of note, results from twin studies contrast with the much lower heritability estimated from genome-wide association studies (GWAS, see below). This might be partly explained by the shortcomings of heritability estimation, where genetic and environmental variance are considered as separate entities and epigenetics are neglected. Nevertheless, early family and twin studies were able to unravel the relative contribution of genetic and environmental influences in the etiology of IBD, with genetic background explaining no more than 50% of disease prevalence [16, 17, 19, 20]. Data from the UK Biobank even show that in patients with a high genetic risk of adult-onset IBD, the risk may be reduced by half with adherence to a favorable lifestyle [21].

2.1 GWAS

Whole genome linkage analyses highlight co-segregation of certain genetic loci for common complex diseases, thereby uncovering Mendelian traits within families or ethnicities [22]. In IBD, application of whole genome linkage enabled the detection of nucleotide-binding oligomerization domain-containing protein 2 (NOD2) as a highly susceptible gene for CD [22]. In 2005, GWAS emerged representing a very efficient method to identify alleles with weak and inconsistent replication that are commonly seen in complex diseases such as IBD [22, 23]. Genome-wide association studies explore the genome focusing on single-nucleotide polymorphisms (SNPs) that are defined genome variations in DNA sequences. Several millions of SNPs are genotyped on platforms to uncover specific SNP variants that more often occur in a person with a trait than in one without [24]. As mentioned above, both GWAS and twin studies have been used to calculate heritability showing remarkably different results [25]. This gap, also known as missing or hidden heritability and not only seen in IBD, might have multiple explanations, such as the over-simplification of genetic identity, environmental assumptions, and not considering epigenetics [23].
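The per-SNP comparison described above (does a variant occur more often in persons with a trait than in those without?) can be sketched as a simple allelic case-control test. The following is a minimal illustration only, not the pipeline of any cited study: the function name and the allele counts are hypothetical, and real GWAS typically use regression models with covariates and a stringent genome-wide significance threshold (conventionally 5 × 10⁻⁸).

```python
import math


def allelic_association(case_risk, case_other, ctrl_risk, ctrl_other):
    """Allelic odds ratio, 95% CI, chi-square (1 df), and p value for one SNP.

    Inputs are allele counts (not genotype counts) in cases and controls.
    All counts must be > 0 for the log odds ratio to be defined.
    """
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)

    # Woolf's method: standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other
    )
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

    # Pearson chi-square statistic for a 2x2 table (no continuity correction).
    n = case_risk + case_other + ctrl_risk + ctrl_other
    cross_diff = case_risk * ctrl_other - case_other * ctrl_risk
    chi2 = n * cross_diff ** 2 / (
        (case_risk + case_other)
        * (ctrl_risk + ctrl_other)
        * (case_risk + ctrl_risk)
        * (case_other + ctrl_other)
    )

    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, (ci_low, ci_high), chi2, p_value


# Hypothetical counts: 1000 cases and 1000 controls (2000 alleles each).
result = allelic_association(600, 1400, 500, 1500)
print(result)
```

With these made-up counts the odds ratio is 9/7 (about 1.29), which lies within the modest effect-size range typical of common variants; the resulting p value would still be far from genome-wide significance.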

Since their first implementation, GWAS have been applied to numerous study populations varying in ethnicity and geographic location, resulting in the detection of roughly 240 susceptibility loci in IBD. The majority of these loci were associated with both UC and CD, suggesting similar disease liability and common signaling pathways. The identification of mechanistic pathways has expanded the knowledge of intestinal homeostasis, most importantly the regulation of epithelial and barrier function, adaptive immunity, host-pathogen interactions such as the immune response, and autophagy [23, 26, 27]. Furthermore, over 60% of the susceptibility loci are associated with other immune-mediated diseases such as psoriasis, celiac disease, and ankylosing spondylitis [1, 28]. The strongest genetic effects have been seen for the interleukin-23 receptor (IL-23R) in IBD, NOD2 in CD, and the human leukocyte antigen (HLA) complex in UC [29].

Genome-wide association studies have several limitations. One limiting factor is linkage disequilibrium of the human genome [19]. Linkage disequilibrium refers to alleles (two or more) of SNPs that are non-randomly associated with each other and therefore tend to be passed on together [30]. Genome-wide association studies are thought to identify SNPs through linkage disequilibrium in the case where a significant gene does not lie within the examined region of the genome. If linkage disequilibrium is insufficient, SNPs may not be inherited with the coded gene and therefore key genes might not be captured. Indeed, not every polymorphism can be covered by GWAS. Roughly 30% of the genes are not detected; in particular, rare SNPs with high penetrance or common SNPs with low penetrance might be missed [19, 30]. In addition, it is known that the contribution of common variations to disease liability is small, with odds ratios mostly between 1.05 and 1.5 [22]. Furthermore, even highly associated GWAS SNPs might not be causative. As a consequence, the question of functionality and causality remains unresolved. Nevertheless, GWAS expanded our knowledge far beyond a basic genetic influence on human diseases and their potential has probably not been fully unlocked yet.
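How strongly two SNPs are in linkage disequilibrium is usually expressed with the standard population-genetics measures D, D', and r². The sketch below computes them from haplotype and allele frequencies; the function name and the example frequencies are assumptions chosen purely for illustration and are not taken from the studies cited above.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    # D: deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b

    # Largest magnitude D can take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical example 1: the two alleles always travel together.
print(linkage_disequilibrium(p_ab=0.30, p_a=0.30, p_b=0.30))

# Hypothetical example 2: the alleles are inherited independently.
print(linkage_disequilibrium(p_ab=0.09, p_a=0.30, p_b=0.30))
```

In the first example r² is (up to rounding) 1, so a genotyped SNP would fully tag its neighbor; in the second it is approximately 0, the situation in which a causal variant can go undetected because no genotyped SNP is inherited with it.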

2.1.1 NOD2

NOD2 is located on chromosome 16q12 [31]. Wild-type NOD2 acts as an intracellular bacterial sensor regulating immune responses such as defense against or tolerance to pathogens. It is expressed in different cell types such as epithelial cells, endothelial cells, macrophages, T cells, and in particular, ileal Paneth cells, thereby shaping the immune response in various ways. NOD2 mutations were the first to be described as associated with CD. To date, NOD2 confers the greatest risk of developing CD out of all genetic variations known so far [31,32,33]. Moreover, studies suggest that disease characteristics, particularly an earlier onset of CD and an increased severity of the disease course, are influenced by NOD2 polymorphisms [34, 35]. Disease-related NOD2 is remarkably less active and can even result in complete unresponsiveness to bacterial pathogens [33]. The most frequent variants in NOD2 are R702W, G908R, and 1007fs. All of these variants appear to influence protein function and inflammatory signaling pathways through impaired activation of the NF-κB pathway or reduced/absent cytokine production in response to pathogens [36]. Carriers of one of these disease-related NOD2 alleles have an approximately two- to four-fold increased risk of developing CD, while individuals with homozygous or compound heterozygous mutations have a 15- to 40-fold increase in CD risk. However, mutated NOD2 alleles occur in 0.5–2% of healthy individuals and about 60% of patients with CD carry no NOD2 mutation, suggesting synergistic effects of more than one factor in the development of CD [37,38,39]. Moreover, the common genetic variants in NOD2 confer a disease risk specific to European and African-American populations, but not to Asian populations, as the association could not be reproduced in studies with Japanese and Chinese individuals [40,41,42].

2.1.2 IL23R

IL23R is located on chromosome 1p31 and encodes a subunit of the receptor for the proinflammatory cytokine IL-23 [43]. Variants in IL23R have been associated with protection against IBD, presumably through a loss of function of the receptor itself or at least through impairment of receptor-mediated cell function [44]. In addition, IL-23 as the ligand of IL-23R plays a key role in the pathogenesis of intestinal inflammation: IL-23 and its closely related cytokine IL-12, with their common subunit p40, have become an attractive target of neutralizing antibodies in the treatment of IBD [45]. However, the main anti-inflammatory effect is achieved by blocking IL-23 or more specifically its p19 subunit (instead of p40), which has led to the development of more selective IL-23 antibodies such as guselkumab, risankizumab, mirikizumab, and brazikumab, which are currently being evaluated in phase II or phase III trials [46, 47]. Their efficacy can be explained by the disruption of a wide variety of signaling pathways including JAK-STAT, NF-κB, and interleukin-17, as well as a loss of cell function in T helper-17, CD4+, and CD8+ T cells [48].

2.1.3 HLA Complex

The HLA complex, also known as the major histocompatibility complex, is located on chromosome 6p21.3. HLA genes encode specific cell surface proteins of antigen-presenting cells that initiate a T-cell-mediated immune response [49]. The HLA region plays a key role in most chronic inflammatory diseases. The exact pathophysiological mechanisms of HLA involvement in IBD remain elusive [50]. It is hypothesized that molecular mimicry is one of the main mechanisms by which a foreign antigen causes autoimmunity. Cross-reactive epitopes are presented through the major histocompatibility complex and trigger an immune response by T-cell activation [51]. This proinflammatory response, which is essential to clear pathogens, may persist if there is sequential or structural homology between foreign and self-antigens [52]. Not surprisingly, HLA is the genetic region where the most significant risk variants with the largest effect sizes are located, for the development of IBD and other immune-mediated diseases such as psoriasis, rheumatoid arthritis, or spondyloarthropathies [53]. In the case of IBD, classical HLA genes are most strongly associated with disease development. These can be divided into two groups: (I) the class I gene region with its HLA traits A, B, and C and (II) class II genes with the HLA traits DR, DQ, and DP [54]. The most consistent associations of HLA class II alleles in IBD are seen with HLA-DRB1 and HLA-DQB1. DRB1 has been the most extensively studied gene and confers the greatest risk out of all HLA genes in IBD. HLA-DRB1*0103 is strongly associated with both entities, UC and CD. Nevertheless, it is considered a rare allele with a frequency of less than 2% in European, White North American, and Jewish populations. The variant shows a particularly strong association in patients with severe extensive UC and those with colonic CD [49]. Further associations have been seen with HLA-DRB1*1502 in UC, especially affecting the Japanese population, with a high prevalence of about 20% [49]. In CD, the most reproducible associations have been found with HLA-DRB1*07, particularly with ileal disease, and HLA-DRB1*04, predominantly in the Japanese population [49]. Overall, the influence of HLA has been found to be greater in UC than CD [29, 55]. A genome-wide association study showed that carriage of the HLA-DQA1*05 allele (approximately 40% of European individuals) doubled the risk of immunogenicity to anti-TNF therapy [56]. Taken together, the association of HLA alleles and IBD remains complex and difficult to resolve, particularly because of the high density of genetic variants/candidate genes and strong linkage disequilibrium.

3 Genetics in EIM and Perianal Disease

It is well known that IBD is a systemic disease with many patients presenting with extraintestinal manifestations (EIM) [57]. The prevalence of these EIM ranges from 6 to 47% [58,59,60,61,62,63,64,65]. The classic EIM involve the following four organ systems: (1) joints; (2) skin; (3) eyes; and (4) hepatobiliary system [57, 65]. The pathogenesis of EIM in the context of IBD is only partially understood, but several mechanisms have been proposed, among which are aberrant lymphocyte homing, cross-reactivity (also called molecular mimicry), or a simple proinflammatory state with upregulation of specific cytokines and chemokines resulting in an imbalance of components of the immune system [66]. The overlapping genetic background between EIM and IBD further supports a common pathophysiology.

Association studies show concordance rates of 70–80% in parent-child pairs or sibling pairs with EIM [67], while a positive family history has been identified as an independent predictor for the presence of EIM, at least in patients with CD [65]. Differences in EIM prevalence among different ethnicities further support the concept of a genetic background of EIM: (1) EIM appear more frequently in Western countries compared with Eastern countries [68, 69], which is particularly true for primary sclerosing cholangitis; (2) joint manifestations are more frequent in African-American individuals and Asian individuals; and (3) EIM are more often diagnosed in Indian individuals compared with Malay or Chinese subjects [70, 71]. Several genes have been identified in the pathogenesis of EIM. The following HLA genotypes have been implicated as genetic risk factors: HLA-B27 in joint, skin, and ophthalmologic manifestations, particularly axial spondyloarthropathy; HLA-B8/DR3 in primary sclerosing cholangitis; HLA-B35 in type 1 arthritis; HLA-B44 in type 2 arthritis; HLA-B58 in joint, skin, and ophthalmologic manifestations; and HLA-DRB1*0103 in joint, skin, and ophthalmologic manifestations. HLA-A2, HLA-DR1, and HLA-DQw5 are risk factors for EIM in general [66]. Genome-wide association studies for axial spondyloarthropathy and IBD revealed the following risk genes: IL23R, IL12B, STAT3, PTGER4, CARD9, IL1R2, and ORMDL3. Genome-wide association studies for psoriasis and IBD showed the involvement of IL12B, IL23R, JAK2, and STAT3, while GWAS for erythema nodosum and IBD revealed CLCA2, LY75, and 2q24.1 as risk loci. Furthermore, NOD2 has been shown to be a risk factor for ileal CD, sacroiliitis, and uveitis [72]. However, the role of NOD2 in perianal disease remains unclear. Newer data point to an important role of a SNP in Complement Factor B. The non-synonymous variant rs4151651 in Complement Factor B has been associated with perianal CD in three independent cohorts [73]. Presumably, the genetic contribution to the pathogenesis of EIM and IBD is a combination of overlapping and independent gene loci, reflecting the occurrence of EIM that parallel and do not parallel intestinal inflammation [74].

4 Pharmacogenomics

The field of pharmacogenomics studies how genetics influence an individual’s response to drug therapy, potentially opening the door to individualized precision medicine. Its value in clinical practice has not yet been exhausted and may grow as knowledge increases [75]. The identification of patients who will or will not respond to a specific treatment, based on genetic testing, is still an unmet need, and only a few pre-therapeutic genetic tests have made it into clinical practice, largely those that identify patients at risk for drug toxicity. Such genotype-dependent drug toxicity has been most extensively studied in the context of thiopurine treatment. Certain variants of the genes encoding the two enzymes thiopurine S-methyltransferase and nudix hydrolase 15 slow down thiopurine metabolism, leading to elevated drug concentrations and therefore increased toxicity (e.g., severe myelosuppression) [9, 10]. Pre-treatment genotyping is endorsed by the European Crohn’s and Colitis Organisation guidelines [76], but does not replace monitoring of drug toxicity under treatment.

Apart from pre-treatment assessment of thiopurine toxicity, no genetic test has made it into clinical practice. Nevertheless, several genetic polymorphisms have been linked with response to treatment with anti-TNF, particularly those within the TNF and TNFR genes (rs1800629, rs1799724, rs767455, rs1061624, and rs976881 showing an inferior outcome, and rs4149570, rs361525, and rs3397 showing a superior outcome) [77], but also in innate immunity genes (TLR4, IL6, IL1, IL17, TLR2, TLR9) [77, 78], and genes involved in autophagy and apoptosis (Fas-L, CASP9, and ATG16L1) [77]. Further data are evolving with regard to responses to other treatments such as anti-IL-12/23 ustekinumab (e.g., rs7234029) [79].

5 Epigenetic Mechanisms

As previously mentioned, only a part of the IBD risk can be explained by genetic factors. Additional epigenetic mechanisms have long been proposed and considered among pathogenic factors contributing to IBD, potentially accounting for the “hidden heritability”. Epigenetics is considered the link between environmental factors and the genetic background. Epigenetic studies analyze the heritable changes in gene function that cannot be explained by changes in DNA (e.g., through mutation, recombination, or translocation) [80]. The three major epigenetic modifications are: (1) DNA methylation, which is the most studied; (2) post-translational modifications of histones, such as acetylation of H3K27; and (3) the expression of non-coding RNAs.

DNA methylation has been shown to play a key role in cell processes such as cell differentiation or gene expression, eventually defining cell phenotypes. Of note, DNA methylation is dependent on several cofactors, one of them being nutrition [24, 81, 82], thereby linking genetic with environmental factors. In mouse models, a low-methyl diet has been shown to be associated with increased rates of obesity and diabetes [83,84,85]. In IBD, DNA global hypomethylation was found in the rectal mucosa from patients with UC, but not healthy controls [86]. In addition, there is evidence that local inflammation (such as seen in uncontrolled IBD) can accelerate DNA methylation changes, thereby resulting in cancer development [87].

Post-translational modification refers to a process that stimulates alteration in nucleosome composition [88]. Nucleosomes consist of DNA segments and the histones they wrap around. They form the smallest packaging unit of chromatin. Molecular differences between different nucleosomes result from numerous chemical modifications of the histone proteins [80]. As an example, G9a is a lysine methyltransferase that catalyzes dimethylation of histone H3 lysine 9 (H3K9me2). Lysine methylation alters transcriptional activity without affecting the DNA sequence itself and thereby regulates gene expression in response to environmental stimuli [89]. T-cell-intrinsic G9a has recently been associated with altered T-cell differentiation inducing colitis [90].

Non-coding RNA refers to RNA that is not translated into proteins [91]. It regulates gene expression at transcriptional and post-transcriptional levels. Non-coding RNA can be divided into two classes by the number of nucleotides: short, with fewer than 200 nucleotides, or long, with more than 200 nucleotides. Both classes have received increasing attention as most of the IBD-associated SNPs are linked to non-coding genetic sequences [92]. Non-coding RNAs account for several changes in the adaptive immune system (e.g., T-cell and T-helper regulation, T-helper differentiation) as well as the innate immune system (e.g., NOD2 and Toll-like receptors), potentially influencing disease activity and disease course [93, 94].

6 Clinical Implications: The Road to Precision Medicine

A wide armamentarium for the treatment of IBD is available that ranges from topical formulations, systemic steroids, and immunomodulators to more specific therapeutics such as anti-TNF, anti-integrins, anti-IL-12/23, anti-IL-23p19, and more recently, small molecules. Currently, clinicians, together with the patients, face the difficult challenge of where to position all the available options in the treatment algorithm. While many drugs appear to be equally effective, loss of response in the long term is a universal problem. The question remains as to which patient might respond best to the chosen treatment and which patient does not lose this response over time. A genetic risk assessment appears to be an appealing strategy. In fact, polygenic risk scores have been used not only to predict IBD in the general population, but also to stratify patients, performing better in CD than UC [95,96,97,98]. However, the universal use of genotyping has never (or not yet) made it into clinical practice given the high costs, but low benefits [99]. Among potential implications are: assessment of disease risk in relatives (diagnostic biomarker), prediction of disease complications/progression (predictive biomarker), or prediction of therapeutic response or side effects to biological treatment regimens (response biomarker, safety biomarker) [100, 101]. Precision medicine might become a guidance tool for drug dosing or drug sparing if patients with a low risk of relapse could be identified adequately. Unnecessary immunosuppression, at least for a certain time period, could be avoided in such patients (Fig. 1).
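A polygenic risk score of the kind mentioned above is, in its simplest form, a weighted sum of risk-allele counts, with each variant weighted by the natural logarithm of its odds ratio. The sketch below illustrates only this basic additive idea: the variant names and odds ratios are hypothetical placeholders, not published IBD effect sizes, and published scores use many more variants together with more elaborate weighting and calibration.

```python
import math

# Hypothetical per-allele odds ratios; placeholders for illustration only.
HYPOTHETICAL_ODDS_RATIOS = {
    "variant_1": 1.10,
    "variant_2": 1.25,
    "variant_3": 1.40,
}


def polygenic_risk_score(dosages, odds_ratios=HYPOTHETICAL_ODDS_RATIOS):
    """Additive score: sum over variants of (risk-allele count x log odds ratio).

    dosages maps a variant name to the number of risk alleles carried (0, 1, or 2).
    Variants missing from dosages are treated as 0 risk alleles, a simplification
    made here for brevity.
    """
    score = 0.0
    for variant, odds_ratio in odds_ratios.items():
        dosage = dosages.get(variant, 0)
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {variant} must be 0, 1, or 2")
        score += dosage * math.log(odds_ratio)
    return score


# Hypothetical individuals.
low_risk = polygenic_risk_score({"variant_1": 0, "variant_2": 0, "variant_3": 0})
high_risk = polygenic_risk_score({"variant_1": 2, "variant_2": 1, "variant_3": 2})
print(low_risk, high_risk)
```

A raw score like this has no clinical meaning on its own; it becomes interpretable only relative to the score distribution of a reference population, for example as a percentile used to stratify patients.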

Fig. 1 Illustration of the potential evolution from the current therapeutic strategy, in which one drug is assigned to all patients, to future medicine, where multiple details are considered to create an individualized therapeutic regimen. Adapted from Vieujean and Louis [102]

7 Conclusions

The progress made in the genetics of IBD over the last 20 years is impressive, with over 200 different gene loci identified. However, the contribution of a single polymorphism to the development of IBD or its disease course is small, and most risk alleles are rare. The pathophysiology of IBD appears to be too complex to be explained by a few genes or pathways. The most appealing advantage of a genetic risk assessment will lie in the field of precision medicine, where the presence or absence of one or several risk genes might help in the therapeutic decision-making process. However, many more studies are needed to fill this important gap.