Introduction

The liver can be affected by three leading forms of complex autoimmune liver diseases (AILDs), in which the immune system attacks different sites in the liver: autoimmune hepatitis (AIH), primary biliary cholangitis (PBC), and primary sclerosing cholangitis (PSC). The exact pathogenesis of these diseases is poorly understood, and available therapeutic approaches are only partially effective. AIH is a rare, chronic progressive disease with a prevalence of approximately 4–42 per 100,000 depending on geographical location [1]. It is characterized by elevated levels of serum transaminases and immunoglobulin G (IgG), inflammatory liver histology, the presence of autoantibodies, and the absence of markers for viral hepatitis [2]. AIH occurs predominantly in middle-aged women but can affect all age groups of both sexes [3]. The exact cause of AIH is unknown, although loss of tolerance against liver antigens, set off by an unknown trigger in genetically susceptible individuals, is thought to be the major pathophysiologic mechanism [4]. Familial clustering of AIH and concordance in twins have been reported, with an estimated pairwise concordance rate in monozygotic twins of 8.7%, suggesting genetic risk factors for AIH [5]. PBC is a rare disease in which a cycle of immune-mediated damage to biliary epithelial cells, cholestasis, and progressive fibrosis over time can lead to end-stage biliary cirrhosis. PBC more commonly affects women, with a prevalence of about 4–58 per 100,000 people [6], and is characterized by autoimmune destruction of small- to medium-sized intrahepatic bile ducts and the presence of anti-mitochondrial antibodies (AMAs), which are present in > 90% of cases [7, 8]. The onset of the disease is thought to be due to the interaction of environmental triggers and a genetic predisposition.
Genetic risk is consistent with other complex autoimmune diseases, with an estimated sibling relative risk of 10.5 [9] and a pairwise concordance rate in identical twins of 63%, which is among the highest reported in autoimmune diseases [10]. PSC is a rare disease characterized by multifocal biliary strictures and progressive liver disease. In PSC, there is autoimmune damage to the medium to large bile ducts leading to concentric and obliterative fibrosis and stricturing [11]. The prevalence of PSC is approximately 10 per 100,000 [12]. The most common positive autoantibodies are perinuclear antineutrophil cytoplasmic antibodies (pANCA), which are detected in about 80% of patients but are not specific enough for diagnosis [13, 14]. Unlike AIH and PBC, men are more commonly affected by PSC than women [15]. First-degree relatives of patients with PSC have an 11.5-fold increased risk of PSC [16].

AIH, PBC, and PSC are not yet curable. Progression of the disease, especially in PSC, often leads to liver transplantation or death. AIH is usually treated by regulating the immune system with steroid and thiopurine-based treatments [17], which is why therapy in AIH is often associated with significant side effects [4]. In many cases, the drugs can permanently suppress AIH and allow patients to live a normal life. If the disease progresses to cirrhosis despite consistent therapy, liver transplantation is the only option. Two drugs, the natural bile acid ursodeoxycholic acid (UDCA) and the semi-synthetic bile acid obeticholic acid (OCA), are approved and mainly used for the treatment of PBC. Long-term therapy with UDCA is often successful if started early [18], but many patients respond poorly to both agents, putting them at risk of progressive liver disease [19]. Therapy of PSC is also carried out with UDCA, but in contrast to PBC, the success of therapy in PSC is rather limited. In certain circumstances, liver transplantation is the only therapeutic option. Biologics, such as anti-tumor necrosis factor alpha (TNF-α) or B cell-depleting antibodies [20], and many new drugs are currently being investigated for the treatment of AILDs [21]. Unfortunately, there are still no promising drugs that target the (unknown) key pathogenic processes in the early phase of disease progression [22]. Of central importance is the improvement of risk stratification strategies, which requires in-depth, longitudinal phenotyping of patients using multi-omics data analysis. The different course of the disease and the different response of patients to treatment could also be related to the heterogeneous genetic background of individual patients, which translates into a heterogeneous clinical phenotype [23]. 
Elucidating the genetic architecture of AIH, PBC, and PSC is likely to contribute to a better understanding of these diseases by identifying causative genes and downstream signaling pathways that can be influenced pharmacologically. Genetic research should complement future work to identify the as yet unknown environmental risk factor(s) responsible for the development of autoimmune liver disease through interaction with genetic factors [24]. One of the greatest challenges in genetic epidemiological studies remains deriving a functional biological interpretation of the results from genome-wide association studies (GWAS). In the following, I describe the current status of GWAS for AIH, PBC, and PSC and briefly outline what additional work I believe is promising to better understand the genetic component and its biological contribution.

HLA-related genetic associations from GWAS

AIH, PBC, and PSC show a strong association with classical human leukocyte antigen genes (class I and II HLA genes; region on chromosome 6p21), which is a common feature of autoimmune diseases, with HLA susceptibility variants usually having a much greater impact than any other risk variant in the genome [25]. Proteins encoded by HLA genes are expressed on cell surfaces and present processed antigens to immune cells, which then activate downstream immune processes. A GWAS is an association study involving several million single nucleotide polymorphisms (SNPs) to determine the contribution of genetic variants to disease susceptibility [26]. The first GWAS for AIH [27], PBC [28], and PSC [29] revealed genetic associations close to classical HLA susceptibility alleles discovered before the GWAS era [30]. Subsequent GWAS and genome-wide meta-analyses (GWMA) used HLA imputation methods to investigate a variety of HLA alleles as possible susceptibility alleles. HLA imputation is a method of deriving HLA types for patients and controls in GWAS by imputing (predicting) genotypes of HLA genes using regional SNPs and a SNP-HLA-allele reference panel for imputation [31]. In the first GWAS for AIH [27], involving 649 adult AIH patients and 13,436 controls, followed by replication in 451 patients and 4103 controls, the strongest genome-wide significant (\(P < 5 \times 10^{-8}\)) association signal for SNP rs2187668 was found at 6p21.32, and HLA imputation assigned the SNP signals to HLA-DRB1*03:01, which is considered the primary AIH susceptibility allele (Table 1); HLA-DRB1*04:01 was identified as another independent (i.e., secondary) AIH susceptibility allele by conditional forward stepwise logistic regression analysis [32]. Recently, a meta-analysis of two GWAS study populations from China (1622 Chinese AIH type 1 patients and 10,466 population controls) identified a SNP association signal near HLA-B [33].
Results of GWAS and HLA fine-mapping for other populations as well as transethnic studies (i.e., studies across globally different study populations with largely different haplotype structure) are not available for AIH but would be of great importance to identify additional HLA susceptibility alleles and determine susceptibility to AIH across populations. In PBC, a total of 14 HLA alleles with genome-wide significance were identified in an HLA Immunochip fine-mapping study of 2861 PBC cases and 8514 controls of European ancestry [34], with four independent HLA association clusters for PBC identified by conditional logistic regression (Table 1). The first three independent HLA susceptibility alleles (cluster representatives) DQA1*04:01, DQB1*06:02, and DQB1*03:01 confirmed findings from serological studies [35,36,37], with the fourth allele DRB1*04:04 not previously associated with PBC. In another Immunochip HLA fine-mapping study of 676 Italian PBC cases and 1440 controls, three DRB1 (DRB1*08, DRB1*11, DRB1*14) and one DPB1 (DPB1*03:01) susceptibility cluster were identified through conditional analysis [38], although DRB1*14 and DPB1*03:01 did not meet the genome-wide significance threshold and are therefore not listed in Table 1. An HLA genotyping study of 1200 Japanese PBC patients and 1196 controls found a primary contribution of DQB1*06:04 and DQB1*03:01 to PBC susceptibility [39]. Subsequently, a GWAS and HLA fine-mapping study of 1126 Han Chinese PBC patients and 1770 controls showed that DRB1 (with DRB1*08:03) and/or DQB1 (with DQB1*03:01) picked up most of the signals, with DPB1 (DPB1*17:01) being an independent locus [40]. Interestingly, the protective DQB1*03:01 allele for PBC has been identified as a secondary association signal in populations of European origin and in Japan, whereas it is considered a primary association signal in the Han Chinese population.
For a complete list of HLA susceptibility alleles for PBC, including alleles from candidate studies, see Gerussi et al. [41]. For PSC, five independent HLA association clusters (B*08:01; DQA1*01:03; DQA1*05:01; DRB1*15:01; DQA1*01:01) were identified in an Immunochip fine-mapping study of 3789 PSC cases and 25,079 population controls [42]. In their combined stepwise regression analysis of HLA alleles and SNPs, HLA class II associations were consistent with previous studies [43, 44], with the exception of DQA1*01:01, which was newly added (Table 1). For a complete list of HLA susceptibility alleles for PSC from candidate studies up to 2013, see Mells et al. [45], although this list of candidate HLA alleles has not been expanded to include new candidates from non-GWAS studies since the advent of several GWMA studies for PSC. Regression analysis can work with covariates and allows disentangling the HLA effect from confounding factors such as population stratification, sex, and others. However, conditional forward stepwise regression analysis has several methodological disadvantages. First, as the number of conditioning steps increases, so does the number of statistical tests. If there are \(m\) alleles in the region of interest, about \(k \times m\) statistical tests are performed after \(k\) consecutive steps, which significantly increases the probability of a false positive result. Second, if \(m\) is large, perhaps close to the number of individuals in the GWAS, and if a liberal significance threshold is used to include alleles at each step, the forward selection procedure becomes unstable and is too optimistic about the disease variation explained by the selected allele. For this reason, I have listed in Table 1 only statistically independent and genome-wide significant HLA susceptibility alleles (cluster representatives) that are also genome-wide significant after conditional regression analysis.
The identification (fine-mapping) of a complete set of potentially “causal” HLA alleles in the overall context of all class I and II genes requires the use of high-quality multi-ethnic reference panels from different genetic backgrounds [46, 47], highly accurate HLA type imputation algorithms [31], the study of non-additive and interaction effects [48], inclusion of amino acid alleles composing HLA alleles [49], and functional fine-mapping approaches [50,51,52]. Future HLA fine-mapping studies for AIH, PBC, and PSC therefore have the potential to further refine these signals from previous GWAS and HLA imputation studies.
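The test-count argument for forward stepwise selection can be made concrete with a minimal sketch. It assumes that step i tests every allele not yet in the model and that the tests are independent (HLA alleles are in strong linkage disequilibrium, so this overstates the effective number of tests). The function names and the values m = 200 and k = 4 are hypothetical and chosen only for illustration.

```python
def stepwise_test_count(m: int, k: int) -> int:
    """Single-allele tests run in k forward-selection steps over m alleles.

    Step i (0-based) tests the m - i alleles that are not yet in the model.
    """
    if not 0 <= k <= m:
        raise ValueError("k must lie between 0 and m")
    return sum(m - i for i in range(k))


def familywise_error(n_tests: int, alpha: float) -> float:
    """Probability of at least one false positive among n_tests
    independent tests, each run at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Hypothetical example: 200 imputed HLA alleles, 4 conditioning steps.
m, k = 200, 4
n_tests = stepwise_test_count(m, k)
print(n_tests)  # 794, close to the k * m = 800 approximation

# A liberal per-step threshold makes a false positive almost certain,
# whereas the genome-wide threshold keeps the familywise error small.
print(familywise_error(n_tests, 0.05))
print(familywise_error(n_tests, 5e-8))
```

Under these assumptions, keeping the genome-wide threshold at every conditioning step, as done for Table 1, is what holds the false positive probability down.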

Table 1 Statistically independent and genome-wide significant (\(P < 5 \times 10^{-8}\)) HLA susceptibility alleles identified by (hypothesis-free) GWAS/Immunochip analyses using HLA imputation models for classical HLA genes or by full HLA locus genotyping experiments. Only the representative allele of the HLA cluster from the respective publication is shown. HLA alleles from candidate gene studies are not listed. Effect direction refers to whether the minor allele increases or decreases the risk of disease. Secondary association signals were determined in the respective publication by stepwise logistic forward regression analysis, with the lead signals added as covariates. SNP association signal near HLA-B gene was reported; no HLA allele association analysis was performed

Non-HLA-related genetic associations from GWAS

As the number of GWAS studies for AIH, PBC, and PSC increased, it became clear that the effect sizes of non-HLA associations were much smaller compared with associations with HLA alleles. For AIH, a coding variant rs3184504 in SH2B3 was identified as a susceptibility variant with genome-wide significance in study populations from the Netherlands and Germany [27]; two additional non-coding variants (rs72929257 near CTLA4 and rs6809477 at SYNPR) were recently identified across two study populations from China [33]. For PBC, the largest GWMAs of European (Asian) case–control populations yielded 45 (12) loci with genome-wide significance, with a total of 55 genome-wide significant non-HLA susceptibility loci identified in one or the other GWMA [53]. GWMAs for PSC identified a total of 22 genome-wide significant non-HLA susceptibility loci in Europeans [42, 54, 55]; non-European as well as trans-ethnic GWMAs have not yet been performed for PSC. Table 2 summarizes all non-HLA susceptibility variants from GWMA studies for AIH, PBC, and PSC. For a review of non-HLA susceptibility variants for AIH, PBC, and PSC from monocentric studies, including candidate studies, see Engel et al. [56], Gerussi et al. [41], and Chung/Hirschfeld [57], respectively. For PBC, Bayesian fine-mapping was recently performed for the 55 non-HLA PBC susceptibility loci [53]. Bayesian methods are particularly well suited for fine-mapping of non-HLA loci to identify statistically “causal” sets of variants [58] and have been successfully used, for example, to fine-map inflammatory bowel disease (IBD) risk loci to single-variant resolution [59]. For 40 (9) of 55 non-HLA PBC susceptibility loci, the association signal was best explained by a single variant (posterior probability ≥ 0.5) across European and Asian (Asian only) populations (Table 2); for AIH and PSC, Bayesian fine-mapping for established non-HLA susceptibility regions remains to be performed. 
Chromosome X association analysis has unfortunately been neglected in most GWAS studies [60]. Recently, a chromosome X-wide association study for PBC identified a genome-wide significant locus at Xp11.23 (the locus includes the GRIPAP1 gene; see Table 2) in East Asian PBC case–control study populations, which also shows an association signal (not yet genome-wide significant) across European and Asian PBC case–control study sets [61]. X-linked inheritance models in GWAS/GWMA for PSC and AIH thus have the potential to reveal additional genetic associations. Figure 1 summarizes the polygenic landscape of genome-wide significant HLA and non-HLA susceptibility variants (each of which was named in association with a nearby candidate gene) for AIH, PBC, and PSC.

Table 2 Genome-wide significant (\(P < 5 \times 10^{-8}\)) non-HLA susceptibility variants identified by two (hypothesis-free) GWAS for AIH and several genome-wide meta-analyses (GWMA) for PBC and PSC. Susceptibility variants for PBC and PSC from monocentric studies and candidate gene studies are not listed. Variant: dbSNP name of the variant. Chromosome:position: human genome build GRCh37 (hg19). Candidate gene: candidate gene from the respective publication. Fine-mapped to single variant: In cases where loci could be resolved to a single variant by Bayesian fine-mapping with high probability as causal (posterior probability > 50%), the name of the variant is indicated. NA: fine-mapping results not yet available
Fig. 1

The polygenic landscape of HLA and non-HLA susceptibility variants for AIH, PBC, and PSC. Genome-wide significant (\(P < 5 \times 10^{-8}\)) HLA and non-HLA susceptibility variants were identified by GWAS and HLA imputation (Table 1) and GWAS meta-analyses (GWMA; Table 2), respectively. Susceptibility variants were broadly categorized as high effect (odds ratio [OR] ≥ 2 for risk variants and OR ≤ 0.5 for protective variants), medium effect (2 > OR ≥ 1.2 or 0.5 < OR ≤ 0.83), and low effect (1.2 > OR > 1.0 or 0.83 < OR < 1.0), with the position of the variant on the x axis (on a log scale) corresponding to the magnitude of the OR. The population frequency indicated on the y axis refers to the minor allele frequency (MAF) of the susceptibility variant in the general population. The size of the circles represents the effect size, with a green circle border representing statistically protective variants (OR < 1 for the minor allele) and a red border representing risk variants (OR > 1 for the minor allele). For AIH: association signal near HLA-B is based on SNP data only, see also Table 1

SNP-based (co)-heritability

The proportion of genetic variance in liability (i.e., heritability explained by individual genetic variants for binary outcomes; additive model based on disease prevalence, relative risks, and allele frequencies [62]) for PBC explained by four major HLA alleles [34] (Table 1) and 26 independent genome-wide significant non-HLA susceptibility variants (subset from Table 2) was estimated to be 4.9% and 1.4%, respectively, which together account for 16.2% of total PBC heritability [34]. More recent calculations using the much larger PBC GWAS sets available today would be desirable, as would estimates for AIH. SNP-based heritability for PSC explained by 16 independent genome-wide significant loci (including major HLA alleles) accounts for 7.3% of total PSC heritability [42]; again, more up-to-date estimates would be desirable. The discrepancy between the variance caused by common SNPs and the expected heritability of AILDs from twin studies is referred to as missing heritability [63]. This gap in heritability can have several possible causes. Heritability from twin studies could be overestimated because common environmental factors or non-additive effects were not taken into account. On the other hand, part of the heritability could be due to genetic variants that have remained undetected so far, such as rare genetic variants or sex chromosome variants. Non-additive genetic effects, such as dominance effects (\(\delta^2_{\mathrm{SNP}}\)) or additive-by-additive interaction effects (\(\eta^2_{\mathrm{SNP}}\); epistasis), might describe part of the disease heritability in AILDs.
In a recent study of 70 (continuous) complex traits from the UK Biobank (with more than 60,000 individuals for each trait), the average epistatic variance across all traits (\(\widehat{\eta}^2_{\mathrm{SNP}} = 0.055\)) was estimated to be significantly higher than the average variance for dominance effects (\(\widehat{\delta}^2_{\mathrm{SNP}} = 0.001\)), but still significantly lower than the average variance for additive effects (\(\widehat{h}^2_{\mathrm{SNP}} = 0.208\)) [64]. Genome-wide interaction studies (GWIS) are therefore another interesting approach, but GWIS for PBC, PSC, and AIH would require many times the current sample size (which is difficult to realize) because of the combinatorial increase in the number of statistical tests and the much longer calculation times [65], although alternative computer architectures such as GPUs could help here [66, 67].
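The size of the multiple-testing burden in a pairwise GWIS can be illustrated with a short sketch. It assumes a simple Bonferroni correction and a hypothetical panel of one million independent SNPs, which is the number of tests conventionally associated with the genome-wide threshold of \(5 \times 10^{-8}\); the function names are illustrative only.

```python
from math import comb


def pairwise_tests(n_snps: int) -> int:
    """Number of distinct SNP pairs tested in an exhaustive pairwise scan."""
    return comb(n_snps, 2)


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


n_snps = 1_000_000  # hypothetical number of independent SNPs

# Single-SNP scan: recovers the usual genome-wide threshold of 5e-8.
print(bonferroni_threshold(n_snps))

# Exhaustive pairwise scan: about 5e11 tests and a threshold near 1e-13.
n_pairs = pairwise_tests(n_snps)
print(n_pairs)
print(bonferroni_threshold(n_pairs))
```

The roughly five orders of magnitude between the two thresholds is one way to see why interaction scans need far larger samples than single-variant scans.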

Because up to 9% and 7% of AIH patients have clinical overlap with PBC and PSC [27], respectively, it is reasonable to assume that there are shared genetic factors for AIH, PBC, and PSC [68], and also for other immune-mediated diseases, as shown by Immunochip studies for PBC and PSC [34, 38, 42]. In AIH, genetic risk is shared with type 1 diabetes for DRB1*04:01 [69] and systemic lupus erythematosus for DRB1*03:01 [70]. The association of AIH with the SH2B3 locus has also been identified as a genetic risk factor for PSC and PBC; more specifically, even the same risk variant rs3184504 in SH2B3 has been identified for AIH and PSC (see Table 2). Genetic relationships between disease pairs on a genome-wide level can be investigated by genome-wide genetic correlation analyses that quantify genome-wide SNP-based heritability (\(h^2_{g,\mathrm{SNP}}\)) in a bivariate model and provide information on potential coheritability between diseases [71,72,73]. Unfortunately, the shared genetics of AIH, PBC, and PSC have been studied only to a limited extent genome-wide and only partially with other immune-mediated comorbidities. When SNPs from the extended major histocompatibility complex (MHC) region (chromosome 6 region of 25–34 Mb including HLA genes) were excluded, a significant genetic correlation was observed between PSC and inflammatory bowel disease (ulcerative colitis, \(r_g = 0.29\); Crohn’s disease, \(r_g = 0.04\)). For 196 fine-mapped regions of the Immunochip, the genetic correlation between PSC and ulcerative colitis (\(r_g = 0.64\)), PSC and Crohn’s disease (\(r_g = 0.35\)), and PSC and ankylosing spondylitis (\(r_g = 0.33\)) was highest compared with non-immune diseases [54]. To accurately identify possible shared causal variants in AIH-, PBC-, and PSC-associated regions, Bayesian tests of colocalization may be useful [74] and could provide an indication of whether there are common or independent causal variants for the same genomic regions.
For example, six of 14 loci associated with both PSC and IBD showed strong evidence of a shared causal variant with UC, CD, or both [55]; colocalization analyses for AIH, PBC, and PSC could provide further insight into shared genetic structure. Future genome-wide comparisons between (worldwide) study populations with AIH, PBC, and PSC would provide the opportunity to identify the potentially shared landscape of AILDs.

Polygenic risk scores

Genome-wide SNP-based (co-)heritability estimation provides information on the proportion of (co-)heritability explained for (pairs of) diseases and measures pleiotropy (vertical and horizontal [75]) between diseases, but does not provide an estimate of individual patient risk based on genetic markers. Given the polygenic nature of AILDs (Fig. 1) and the fact that individual risk variants from GWAS/GWMA describe only a fraction of the heritability, a combined genetic burden across all genetic variants can be calculated to identify individuals at significantly increased risk. A polygenic risk score (PRS) is an estimate of an individual’s genetic susceptibility to a disease calculated based on that individual's genotype profile and relevant data from GWAS. A study by Khera and colleagues [76] revived the topic of PRS for common complex diseases and showed for coronary artery disease (CAD) that a PRS identifies 20 times more individuals at comparable or greater risk than did previous studies for monogenic mutations. Therefore, identifying individuals with high (low) PRS in a population-based sample may provide an opportunity to identify those with the highest (lowest) genetic risk. However, the utility of PRS-based risk estimates for AIH, PBC, and PSC is limited by the small effect sizes of the identified susceptibility variants (see Fig. 1). Using genome-wide data from UK Biobank, Khera and colleagues showed that individuals in the top 5% of a PRS for CAD had a 3.34-fold risk [OR (95% CI) = 3.34 (3.12–3.58); P (logistic regression) = \(6.5 \times 10^{-264}\)] compared with the remaining 95% of the general population. To provide an estimate for PSC here (although the GWAS data used here are not a population-based sample), I calculated a PRS from the summary statistics of the most recent GWMA for PSC [55] and determined the distribution of the PRS for 628 GWMA-independent German PSC cases and 4,272 healthy controls (Methods).
A PRS for PSC runs the risk of creating a mixed PRS for PSC and IBD, as patients with PSC have a highly increased incidence of IBD (called PSC with concomitant IBD or PSC-IBD); however, PSC-IBD has clinical differences from classical IBD [77] and appears to be genetically distinct from classical IBD phenotypes [54] (see below). Individuals in the top 5% of the PRS for PSC had a 5.99-fold risk [OR (95% CI) = 5.99 (4.52–7.92); P (logistic regression) = \(1.06 \times 10^{-46}\), adjusted for sex and genetic ancestry] compared with the remaining 95% of the general population (Fig. 2a). The OR should be interpreted as the factor by which the odds of having the disease increase if a person has a positive PRS test result (here, being in the top 5%). However, with an OR = 5.99 and an underlying false positive rate (1 − specificity) of 5%, the detection rate (sensitivity) is only 24% (Fig. 2b), resulting in an OAPR (odds of being affected given a positive result) of 1:2087, compared with the overall prevalence of 1:10,000 for PSC in the general population. On this basis, diagnostic genetic testing would be inappropriate. The fact that even a very high OR is associated with low predictive power of a diagnostic test may seem counterintuitive. It is largely explained by the fact that the genetic risk variants are widespread in the general population, so almost everyone can be affected by these causes, even if not everyone is or becomes ill because of the genetic burden. Therefore, the PRSs for AILDs in their current form are not suitable for diagnostic testing. On the other hand, for example, a possibly increased PRS for PSC in patients with AIH (compared with healthy controls) could indicate shared genetic risk factors (pleiotropy) between PSC and AIH, but an increased cross-disease PRS could be due to multiple causes such as diagnostic misclassification, molecular subtypes, or excessive comorbidity (collectively referred to as heterogeneity).
Cross-locus correlation analyses of loci associated with disease B in cases of disease A (and vice versa) can help distinguish pleiotropy from heterogeneity. For PSC and IBD, for example, we have shown that PSC-IBD is likely to be a distinct disease at the genetic level, sharing some genetic factors with IBD, but genetically distinct from classical IBD phenotypes such as CD and UC [54].
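The relationship between OR, detection rate, false positive rate, and OAPR quoted above for the top-5% PRS cutoff can be checked with a small worked sketch. It uses the rounded values from the text (detection rate 24%, false positive rate 5%, background odds 1:10,000) and the crude, unadjusted two-by-two relationship, so it reproduces the reported figures only approximately; the reported OR of 5.99 comes from an adjusted logistic regression, and the reported OAPR of 1:2087 presumably reflects the unrounded detection rate. The function names are illustrative.

```python
def odds_ratio(detection_rate: float, false_positive_rate: float) -> float:
    """Crude OR for a positive versus negative test result.

    Odds of a positive test among affected individuals divided by the
    odds of a positive test among unaffected individuals.
    """
    odds_affected = detection_rate / (1.0 - detection_rate)
    odds_unaffected = false_positive_rate / (1.0 - false_positive_rate)
    return odds_affected / odds_unaffected


def oapr_denominator(detection_rate: float,
                     false_positive_rate: float,
                     unaffected_per_affected: float) -> float:
    """Return x such that the odds of being affected given a positive
    result (OAPR) are 1:x, given background odds of
    1:unaffected_per_affected."""
    return unaffected_per_affected * false_positive_rate / detection_rate


dr, fpr = 0.24, 0.05  # rounded values quoted in the text
print(round(odds_ratio(dr, fpr), 2))             # close to the reported 5.99
print(round(oapr_denominator(dr, fpr, 10_000)))  # close to the reported 2087
```

A positive result therefore raises the odds of PSC by only the likelihood ratio of about 4.8 (24%/5%), which is why the OAPR remains far from any clinically useful value despite the high OR.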

Fig. 2

Risk gradient for PSC affection status according to the polygenic risk score (PRS) percentile and corresponding receiver operating characteristic curve (ROC) for a resulting (very weak) diagnostic test. (a) 100 groups of the test data set were derived according to the percentile of the PSC-specific PRS. The prevalence indicated on the y axis of the graph refers to the ratio of cases to controls in the genotyped sample. Odds ratio (OR) was calculated by comparing individuals with high PRS (top 5%) with the rest of the population (remaining 95%) in a logistic regression model adjusted for sex and the first four principal components (PCs) of ancestry from principal components analysis (PCA) with GWAS data. (b) Sensitivity: proportion of affected individuals with positive test results. False-positive rate (1 − specificity): proportion of unaffected individuals with positive test results. AUC: area under the curve

Genetically regulated expression and single cell analyses

Susceptibility variants identified by GWAS are often located at genomic positions with methylation, expression, and protein-quantitative trait loci (mQTLs, eQTLs, pQTLs), but it remains unclear whether this overlap is due to methylation, expression, and protein levels "mediating" genetic effects on disease. Cordell and colleagues applied Bayesian tests for colocalization between GWMA summary statistics of PBC and genome-wide mQTL, eQTL, and pQTL data from the large-scale consortium projects ALSPAC [78], GTEx [79], and INTERVAL [80] and suggested that the genetic architecture of PBC influences susceptibility to the disease primarily by affecting the regulation of expression of potentially causal genes [53]. To assess whether this might also be expected for PSC, I calculated the correlation between GWMA summary statistics of PSC [55] and summary statistics of large tissue-specific eQTL studies from GTEx using an approach developed by Yao et al. [81] called mediated expression score regression (MESC). I estimated that the heritability for PSC mediated by the cis genetic component of gene expression levels (\(h^2_{\mathrm{med}}/h^2_g\)) averaged 38.4% for 48 tissues used in the GTEx project (Methods). This value of 38.4% for PSC is among the highest of all disease-specific MESC values published in the work of Yao et al., who studied 42 diseases and human traits in the same 48 tissues, including ulcerative colitis with a similarly high published value of 38.2%. Therefore, it is hypothesized that in PBC and PSC (perhaps also in AIH), the gap between genetic approaches and the resulting disease phenotype can be reduced by the transcriptome.
Because eQTL data from bulk tissues are thought to be a poor surrogate for eQTL data in causal cell types/contexts and little is known about the composition of intrahepatic immune cells and their contribution to disease pathogenesis, measurement of context-specific expression [82] and expression in single cells may allow the identification of genetic variants that impact key regulatory networks in AILDs [83]. Using single-cell RNA sequencing (scRNA-seq) techniques, Poch and colleagues generated the first atlas of intrahepatic T cells in PSC and identified a previously uncharacterized population of liver-resident CD4+ T cells that likely contribute to the pathogenesis of PSC [84]. Xiang and colleagues [85] developed a computational framework to integrate GWAS summary statistics with scRNA-seq data and revealed genetically modulated liver cell subpopulations for PBC. They found that cholangiocytes show significant enrichment with PBC-related genetic association signals, with the ORMDL3 gene showing the highest expression level in cholangiocytes compared with other liver cells. Such combined genetics/single cell omics studies have the potential to identify the causative genes for AIH, PBC, and PSC in a disease-specific context.

Genome-wide screenings for drug reuse

More than 25% of drugs entering clinical development fail because of lack of efficacy, but drugs with supportive genetic evidence are twice as likely to succeed in clinical development as drugs without supportive genetic evidence [86]. Thus, one potential approach is to test drugs with genetic support that have been successfully used in practice for other immune-mediated diseases for their transferability to AILDs. To this end, Cordell and colleagues have developed an elegant in-silico method for predicting drugs that can improve (or exacerbate) PBC [53], highlighting the potential of genomic screening approaches for drug discovery and prediction of opposing drug effects in complex diseases. Briefly, they adapted a network-based approach to drug proximity screening from Guney et al. [87] for PBC candidate genes from GWAS risk loci by calculating a measure of proximity (z-score) between candidate genes and known drug targets (from agents stored in DrugBank), where a low z-score indicates recommended use of these agents (because an agent’s gene targets are closer to susceptibility genes than expected by chance) and a high z-score represents non-recommended use (because an agent’s targets are not closer to susceptibility risk genes than expected by chance). Major drugs predicted to improve PBC included ustekinumab, a monoclonal antibody against IL-12/23 used to treat psoriasis and Crohn's disease [88]; however, a proof-of-concept study has not shown benefit of ustekinumab for patients with PBC [89]. Major drugs that could exacerbate PBC included the pharmacologic interferons interferon alfa-2a and interferon beta-1b. The drugs already approved for PBC, fenofibrate, bezafibrate, and OCA were confirmed; interestingly, UDCA did not achieve a significant result, suggesting that genetics does not play a role in this case. A similar analysis would be desirable for AIH and PSC.
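The idea behind the proximity z-score can be sketched on a toy network. This is a simplified illustration and not the pipeline of Cordell and colleagues or Guney et al.: it assumes a small connected, unweighted graph with hypothetical node names, uses the mean shortest-path distance from each drug target to the nearest susceptibility gene as the proximity measure, and builds the null distribution by uniform random sampling of node sets (the published approach works on the human interactome with a more careful choice of reference sets).

```python
import random
from collections import deque
from statistics import mean, stdev


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source to all reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def closest_distance(graph, disease_genes, targets):
    """Mean over drug targets of the distance to the nearest disease gene.

    Assumes a connected graph, so every disease gene is reachable.
    """
    total = 0
    for target in targets:
        dist = bfs_distances(graph, target)
        total += min(dist[gene] for gene in disease_genes)
    return total / len(targets)


def proximity_z(graph, disease_genes, targets, n_random=1000, seed=0):
    """z-score of the observed proximity against size-matched random sets."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    observed = closest_distance(graph, disease_genes, targets)
    null = []
    for _ in range(n_random):
        random_genes = rng.sample(nodes, len(disease_genes))
        random_targets = rng.sample(nodes, len(targets))
        null.append(closest_distance(graph, random_genes, random_targets))
    return (observed - mean(null)) / stdev(null)


# Toy network (hypothetical): a chain of 12 genes n0 - n1 - ... - n11.
names = [f"n{i}" for i in range(12)]
graph = {name: [] for name in names}
for left, right in zip(names, names[1:]):
    graph[left].append(right)
    graph[right].append(left)

disease_genes = ["n0", "n1"]
z_close = proximity_z(graph, disease_genes, ["n1", "n2"])   # targets nearby
z_far = proximity_z(graph, disease_genes, ["n10", "n11"])   # targets far away
print(z_close, z_far)  # negative for the nearby drug, positive for the distant one
```

In this toy setting a negative z-score flags an agent whose targets sit closer to the susceptibility genes than random gene sets do, which is the sense in which a low z-score is read as a candidate for repurposing.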

Conclusion

A GWAS for AIH and several GWMAs for PBC and PSC have been successfully conducted and have identified a variety of genetic factors associated with AIH, PBC, and PSC. Some of these studies have already identified disease-causing variants by statistical fine-mapping and provided important biological insights into pathogenesis. Some statistical epidemiological approaches, such as statistical fine-mapping, chromosome X-wide association testing, and genome-wide screens for drug reuse, that have already been successfully performed in PBC, could also be applied to AIH and PSC. Large-scale cross-disease GWMAs to explore the shared genetic landscape of AIH, PBC, and PSC are still lacking. Merging genetic and statistical results with single-cell transcriptomic data from relevant cell types and liver tissue is likely to provide more accurate insights into the effects of genetic factors on liver cells and their immunological microenvironment.

Software

PRS derivation: The LDpred2 software [90] (https://github.com/privefl/bigsnpr) was used to generate a PRS. Mediated heritability: The MESC software [81] (https://github.com/douglasyao/mesc) was used for estimating heritability mediated by assayed gene expression levels.