Background

Hepatitis B virus (HBV) infection is one of the most common infectious diseases, with about 248 million HBsAg-positive individuals worldwide, the largest such population being in China [1]. HBV infection can lead to a wide spectrum of liver diseases, including chronic hepatitis B, liver cirrhosis, and hepatocellular carcinoma [2,3,4]. Previous studies showed that host genetic factors play a critical role in susceptibility to HBV infection and identified associated SNPs with significant contributions in major histocompatibility complex (MHC) genes, i.e. HLA-DPA1 (rs3077), HLA-DPB1 (rs9277535), HLA-C (rs3130542), and HLA-DQ (rs2856718, rs7453920) [5,6,7], and in non-MHC genes, i.e. UBE2L3 (rs4821116) and INTS10 (rs7000921) [8, 9]. In advanced stages of HBV disease, host genetic factors also influence the outcome of HBV infection [7, 10, 11]; reported loci include HLA-DQ (rs9275319), HLA-DRB1 (rs2647073, rs3997872), STAT4 (rs7574865), C2 (rs9267673), PNPLA3 (rs738408, rs738409), SLC17A2 (rs80215559), and HFE (rs1800562) [12, 13] for liver cirrhosis, and KIF1B (rs17401966), HLA-DQA1/DRB1 (rs9272105), HLA-DQ (rs9275319), and STAT4 (rs7574865) for hepatocellular carcinoma [14,15,16]. However, these reported HBV-related genes confer relatively small increments in risk and explain only a small proportion of heritability. For example, although MHC genes are important for the immune response to HBsAg, more than half of the heritability is determined by non-MHC genes [17]. Moreover, previous studies showed that MHC genes share a common influence on HBV infection, liver cirrhosis, and hepatocellular carcinoma [6, 12, 15, 16] but are also associated with different risks across these outcomes [18]; i.e. HLA-DQ, STAT4, C2, and HLA-DRB1 for liver cirrhosis and HCC [12], and HLA-DQ for CHB [6]. These consistent [12] or different [18] risks indicate shared but also modified effects on progressive HBV-related outcomes.
These results motivated us to identify host genetic factors that increase the risk of progressive stages after HBV infection. To reveal new susceptibility genes for HBV infection and HBV-related outcomes, we performed a genome-wide association study (GWAS) in 1031 participants, including 275 HBV clearance subjects, 92 asymptomatic persistence infection (ASPI) carriers, 93 chronic hepatitis B (CHB) patients, 188 HBV-related decompensated cirrhosis (DC) patients, 214 HBV-related hepatocellular carcinoma (HCC) patients, and 169 healthy controls (HC) (Table 1).

Table 1 Characteristics of participants in the genome-wide association cohorts

Methods

Study participants

A total of 1104 unrelated, age- and gender-matched Chinese participants were recruited; enrollment criteria were consistent with a previous report [19]. The HBV-related phenotypes comprised five subgroups: HBV clearance subjects, asymptomatic persistence infection (ASPI) carriers, chronic hepatitis B (CHB) patients, HBV-related decompensated cirrhosis (DC) patients, and HBV-related hepatocellular carcinoma (HCC) patients. Healthy controls (HC), who were negative for HBV serum markers (HBsAg, anti-HBc) and had no serological evidence of co-infection with HCV, HDV, or HIV, were also included. Chronic HBV infection was diagnosed based on HBsAg seropositivity for at least 6 months. ASPI was defined as HBsAg and anti-HBc positivity for at least 6 months with serum alanine aminotransferase (ALT) and aspartate aminotransferase (AST) within normal ranges and no history of abnormal values. CHB was defined as HBsAg and anti-HBc positivity for at least 6 months with abnormal ALT/AST values before or at enrollment. DC was defined as HBsAg and anti-HBc positivity for at least 6 months with decompensated portal hypertension (gastroesophageal bleeding, ascites, edema, or encephalopathy) or decompensated liver function (albumin < 35 g/L and total bilirubin > 35 μmol/L). HCC was defined by at least one of the following: (a) liver biopsy; or (b) abnormal alpha fetoprotein (AFP) together with sonographic, CT, or MRI evidence of a space-occupying lesion.

Clinical parameters

Clinical parameters, including serum alanine aminotransferase (ALT), aspartate aminotransferase (AST), total bilirubin (TBIL), direct bilirubin (DBIL), alkaline phosphatase (ALP), glutamyl transpeptidase (GGT), albumin (ALB), globulin (Glo), alpha fetoprotein (AFP), prothrombin time activity (PTA), platelets (PLT), HBsAg, anti-HBs, HBeAg, anti-HBe, and anti-HBc, were collected from the hospital information system. Other baseline characteristics were recorded during each patient’s clinical examination. In brief, liver biochemistry and virological tests were carried out using Beckman Coulter AU chemistry analyzers, chemiluminescence immunoassays (AxSYM or ARCHITECT I2000, Abbott, USA), or the Ortho/Chemi-luminescent assay (Johnson and Johnson Co., USA) with commercially available kits; anti-HAV IgM antibody, HDV antigen (HDAg), anti-HDV antibody, and anti-HEV antibody were determined using commercially available ELISA kits in China. HBV DNA level was quantified using a commercial real-time polymerase chain reaction kit with a lower limit of detection (LLOD) of 100 IU/ml (Daan company, China) or the Roche Cobas Ampliprep/Cobas Taqman™ PCR assay with an LLOD of 20 IU/ml (Roche, USA).

Genome-wide SNP genotyping and quality control

Genotyping was performed on the Affymetrix 500k Genome-Wide Human SNP Array 6.0 (http://www.affymetrix.com/Auth/analysis/downloads/na35/genotyping/GenomeWideSNP_6.na35.annot.csv.zip). SNPs meeting any of the following criteria were excluded: (1) call rate < 95%; (2) minor allele frequency (MAF) < 1%; (3) genotypes in controls deviating from Hardy-Weinberg equilibrium (HWE test P-value < 10⁻⁵).
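The three exclusion criteria above can be sketched as a per-SNP filter. This is a minimal illustration and not the pipeline used in the study: the function names, the genotype-count inputs, and the example counts are hypothetical, and the HWE check uses a 1-df chi-square goodness-of-fit test as a stand-in, since the text does not state which HWE test was applied.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    Inputs are genotype counts (AA, AB, BB). Assumption: a chi-square
    goodness-of-fit test is used here purely for illustration.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic site: nothing to test
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(all_counts, control_counts, n_samples,
              min_call_rate=0.95, min_maf=0.01, hwe_p=1e-5):
    """Return True if a SNP survives the three filters described in the text.

    all_counts / control_counts: (AA, AB, BB) genotype counts among all
    genotyped samples and among controls; n_samples: samples attempted.
    """
    n_aa, n_ab, n_bb = all_counts
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    if called / n_samples < min_call_rate:      # (1) call rate < 95%
        return False
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    if min(freq_a, 1.0 - freq_a) < min_maf:     # (2) MAF < 1%
        return False
    if hwe_chi2_pvalue(*control_counts) < hwe_p:  # (3) HWE in controls
        return False
    return True


# Hypothetical counts for one SNP: 1000 of 1031 samples called, 169 controls.
print(passes_qc((250, 500, 250), (40, 85, 44), n_samples=1031))
```

Testing HWE in controls only, as the text specifies, avoids discarding SNPs whose genotype distribution is distorted in cases by a true disease association.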

Statistical analysis

The GCTA tool [20] was used to perform principal component analysis to estimate population substructure. The first two eigenvectors, pc1 and pc2, were used to display the population structure. PLINK 1.9 [21] was used to perform logistic regression to identify susceptibility SNPs for HBV infection and HBV-related outcomes, with gender and age as covariates. A chi-square test for trend in proportions was used to identify SNPs with increasing effect along disease progression. We used the Bonferroni method to control the false positive rate caused by multiple testing, taking the number of independent LD blocks as the number of independent tests. We calculated a total of 21,077 independent LD blocks via GEC [22] and set 0.05/21077 as the threshold of genome-wide significance. The genomic control method was used to measure population stratification by calculating the genomic inflation factor (λ) from the median P-value. ANOVA was used to evaluate the significance of associations between biomarkers and genotypes in healthy controls. Using the SNPs in HBV infection-related loci from the 1000 Genomes Project [23], we performed evolutionary analyses, including building a phylogenetic tree, detecting signatures of selection, displaying the core haplotypes, and estimating effective population size. Derived and ancestral alleles of SNPs were obtained from the Ensembl human ancestral genome (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/supporting/ancestral_alignments). PoMo [24], an allele frequency-based approach, was used to build the racial tree based on the allele frequencies of SNPs in each population. FST [25], a classical metric of population differentiation, has been widely employed to detect signatures of selection [26] in human [27, 28] and animal genomes [29,30,31]. In our study, FST was used to detect signatures of selection between the East Asian population and each of the other populations.
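As a worked illustration of the two quantities described above, the snippet below computes the Bonferroni cutoff 0.05/21077 (about 2.37 × 10⁻⁶) and a genomic inflation factor from the median P-value. The function names and the example P-values are assumptions made for this sketch. λ is taken here as the chi-square (1 df) quantile of the median P-value divided by 0.4549, the median of that distribution under the null.

```python
from statistics import NormalDist, median

N_INDEPENDENT_BLOCKS = 21077  # independent LD blocks reported via GEC
ALPHA = 0.05

# Bonferroni-adjusted genome-wide significance threshold (about 2.37e-6).
threshold = ALPHA / N_INDEPENDENT_BLOCKS

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def p_to_chi2(p):
    """Convert a two-sided P-value to a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


def genomic_inflation(p_values):
    """Genomic inflation factor (lambda) from the median P-value."""
    return p_to_chi2(median(p_values)) / CHI2_1DF_MEDIAN


# Hypothetical P-values, for illustration only.
example_p = [0.03, 0.21, 0.48, 0.52, 0.77, 0.95]
print(f"threshold = {threshold:.3e}")
print(f"lambda = {genomic_inflation(example_p):.3f}")
```

A λ close to 1 indicates little inflation of the test statistics, while values well above 1 would point to residual population stratification.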
VCFtools [32] was used to calculate the FST statistics of SNPs between paired populations. An FST value exceeding 0.15 [33] was used as the threshold for detecting a signature of selection. The rehh package [34, 35] was used to display the haplotype bifurcation diagrams of the associated SNPs in different populations. Relate [36], a method for genome-wide genealogy estimation for thousands of samples, was used to estimate the historical population size with default settings.
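The FST threshold can be illustrated with a simplified calculation. The sketch below uses the basic heterozygosity-based definition, FST = (HT − HS)/HT, for two equally weighted populations; it is not the Weir and Cockerham estimator that VCFtools implements, and the SNP labels and allele frequencies are hypothetical.

```python
def wright_fst(p1, p2):
    """Two-population FST from allele frequencies with equal weights.

    FST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of
    the pooled population and H_S the mean within-population value.
    """
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # allele fixed or absent in both populations
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_t - h_s) / h_t


def flag_selection_candidates(freqs, threshold=0.15):
    """Return SNP labels whose pairwise FST exceeds the threshold."""
    return [snp for snp, (p1, p2) in freqs.items()
            if wright_fst(p1, p2) > threshold]


# Hypothetical derived allele frequencies (population A, population B).
example = {"snp_a": (0.20, 0.80), "snp_b": (0.45, 0.50)}
print(flag_selection_candidates(example))
```

Because this version ignores sample size and weights the two populations equally, its values will generally differ somewhat from VCFtools output on real data; it is meant only to show how strongly allele frequencies must diverge to cross 0.15.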

Results

Of the 1104 participants, 1031 passed quality control. The demographic and clinical characteristics of these 1031 participants are presented in Table 1. All participants were genotyped on the Affymetrix 500k SNP Array. A total of 607,153 SNPs passed quality control (Additional file 1: Figure S1); SNPs with a minor allele frequency < 1% or a call rate < 95% were filtered out.

To check for genetic stratification in the population, we performed a principal component analysis on the SNPs of all participants. The first two principal components showed no population structure (Additional file 1: Figure S2). To identify susceptibility SNPs for HBV infection, we performed a GWAS of HBV infection with a design similar to previous studies [8, 9]: HBV clearance subjects served as the control group, and ASPI, CHB, DC, and HCC patients together formed the chronic HBV infection (case) group. We observed associations of two novel MHC loci with progression to certain HBV stages (SNP: rs2395166, Gene: HLA-DRA, P = 1.42 × 10⁻⁷; SNP: rs615672, Gene: HLA-DRB1, P = 8.54 × 10⁻⁷) and of two reported MHC loci (SNP: rs3077, Gene: HLA-DPA1, P = 6.60 × 10⁻⁹; SNP: rs9277542, Gene: HLA-DPB1, P = 1.53 × 10⁻⁸) (Table 2; Fig. 1). These MHC variants replicate the association results of previous studies, affirming that MHC alleles confer susceptibility to HBV infection in East Asians. Interestingly, we found that these MHC loci (rs2395166:C, rs615672:G, rs3077:A, rs9277542:T, rs9277341:T) show significant differences in allele frequency between East Asian and non-East Asian populations in the gnomAD database (Table 3), as well as between the HBV infection group and the HBV clearance group. Because different groups may not share the same minor allele, we compared the derived allele against the ancestral allele when studying allele frequencies across populations. The derived allele frequencies in East Asians are much closer to those of the chronic HBV infection group, while those of other populations, such as Europeans, are much closer to those of the HBV clearance group. These genetic differences may suggest a selective signal in non-East Asian versus East Asian populations. To examine this, we first built a phylogenetic tree based on these loci, which showed the genetic diversity among worldwide populations with the East Asian population at the root.
We set East Asians as the ancestral group at these loci according to the derived allele frequencies and the phylogenetic tree. Subsequently, we identified two strong signals (HLA-DPA1, HLA-DPB1) in the European population (Fig. 2) via the FST method. Haplotype bifurcation diagrams of the two core SNPs (rs3077, rs9277542) showed that the resistance allele is carried on a long-range, high-frequency haplotype in the European population (Fig. 3), supporting natural selection. This evidence indicates that the resistance alleles were under strong positive selection in the European population. We estimated the historical population size and showed that these two loci (HLA-DPA1, HLA-DPB1) were under selection during the past 26,000 years (Additional file 1: Figure S3). These results may provide a context for the historical burden of HBV infection.

Table 2 The significance of HBV-related outcomes study
Fig. 1

Regional plots showing –log10 P-values of SNPs in the association study. Marker SNPs are shown as purple diamonds; other SNPs are shown as dots. The r-squared between the marker SNP and other SNPs is indicated by dark blue, blue, green, yellow, and red colors, reflecting linkage disequilibrium. Genes within the region are shown as rectangles and arrows. Abbreviations: ASPI, asymptomatic persistence infection; CHB, chronic hepatitis B; DC, decompensated cirrhosis; HCC, hepatocellular carcinoma

Table 3 Derived allele frequency of significant SNPs in MHC region
Fig. 2

The racial tree (left) was based on the SNPs in HBV infection-related genes, including HLA-DRA, HLA-DRB1, HLA-DPA1 and HLA-DPB1. The genotype and minor allele frequency of each SNP were obtained from the 1000 Genomes Project. EAS, AFR, SAS, EUR, and AMR refer to the East Asian, African, South Asian, European, and American populations of the 1000 Genomes Project, respectively. The racial tree indicates genetic differences in HBV infection-related genes among EAS, AFR, SAS, EUR, and AMR. The genetic difference (right) at each SNP was evaluated by its FST value. The x-axis shows physical position on chromosome 6 and the y-axis shows the FST value of each SNP for a population pair. FST values of all SNPs for AFR, SAS, AMR, and EUR versus EAS are displayed as grey bars. FST values exceeding 0.15 (red horizontal line) indicate a signal of selection. Red bars and rs IDs mark the reported HBV infection-related SNPs. The FST values of European versus East Asian show genetic differentiation at HLA-DPA1 and HLA-DPB1, indicating genetic selection against HBV infection. The racial tree places the East Asian population at the root, which is why we used East Asian as the reference population and compared the other four populations with it

Fig. 3

Haplotype bifurcation diagrams of infection-related SNPs, including rs3077 on HLA-DPA1 (upper) and rs9277542 on HLA-DPB1 (lower), in European (left) and East Asian (right) populations. EUR and EAS in the plot titles refer to European and East Asian. DA and AA in the plot titles refer to the derived allele (red) and ancestral allele (blue). The black dashed line marks the position of the core SNP. Each node represents a haplotype. The edge width reflects the population-specific frequency. Haplotype bifurcation diagrams were drawn with the rehh package. The diagrams show that the derived allele is carried on a long-range, high-frequency haplotype in the European population and the ancestral allele on a high-frequency haplotype in the East Asian population; the ancestral allele is carried on more haplotypes than the derived allele. The long-range high-frequency haplotype supports genetic selection at HLA-DPA1 and HLA-DPB1 in the European population

To identify new susceptibility loci for HBV-related outcomes, we performed association studies for CHB, DC, and HCC. We observed three associated loci: (1) (SNP: rs1264473, Gene: GRHL2, P = 1.57 × 10⁻⁶) associated with CHB versus ASPI; (2) (SNP: rs2833856, Gene: EVA1C, P = 1.62 × 10⁻⁶) associated with HCC versus CHB; and (3) (SNP: rs4661093, Gene: ETV3, P = 2.26 × 10⁻⁶) associated with HCC versus DC (Table 2; Fig. 1). No SNPs were associated with DC versus CHB.

HBV clearance, ASPI, CHB, DC, and HCC are progressive stages after HBV infection [4]. We hypothesized that host genetic factors contribute to the progression across outcomes as well as to each individual outcome. To investigate this hypothesis, we tested two progressive series: (1) HBV infection itself (CHB, ASPI, and HBV clearance) and (2) development of CHB (CHB, DC, and HCC). We performed a chi-square test for trend in allele proportions to identify SNPs with increasing risk of HBV-related outcomes across the progressive stages. In the trend test of allele frequency across the three outcomes of HBV infection, we observed one novel locus (SNP: rs1537862, Gene: LACE1, P = 1.85 × 10⁻⁶), one reported locus (SNP: rs9277542, Gene: HLA-DPB1, P = 1.50 × 10⁻⁹), and two further variants at MHC genes (SNP: rs615672, Gene: HLA-DRB1, P = 1.39 × 10⁻⁶; SNP: rs3128923, Gene: HLA-DPA2, P = 2.06 × 10⁻⁶) (Table 4; Fig. 4a). The three MHC genes have been demonstrated to play a critical role in resistance to HBV infection, and two of these variants (HLA-DPB1:rs9277542, HLA-DRB1:rs615672) were also associated with HBV clearance (Table 2). We did not observe any SNP reaching genome-wide significance for association with the development of CHB; one additional locus (SNP: rs6942409, Gene: AC011288.2, P = 3.08 × 10⁻⁶) and the HCC-associated locus (SNP: rs2833856, Gene: EVA1C, P = 1.62 × 10⁻⁵) showed suggestive association with increased risk of DC and HCC during the development of CHB (Table 5; Fig. 4b).
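The chi-square test for trend in proportions used here can be sketched as follows. This is a minimal version under stated assumptions: the ordered groups are scored 1, 2, 3, the inputs are allele counts (two alleles per participant), and the risk-allele counts in the example are hypothetical; only the group sizes (93 CHB, 188 DC, 214 HCC) come from the cohort description.

```python
import math


def chi2_trend(allele_counts, totals, scores=None):
    """Chi-square test for linear trend in proportions (1 df).

    allele_counts: count of the allele of interest in each ordered group
    totals: total number of alleles in each group
    scores: ordinal scores for the groups (default 1, 2, 3, ...)
    Returns (chi2, p_value).
    """
    if scores is None:
        scores = list(range(1, len(totals) + 1))
    n_total = sum(totals)
    p_bar = sum(allele_counts) / n_total
    s_bar = sum(n * s for n, s in zip(totals, scores)) / n_total
    # Covariance-like term between group score and allele count.
    t = sum(x * (s - s_bar) for x, s in zip(allele_counts, scores))
    var = p_bar * (1.0 - p_bar) * sum(
        n * (s - s_bar) ** 2 for n, s in zip(totals, scores))
    if var == 0.0:
        return 0.0, 1.0
    chi2 = t * t / var
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical risk-allele counts in CHB, DC, and HCC (2 alleles per person).
totals = [2 * 93, 2 * 188, 2 * 214]
risk_alleles = [40, 100, 140]
chi2, p = chi2_trend(risk_alleles, totals)
print(f"chi2 = {chi2:.2f}, P = {p:.3g}")
```

Unlike a pairwise case-control comparison, this statistic gains power when the allele frequency rises or falls steadily across the ordered stages, which is the pattern the progression analysis is designed to detect.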

Table 4 The significance of progressive HBV infection study
Fig. 4

Rising allele frequencies in HBV-related outcomes during progression. Four SNPs with increasing resistance across CHB, ASPI, and HBV clearance during HBV infection (a) and two SNPs with increasing risk across CHB, DC, and HCC during the development of CHB (b). Abbreviations: ASPI, asymptomatic persistence infection; CHB, chronic hepatitis B; DC, decompensated cirrhosis; HCC, hepatocellular carcinoma

Table 5 The suggestive significance of progressive CHB study

Host genetic factors have been shown to influence plasma concentrations of liver enzymes, which are widely used as indicators of liver disease [37, 38]. To investigate functional changes in the liver influenced by the HBV-related loci described above, we performed an analysis of variance on 10 clinical parameters of serum liver enzymes (ALT, AST, TBIL, DBIL, ALP, GGT, ALB, AFP, PTA, and PLT) across genotypes in healthy controls (Additional file 1: Figure S4-9). Six loci (rs1537862, rs3128923, rs9277542, rs9277341, rs9277378, rs4661093) showed modest associations with concentrations of liver enzymes, including ALB, ALP, AFP, and PTA (Fig. 5). These associations suggest pathways linking host genetic factors, metabolism, and liver function that may help explain the mechanisms of infection and disease progression.
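The genotype comparison above amounts to a one-way ANOVA with genotype as the grouping factor. The sketch below computes only the F statistic and its degrees of freedom; the function name and the example values are hypothetical, and the P value would be read from the F distribution using a statistics library, which is not done here.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of measurements.

    Returns (F, df_between, df_within). Assumes at least two groups and
    some within-group variation (otherwise F is undefined).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical ALB values (g/L) grouped by genotype at one SNP.
alb_by_genotype = [
    [46.1, 47.3, 45.8, 46.9],  # homozygous for allele 1
    [45.0, 44.2, 45.6, 44.9],  # heterozygous
    [43.1, 44.0, 42.7],        # homozygous for allele 2
]
f_stat, df_b, df_w = one_way_anova_f(alb_by_genotype)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```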

Fig. 5

Association between HBV-related loci and serum liver enzyme levels in healthy controls. P values were calculated by ANOVA. White circles indicate the mean liver enzyme level for each genotype. The significant differences indicate that these SNPs contribute to liver enzyme activity

In sum, our study identified susceptibility SNPs associated with HBV-related outcomes and SNPs that increased the risk of progressive outcomes, from HBV clearance to chronic HBV infection, DC, and HCC, in a Chinese population (Additional file 1: Figure S10).

Discussion

HBV infection leads to a wide spectrum of clinical outcomes, including spontaneous clearance, asymptomatic carriage, chronic hepatitis B, liver cirrhosis, and hepatocellular carcinoma. Previous studies showed that MHC genes play an important role in the outcomes of HBV infection [7]. Alleles associated with HBV infection versus HBV clearance affect infection risk, and a low-risk allele indicates an effect on virus clearance. By contrast, loci associated with CHB versus ASPI indicate risk of severe progression, and a low-risk allele affects tolerance of the virus. The tolerance-related gene GRHL2 has been shown to influence inflammation in hepatocytes by regulating microRNA 122 (MIR122) and its target, HIF1α [39]. Levels of GRHL2 were increased in liver tissues of patients with alcoholic liver disease and correlated with decreased levels of MIR122. Increased levels of MIR122 in hepatocytes of mice with ethanol-induced liver disease and advanced fibrosis reduced levels of HIF1α and reduced serum levels of alanine aminotransferase (ALT). Taken together, we propose that the low-risk allele rs1264473:T at GRHL2 attenuates severe persistent inflammation by increasing the levels of MIR122.

Our previous studies [40, 41] showed that the NTCP S267F mutation significantly affected disease progression to cirrhosis (P = 0.017) and hepatocellular carcinoma (P = 0.023) versus CHB [40], and that the rs3077:T allele was associated with decreased risk of chronic HBV infection (OR = 0.62, P = 0.001) [41]. In this study, we searched by GWAS for host genetic factors that increase the risk of progression-related outcomes. One novel locus, LACE1, and three infection-related MHC loci were associated with the progression of HBV infection. These results show that host genetic factors, in both MHC and non-MHC genes, increase the risk of progressive outcomes after HBV infection, in addition to HBV mutation. It has been reported that HBV infection alters mitochondrial metabolism and mitochondrial dynamics, resulting in mitochondrial injury and liver disease [42]. LACE1 was reported to affect mitochondrial protein homeostasis [43]. Knockdown of LACE1 altered the expression of OPA1, a crucial regulator of mitochondrial dynamics [43,44,45]. In addition, we found that the risk allele, LACE1:rs1537862:T, was associated with a significantly lower level of ALB (P = 0.025, Fig. 5). ALB is a critical marker that decreases with the deterioration of chronic liver diseases [46,47,48]. Biosynthesis of ALB is affected by proinflammatory cytokines [49, 50] and by excess oxidative agents released by mitochondria from injured liver [46, 51]. Taken together, we propose that LACE1 may affect hepatic infection by changing hepatic mitochondrial metabolism, leading to the progression of HBV infection.

A limitation of our study is that we do not have an additional cohort for replication. Nevertheless, we showed that reported loci in the MHC region are significantly related to HBV infection. This replication of previous results supports the reliability of our findings in this cohort. Here, we provide novel candidate genes related to individual outcomes, progressive stages, and liver enzymes. Moreover, we identified two SNPs (in HLA-DPA1 and HLA-DPB1) that show signals of selection in non-East Asian (European, American, South Asian) versus East Asian populations. East Asian populations seem more susceptible to HBV infection than non-East Asian populations, and the differences in susceptibility are affected by HBV genotype [52], immunity [53], and environmental exposure [53, 54]. Even in a shared environment (the United States), chronic HBV infection is more prevalent among Asians than non-Asians [53]. It seems likely that host genetic factors contribute to the ethnic disparities in susceptibility to HBV infection. Taken together, the genetic associations and evolutionary signals reported here provide new insight for HBV research.

Conclusion

In the case–control study, we identified one novel locus (SNP: rs1264473, Gene: GRHL2, P = 1.57 × 10⁻⁶) significantly associated with CHB and two novel loci (SNP: rs2833856, Gene: EVA1C, P = 1.62 × 10⁻⁶; SNP: rs4661093, Gene: ETV3, P = 2.26 × 10⁻⁶) significantly associated with HCC. In the trend study across multiple outcomes, we identified one novel locus (SNP: rs1537862, Gene: LACE1, P = 1.85 × 10⁻⁶) and three MHC loci (HLA-DRB1, HLA-DPB1, HLA-DPA2) with a significant trend along the progression from CHB through ASPI to HBV clearance. In the evolutionary study, we showed that the derived alleles of two HBV clearance-related loci, rs3077 and rs9277542, are under strong selection in the European population. We suggest that these selected alleles may play a role in resistance to HBV in Europeans. Our findings provide new insight into the role of host genetic factors in HBV-related outcomes and progression.