Background

Hepatitis B virus (HBV) is a serious health problem because patients with chronic HBV infection are at risk for development of liver cirrhosis and hepatocellular carcinoma (HCC). It is estimated that 240 million people are chronic HBV carriers worldwide and 15 to 40% of them will develop liver cirrhosis, liver failure, or HCC during their lifetime [1,2,3,4,5].

HBV is classified into ten genotypes, labeled A through J, and over 40 related sub-genotypes. The ten genotypes are based on an intergroup divergence of at least 8% in the complete nucleotide sequence, while the sub-genotypes are based on a 4 to 7.5% divergence [6, 7]. The genotypes also have characteristic geographic distributions, as follows. Genotype A is the predominant genotype in Northern Europe and the United States. Genotypes B and C are common in East and Southeast Asia, while genotype D is prevalent in the Mediterranean, Middle East, and South Asia. Genotype E has been reported in West Africa, genotype F in Central and South America, genotype G in the United States, France, and Germany, genotype H in Central America, genotype I in Vietnam, and genotype J in the Ryukyu Islands of Japan [8, 9]. It is important to note that HBV genotypes A and B are associated with earlier hepatitis B e antigen (HBeAg) seroconversion, less active liver disease, and a slower rate of progression to liver cirrhosis and HCC as compared to HBV genotypes C and D [9,10,11,12].
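The divergence thresholds above can be illustrated with a minimal sketch. It assumes two full-length sequences that have already been aligned to the same length; the function names are hypothetical, and the labels for divergence below 4% and for the unspecified range between 7.5% and 8% are assumptions added for completeness rather than part of the classification rules cited above.

```python
def divergence(seq_a: str, seq_b: str) -> float:
    """Fraction of differing positions between two aligned sequences.

    Positions where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches / compared


def classify_divergence(d: float) -> str:
    """Map a divergence fraction onto the thresholds quoted in the text."""
    if d >= 0.08:
        return "different genotype"          # at least 8% divergence
    if 0.04 <= d <= 0.075:
        return "different sub-genotype"      # 4 to 7.5% divergence
    if d < 0.04:
        return "same sub-genotype"           # assumption: not stated in the text
    return "indeterminate (between 7.5% and 8%)"  # assumption: gap in the rules


# Toy example with short sequences; real comparisons use complete genomes.
toy_divergence = divergence("ACGTACGTAC", "ACGTACGTAA")
toy_label = classify_divergence(toy_divergence)
```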

Naturally occurring mutations in the precore and basal core promoter (BCP) regions are common. The most common precore mutations are G1896A and G1899A, of which G1896A creates a stop codon and prevents the synthesis of HBeAg [13]. The most common BCP mutations are A1762T and G1764A, which are associated with reduced synthesis of HBeAg by suppressing the transcription of precore mRNA [14, 15]. The precore and BCP mutants are usually found in HBeAg-negative patients but could also present as a mixture with wild-type virus in HBeAg-positive patients [16, 17]. The precore mutations are more common in patients with HBV genotype B and D than in patients with HBV genotype A and C, whereas the BCP mutations are more common in patients with HBV genotype A and C than in patients with HBV genotype B and D [9, 18,19,20,21]. The precore and BCP mutants are associated with liver cirrhosis, HCC, and advanced liver disease [19, 22,23,24,25].

The pre-S protein plays an important role in the interaction with the immune system, as it contains B-cell and T-cell epitopes [26,27,28]. The pre-S1 domain contains the hepatocyte binding site and is essential for virion assembly and transportation [29,30,31]. The pre-S2 domain can bind to polymerized human serum albumin, but the significance of this binding is unknown [32]. Pre-S deletion mutations are prevalent in patients with chronic HBV infection, ranging from 6% at age 20–29 years to 35% at age 50–59 years in HBeAg-positive patients, and 60% in HCC patients [33]. These deletion mutations are found more frequently in genotype B (25%) and genotype C (24.5%) than in the other genotypes [34]. Some studies showed that pre-S deletion mutations are an independent risk factor for HCC [35,36,37], while other studies showed that a combination of mutations (pre-S deletion, precore, and BCP mutations), rather than a single mutation, is associated with liver cirrhosis and liver disease progression [38, 39]. Pre-S deletion mutations could induce endoplasmic reticulum (ER) stress, genomic instability, and hepatocyte proliferation [40,41,42,43]. In the transgenic mouse model, the pre-S2 deletion mutations can induce dysplasia of hepatocytes and HCC development [33, 44].

Hepatitis B virus X protein (HBx), a nonstructural protein, is required for HBV covalently closed circular DNA (cccDNA) transcription and viral replication [45, 46]. In addition, HBx contributes to hepatocarcinogenesis through interactions with multiple cellular proteins that modulate cell proliferation, cell death, gene expression, and DNA repair [47,48,49,50]. Truncated HBx proteins have also been reported to promote hepatocarcinogenesis [51, 52].

Previous studies showed that the risk of HCC was associated with the existence of specific HBV variants, which were the major strains identified by traditional direct Sanger sequencing. Although direct Sanger sequencing is the most common method for analyzing viral mutations, it is unable to determine the profile of a heterogeneous viral population in a patient. Next-generation sequencing (NGS), however, can do this, as well as perform high-throughput analysis of thousands of amplified regions, characterize genetic diversity, and detect minor strains that neither direct sequencing nor cloning can find [53,54,55]. This study provides an overview of the possible applications of next-generation sequencing analysis for the detection of hepatocellular carcinoma-associated hepatitis B virus mutations.

Optimization of NGS analysis for HBV: Four recommended steps

Use the sample-specific reference sequence as the mapping reference

NGS reads can be assembled into whole-genome sequences either by de novo assembly or by mapping to reference sequences. De novo assembly is usually employed in studying unknown species and is hindered by regions with high diversity. For studying HBV, mapping to a reference sequence is often used [55,56,57]. Two main NGS platforms, Illumina Genome Analyzer and Roche Genome Sequencer, have been widely used in viral quasispecies studies. Illumina generates larger data sets with shorter read lengths than Roche. Therefore, the NGS data generated by Illumina are usually assembled using reference sequences as templates, while de novo assembly is applicable but not commonly used [58, 59].

One of the major challenges for NGS is to monitor quality control metrics over all stages of the data processing pipeline. Alignment with a reconcilable mapping reference is a required step for any re-sequencing analysis and is crucial for successful variant detection [60].

HBV replication involves an error-prone reverse transcription step, so its rate of nucleotide change during replication is high and close to the rate observed for RNA viruses. The evolution rate of HBV ranges from 1.8 × 10^-2 to 1.5 × 10^-5 nucleotide substitutions/site/year [61,62,63], while that of the human genome is 1.1–3 × 10^-8 nucleotide substitutions/site/generation [64]. Furthermore, genome length differs among the 10 HBV genotypes (from 3182 to 3248 base pairs), which could result in genotype alignments containing several gapped regions [65].

Previous HBV-related NGS analyses used the consensus genotype sequences from public viral databases [55, 56] or the major viral sequence identified by polymerase chain reaction (PCR)-direct sequencing [57] as mapping references to detect HBV variants. A sample-specific reference sequence is the consensus sequence obtained from the NGS reads of each sample through alignment with a mapping reference of the same genotype. In our demonstrations, we found that using this type of reference sequence as the mapping reference gave the best mapping quality and the highest single nucleotide variant (SNV) calling accuracy, as compared with using the compatible genotype sequence [66]. The percentage of false SNV calls increased significantly from 0.09% using a sample-specific reference sequence to 28.95% using an incompatible genotype reference (Fig. 1). False SNV calls are especially likely in regions with high divergence. In addition, the sample-specific reference sequence is effective in the analysis of HBV quasispecies, which is more complex to analyze due to its heterogeneity and structure.
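The idea of a sample-specific reference can be sketched as a per-position majority vote over reads that were first aligned to a same-genotype reference. This is a minimal illustration, not the pipeline used in [66]: it assumes ungapped alignments given as hypothetical (0-based start, read sequence) pairs, ignores base qualities and indels, and falls back to the genotype reference where coverage is below an assumed `min_depth`.

```python
from collections import Counter


def sample_consensus(reference, aligned_reads, min_depth=1):
    """Build a consensus sequence from ungapped read alignments.

    reference     -- same-genotype reference used for the first mapping pass
    aligned_reads -- iterable of (start, read) pairs, start is 0-based
    min_depth     -- assumed minimum coverage needed to override the reference
    """
    counts = [Counter() for _ in reference]
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(reference):
                counts[pos][base] += 1

    consensus = []
    for ref_base, base_counts in zip(reference, counts):
        depth = sum(base_counts.values())
        if depth >= min_depth:
            # Majority base at this position in the sample.
            consensus.append(base_counts.most_common(1)[0][0])
        else:
            # Too little coverage: keep the genotype reference base.
            consensus.append(ref_base)
    return "".join(consensus)


# Toy data (hypothetical): the sample differs from the reference at
# positions 2 and 7.
toy_reference = "ACGTACGT"
toy_reads = [(0, "ACCT"), (1, "CCTA"), (4, "ACGA"), (2, "CTAC")]
toy_consensus = sample_consensus(toy_reference, toy_reads)
```

In a second pass, the reads would be re-mapped against the resulting consensus, so that only within-sample variation remains to be called.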

Fig. 1

The percentage of false SNV calls when using different reference sequences. A full-length HBV genome sequence, Clone_N6 (KJ790199; genotype C, Taiwan), was cloned from a chronic hepatitis B (CHB) patient and sequenced by direct Sanger sequencing. This nucleotide sequence was used as the standard sequence. Clone_N6 was also fragmented and sequenced by NGS. The NGS reads from Clone_N6 were mapped using the following mapping references: sample-specific reference, genotype-specific reference (JN315779; genotype C, Asia), and incompatible genotype reference (FJ787477; genotype B, Asia). When compared with the standard sequence of Clone_N6, derived from direct sequencing, the percentage of false SNV calls increased significantly from 0.09% using the sample-specific reference as the mapping reference to 28.95% using the incompatible genotype reference as the mapping reference. a: The sample-specific reference is the consensus sequence obtained from the NGS reads of each sample through alignment with a mapping reference of the same genotype. b: The reference is of the same genotype as the sample (genotype C). c: The reference is of a genotype incompatible with the sample (genotype B)

Elongate the end of reference sequence and reset the origin of mapping reference sequence

The HBV genome is circular, with position 1 conventionally taken to be the first “T” nucleotide in the EcoRI restriction site (“GAATTC”) [67]. Some variants and deletion mutations, such as pre-S deletion mutations, span this site. Most genome mappers for NGS analysis, such as BWA [12], were designed for linear genomes; they are not well suited to circular genomes such as HBV and map poorly when reads span the end of the genome. To resolve this problem, we manually extended the end of the reference sequence by 600 bases and reset the origin of the mapping reference sequence to nt1600. This approach improved mapping performance at the end of the genomic sequence and allowed detection of deletion mutations spanning position 1 of the HBV genome [68].
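One way to read the procedure above is sketched below: rotate the circular genome so that the linear reference starts at nt1600, then append the first 600 bases of the rotated sequence to its end so that reads crossing the new junction still map contiguously. The exact construction in [68] may differ; the function names and the coordinate conversion helper are illustrative assumptions, and all positions are 1-based.

```python
def build_circular_reference(genome, new_origin=1600, extension=600):
    """Rotate a circular genome and extend its end for linear mappers.

    new_origin -- 1-based position that becomes position 1 of the new reference
    extension  -- number of bases repeated at the end (wraps around the circle)
    """
    if not 1 <= new_origin <= len(genome):
        raise ValueError("new_origin must lie within the genome")
    if not 0 <= extension <= len(genome):
        raise ValueError("extension must not exceed the genome length")
    shift = new_origin - 1
    rotated = genome[shift:] + genome[:shift]
    return rotated + rotated[:extension]


def to_original_position(mapped_pos, genome_length, new_origin=1600):
    """Convert a 1-based position on the rotated, extended reference
    back to the conventional 1-based genome coordinate."""
    return (mapped_pos - 1 + new_origin - 1) % genome_length + 1


# Toy example: a 10-letter "genome" rotated to start at position 4,
# with 3 bases appended.
toy_genome = "ABCDEFGHIJ"
toy_reference = build_circular_reference(toy_genome, new_origin=4, extension=3)
```

With the default settings, the conventional position 1 falls in the interior of the linear reference, so deletions spanning it are no longer split across the two ends.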

Use a platform-specific cut-off value to distinguish authentic minority variants from technical artifacts

High-throughput sequencing techniques can generate false-positive variant calls, especially from misalignment of sequencing reads and inaccuracies of the reference sequence compared to a specific local population [69]. In order to distinguish authentic minority variants from technical artifacts, we estimated the technical error rate and identified a threshold above which mutations detected by NGS using Illumina HiSeq™ 2500 were unlikely to be technical artifacts. The technical error rate was estimated by PCR amplification and NGS of a plasmid containing the HBV full-length genome. The mean error rate among three runs was estimated by comparing each NGS sequence read to the plasmid control sequences. The empirical distributions of mismatch and deletion errors in the clone yielded averages of 0.32% and 1.8%, respectively. Accordingly, we used this empirically observed distribution of mismatch errors to distinguish sequence errors from authentic minor variants by excluding possible technical errors, defined as mutations present in < 3.2% of sequence reads, a value 1 log above the mean overall error rate of the Illumina HiSeq™ 2500 platform. For deletion mutations, an exclusionary cutoff of < 1.8% was used [66, 68]. Other studies have proposed similar approaches to distinguish authentic minority variants from technical artifacts, with different cut-off values for their respective platforms [55,56,57]. This is an important step that should not be skipped after variant calling.
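The mismatch cut-off described above amounts to simple arithmetic: take the mean error rate across control runs and move one log (a factor of 10) above it. The sketch below reproduces that calculation. The per-run error rates are hypothetical values chosen to average 0.32%, since the individual run values are not reported here, and the function and variant names are illustrative.

```python
def minority_variant_cutoff(run_error_rates, log_margin=1.0):
    """Cut-off placed `log_margin` logs above the mean control error rate."""
    if not run_error_rates:
        raise ValueError("at least one control run is required")
    mean_error = sum(run_error_rates) / len(run_error_rates)
    return mean_error * 10 ** log_margin


def filter_variants(variant_frequencies, cutoff):
    """Keep variants at or above the cut-off; lower ones are treated
    as possible technical artifacts and excluded."""
    return {name: freq for name, freq in variant_frequencies.items()
            if freq >= cutoff}


# Hypothetical mismatch error rates from three plasmid control runs
# (mean 0.32%, as reported in the text).
control_runs = [0.0030, 0.0032, 0.0034]
mismatch_cutoff = minority_variant_cutoff(control_runs)  # about 0.032 (3.2%)

# Hypothetical variant calls with their read frequencies.
calls = {"variant_1": 0.45, "variant_2": 0.05, "variant_3": 0.012}
retained = filter_variants(calls, mismatch_cutoff)
```

Because the error profile depends on the sequencing platform and library preparation, the control runs would need to be repeated for any other platform rather than reusing the 3.2% value.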

Apply these two analytic methods to better identify the deletion mutations in the HBV genome

Higher heterogeneity increases the uncertainty of the genomic coordinates to which reads are mapped and makes deletion mutations harder to discover. Several methods for deletion mutation discovery have been proposed, such as BreakDancer [70], Pindel [71], and Breakpointer [72], but all these tools were mainly designed for human NGS data and are not entirely applicable to viruses with a high mutation rate. DeF-GPU is a graphics processing unit-based data mining method that incorporates the pattern growth approach to identify HBV genomic deletions. Validation of DeF-GPU on synthetic and real datasets showed that DeF-GPU outperforms the representative and commonly used method Pindel, a pattern growth approach originally designed to detect either large deletions or medium-sized insertions, and is able to exactly identify the deletions in a few seconds [73]. VirDelect uses the split read alignment method to obtain the exact breakpoints of deletions. Experiments on simulated and real data indicated that VirDelect can identify more exact breakpoints of deletions than Pindel and is suitable for researchers who prioritize accuracy over speed [74].
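For orientation, the sketch below shows a much simpler stand-in than either tool: it reads deletions that an aligner has already reported in SAM-style CIGAR strings and counts how many reads support each deletion interval. It is not the pattern growth approach of DeF-GPU or the split read method of VirDelect, and it would miss deletions that the aligner soft-clips instead of reporting. The function names and toy alignments are hypothetical.

```python
import re
from collections import Counter

# CIGAR operations as defined in the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def deletions_from_cigar(ref_start, cigar):
    """Return 1-based inclusive (start, end) reference intervals of
    deletions reported in one alignment."""
    pos = ref_start
    deletions = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "D":
            deletions.append((pos, pos + n - 1))
        if op in REF_CONSUMING:
            pos += n
    return deletions


def count_deletions(alignments):
    """Count supporting reads per deletion interval.

    alignments -- iterable of (1-based reference start, CIGAR string) pairs
    """
    counts = Counter()
    for ref_start, cigar in alignments:
        counts.update(deletions_from_cigar(ref_start, cigar))
    return counts


# Toy alignments: two reads support the same 12-base deletion.
toy_alignments = [(100, "50M12D38M"), (120, "30M12D40M"), (100, "100M")]
toy_counts = count_deletions(toy_alignments)
```

Dividing each count by the read depth over the interval would give a deletion frequency that can be compared with the platform-specific cut-off.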

HCC-associated HBV SNVs determined by next-generation sequencing analysis

Through HBV genome-wide NGS analysis, our previous study identified 60 NGS-defined HCC-associated SNVs and their pathogenic frequencies, including 41 novel SNVs. Each SNV was specific for either genotype B (n = 24) or genotype C (n = 34), except for nt53C, which was identified in both genotypes. SNV I was defined as the dominant strain of HBV in the majority of non-HCC patients. SNV II was defined as the variant other than SNV I at the same nucleotide position, i.e. the minor strain of HBV in the majority of non-HCC patients [68].
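The SNV I and SNV II definitions can be made concrete with a small sketch. It assumes that, for one nucleotide site, the dominant base in each non-HCC patient is already known; the base that is dominant in most non-HCC patients is SNV I, and any other base at that site is SNV II. The patient data are hypothetical, and the site is modeled on nt273 of genotype B, where A is SNV I and G is SNV II.

```python
from collections import Counter


def snv_one_base(dominant_bases_non_hcc):
    """Base that is the dominant strain in the majority of non-HCC patients."""
    if not dominant_bases_non_hcc:
        raise ValueError("at least one non-HCC patient is required")
    return Counter(dominant_bases_non_hcc).most_common(1)[0][0]


def snv_class(base, snv_i_base):
    """Label a base at the same site as SNV I or SNV II."""
    return "SNV I" if base == snv_i_base else "SNV II"


# Hypothetical dominant bases at one site in five non-HCC patients.
non_hcc_dominant = ["A", "A", "G", "A", "A"]
site_snv_i = snv_one_base(non_hcc_dominant)
labels = {base: snv_class(base, site_snv_i) for base in "ACGT"}
```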

HCC-associated HBV SNVs for genotype B

For genotype B, 25 HCC-associated SNVs located at 23 nucleotide sites were identified, including the precore mutations (G1896A and G1899A). For nucleotide sites 273 and 2227, 273A and 2227T were SNV I and protective factors for HCC, whereas 273G and 2227G were SNV II and risk factors for HCC. All the other 21 SNVs were risk factors for HCC; 6 of them were SNV I and 15 of them were SNV II (Table 1). Seventeen of the 25 SNVs were missense mutations at the polymerase, preS2, surface, precore, and core regions. Seven of the 17 missense mutations and 4 of the 8 silent mutations were at the regulatory elements, including CpG islands I/II/III, X promoter, enhancer (Enh) I, ε loop, and BCP (Fig. 2).

Table 1 HCC-associated SNVs with their pathogenic frequencies through NGS analysis, categorized by level of supporting evidence
Fig. 2

Distinct NGS-derived SNVs located in HBV regulatory elements and ORFs associated with HCC among patients with genotype B and genotype C. a, Twenty-five distinct NGS-defined HCC-associated SNVs were located in HBV regulatory elements and ORFs for genotype B HBV. b, Thirty-five distinct NGS-defined HCC-associated SNVs were located in HBV regulatory elements and ORFs for genotype C HBV. * and ** indicate risk SNVs for HCC with an odds ratio of HCC > 1 and with a P value of < 0.05 and < 0.01, respectively. Ɨ indicates protective SNVs for HCC with an odds ratio of HCC < 1 and with a P value < 0.05. ● missense mutation; ○ silent mutation; • SNVs located in regulatory element. Level A means HCC-associated HBV variants supported by meta-analysis with at least 4 studies. Level B means HCC-associated HBV variants supported by at least one study if the total number of relevant studies is less than 4. Level C means HBV variants unassociated with HCC supported by all studies if the total number of relevant studies is less than 4. Red box indicates Level A; blue box indicates Level B; yellow box indicates Level C. The figure has been adapted from [70]

HCC-associated HBV SNVs for genotype C

For genotype C, 35 HCC-associated SNVs, each located at a distinct nucleotide site, were identified, including BCP mutations (G1764A and C1653T). All 35 SNVs were risk factors for HCC; 17 of them were SNV I and 18 of them were SNV II (Table 1). Twenty-eight of the 35 SNVs were missense mutations located in 4 open reading frames (ORFs), particularly in the preS1 region and the spacer domain of polymerase. Twenty-one of the 28 missense mutations and 3 of the 6 silent mutations were at the regulatory elements, including CpG islands I/II/III, negative regulatory element (NRE)/core upstream regulatory sequence (CURS)/BCP, Enh I/II, core promoter, and S2 promoter (Fig. 2).

The U-shaped distribution pattern of SNV frequency in SNV II and the novel HCC-associated SNVs with low SNV frequency detected by NGS analysis

Almost all SNV I had SNV frequencies higher than 80%. The great majority of SNV II had either low (< 20%) or high (> 80%) SNV frequencies, i.e. a characteristic U-shaped distribution pattern of SNV frequencies with low (< 20%) or high (> 80%) values (Fig. 3). The cut-off values of SNV frequency for HCC-associated SNVs represent their pathogenic frequencies. Almost all HCC-associated SNV I had pathogenic frequencies higher than 80% and the great majority of HCC-associated SNV II had either low (< 20%) or high (> 80%) pathogenic frequencies, a U-shaped distribution pattern (Fig. 4). Among the 60 NGS-defined HCC-associated SNVs, 19 had been reported previously and 41 were novel ones. In 19 HCC-associated SNVs reported previously, 94.7% (18/19) had cut-off values of SNV frequency greater than 20%, except nt456G, which had a cut-off value of 10.2%. In the other 41 novel HCC-associated SNVs, 68.3% (28/41) had cut-off values of SNV frequency greater than 20%, while 31.7% (13/41) had cut-off values of less than 20% (Fig. 5). This showed that NGS could be used to detect HCC-associated SNVs with low SNV frequency.
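A U-shaped pattern of this kind can be checked by binning SNV frequencies into the three ranges used above (< 20%, 20 to 80%, > 80%) and looking at the share of values that fall in the two tails. The sketch below uses hypothetical frequencies; the function names are illustrative and the 20% and 80% boundaries are taken from the text.

```python
def frequency_bins(frequencies, low=0.20, high=0.80):
    """Count SNV frequencies (as fractions) in the low, middle, and high ranges."""
    counts = {"low": 0, "middle": 0, "high": 0}
    for freq in frequencies:
        if not 0.0 <= freq <= 1.0:
            raise ValueError("frequencies must be fractions between 0 and 1")
        if freq < low:
            counts["low"] += 1
        elif freq > high:
            counts["high"] += 1
        else:
            counts["middle"] += 1
    return counts


def tail_fraction(counts):
    """Share of SNVs in the two tails (low or high) of the distribution."""
    total = sum(counts.values())
    return (counts["low"] + counts["high"]) / total if total else 0.0


# Hypothetical SNV II frequencies for one patient group.
toy_frequencies = [0.05, 0.10, 0.50, 0.90, 0.95, 0.85]
toy_bins = frequency_bins(toy_frequencies)
toy_tail_share = tail_fraction(toy_bins)
```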

Fig. 3

The distribution of SNV frequencies in SNV I and SNV II. Almost all SNV I had SNV frequencies higher than 80%. The great majority of SNV II had either low (< 20%) or high (> 80%) SNV frequencies, i.e. a characteristic U-shaped distribution pattern of SNV frequencies with low (< 20%) or high (> 80%) values. a, All SNVs in genotype B HCC group. b, All SNVs in genotype B non-HCC group. c, All SNVs in genotype C HCC group. d, All SNVs in genotype C non-HCC group. SNV I was defined as the dominant strain of HBV in non-HCC group. SNV II was defined as the variant other than SNV I at the same nucleotide position, i.e. the minor strain of HBV in non-HCC group

Fig. 4

The distribution of pathogenic frequencies in HCC-associated SNV I and SNV II. Almost all HCC-associated SNV I had pathogenic frequencies higher than 80% and the great majority of HCC-associated SNV II had either low (< 20%) or high (> 80%) pathogenic frequencies, i.e. a U-shaped distribution pattern. SNV I was defined as the dominant strain of HBV in non-HCC group. SNV II was defined as the variant other than SNV I at the same nucleotide position, i.e. the minor strain of HBV in non-HCC group

Fig. 5

The distribution of pathogenic frequencies in previously reported and novel HCC-associated SNVs. Among the 60 NGS-defined HCC-associated SNVs, 19 had been reported previously and 41 were novel ones. Of the 19 HCC-associated SNVs reported previously, 94.7% (18/19) had cut-off values of SNV frequency > 20%; the exception was nt456G, which had a cut-off value of 10.2%. Of the other 41 novel HCC-associated SNVs, 68.3% (28/41) had cut-off values of SNV frequency > 20% and 31.7% (13/41) had cut-off values < 20%

Validation of the NGS-defined HCC-associated SNVs

To validate the 60 NGS-defined HCC-associated SNVs, a systematic literature review and meta-analysis were conducted. One hundred and sixty-seven HBV variants had been studied previously and were categorized into 4 levels of supporting evidence associated with HCC. Level A included 12 HCC-associated HBV variants supported by meta-analysis with at least 4 studies. Level B included 60 HCC-associated HBV variants supported by at least one study if the total number of relevant studies was less than 4. Level C included 85 HBV variants unassociated with HCC supported by all studies if the total number of relevant studies was less than 4. Level D included 10 HBV variants unassociated with HCC supported by meta-analysis with at least 4 studies. The proportions of NGS-defined HCC-associated SNVs among HBV variants with different levels of supporting evidence declined significantly with decreasing levels of evidence from Level A to Level D. All the HCC-associated HBV variants with Level A evidence were identified by NGS analysis, except for C1766T and T1768A, which were mainly expressed in genotypes A and D, and A1762T, which was identified only in subgroup analysis. In addition, 5 novel NGS-defined HCC-associated SNVs in the small surface region identified by our previous study did influence hepatocarcinogenesis pathways, including endoplasmic reticulum stress and DNA repair systems, as shown by microarray, real-time polymerase chain reaction, and western blot analysis [68].
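The four evidence levels follow a simple decision rule, sketched below under stated assumptions: the inputs are the total number of relevant studies, the number of those studies reporting an association with HCC, and, when at least 4 studies exist, whether the meta-analysis found an association. The function and parameter names are hypothetical.

```python
from typing import Optional


def evidence_level(n_studies: int,
                   n_reporting_association: int,
                   meta_analysis_associated: Optional[bool] = None) -> str:
    """Assign Level A to D following the definitions in the text."""
    if n_studies < 1:
        raise ValueError("at least one relevant study is required")
    if not 0 <= n_reporting_association <= n_studies:
        raise ValueError("n_reporting_association must be between 0 and n_studies")

    if n_studies >= 4:
        # Levels A and D rest on a meta-analysis of at least 4 studies.
        if meta_analysis_associated is None:
            raise ValueError("a meta-analysis result is needed for 4 or more studies")
        return "A" if meta_analysis_associated else "D"

    # Fewer than 4 studies: B if any study supports an association,
    # C if all studies report no association.
    return "B" if n_reporting_association >= 1 else "C"


# Counts per level as reported in the text; they sum to the 167 variants reviewed.
level_counts = {"A": 12, "B": 60, "C": 85, "D": 10}
total_variants = sum(level_counts.values())
```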

The advantage of NGS for the detection of HCC-associated HBV mutations

Our previous NGS analysis showed that the association of HCC was related to specific SNVs and deletion mutations with a certain frequency instead of presence or absence of specific variant. Risk HCC-associated SNVs had significantly higher SNV frequency in HCC group than in non-HCC group, whether they were dominant strains or minor strains in HCC group. Protective HCC-associated SNVs had significantly lower SNV frequency in HCC group than in non-HCC group. For deletion mutations, the deletion of preS region was significantly associated with HCC, in terms of deletion index, which is composed of the deletion length and the deletion frequency by NGS analysis, but there was no significant difference in the proportions of patients with deletion mutations between HCC patients and non-HCC patients. In addition, the lower limit of detection using direct Sanger sequencing technology is ~ 20% minor allele frequency. In our previous study, 31.7% (13/41) novel HCC-associated SNVs and 83.6% (138/165) deletion mutations had cut-off values of SNV frequency lower than 20%, which could only be detected by NGS analysis [68]. Therefore, NGS is a powerful tool to characterize minor strains among viral quasispecies which could not be detected even by direct sequencing or cloning.

HBV SNVs and deletion mutations related to HCC development

The mechanisms of hepatocarcinogenesis induced by HBV quasispecies are still not completely known. Many studies had indicated that unique HBV oncoproteins (HBx isoforms and preS mutants) and mutated precore/core proteins could induce hepatocarcinogenesis through induction of endoplasmic reticulum (ER) and oxidative stress [40], activation of ER-independent pathway [75], regulation of microRNA expression [76], lipid metabolism disturbance [77], or epigenetic modification through modified genomic methylation status [78]. Mutations of HBV regulatory elements probably induced hepatocarcinogenesis through oncoprotein expression modulation [79, 80], HBV DNA integration leading to chromosomal instability [81], or HBV DNA methylation [82].

Surface gene and protein

The HBV surface (S) proteins are produced from the ORF S gene, which has three different translation start sites (pre-S1, pre-S2, and S) that yield the large, middle, and small surface proteins, respectively. The variability of the pre-S1 and pre-S2 regions was higher in the HCC group than in the non-HCC group [68, 83], and mutations at the promoter sites of pre-S1 and pre-S2 were significantly associated with an increased risk of HCC [37, 84]. The pre-S mutated large surface proteins are retained in the ER, where they induce ER stress signals and upregulate COX-2 and cyclin A to induce cell cycle progression [33]. According to our previous NGS analysis of pre-S deletions in genotype C HBV, the HCC group had more patients than the non-HCC group with deletion mutations involving nt2977–3013 (amino acids 43–56), deletion patterns II or III [38], and deletion mutations at the S2 promoter and at the heat shock protein binding site in the preS region [68]. Pre-S deletion mutants can cause accumulation of HBsAg in the ER and lead to ER stress and oxidative stress, which is known to cause DNA damage and alterations of several signaling pathways that are related to cell proliferation, invasion, cell survival, and apoptosis [85].

In addition, a few point mutations of HBV surface proteins were reported to be associated with HCC, such as Q10L in the pre-S1 region [86], F22L in the pre-S2 region [86], and I126S, G130N, M133L/T, and G145R in the S region [86,87,88,89]. However, the hepatocarcinogenesis mechanisms of these variants remain unclear. In our previous NGS results, 4 missense mutations and 2 silent mutations were located in the ORF S gene of genotype B HBV, concentrated in the pre-S2 and small S regions; 10 missense mutations and 11 silent mutations were distributed across the ORF S gene of genotype C HBV. We showed again that the amino acid substitutions F22L (nt T53C), Q10L (nt C2875A), and A216T (nt G530A) were HCC-associated variants. We also found that other NGS-defined HCC-associated SNVs in the small S region (genotype B, nt T216C and nt A273G; genotype C, nt A293G, nt C446G, and nt A456G) could affect hepatocarcinogenesis pathways by inducing ER stress and regulating the DNA repair system [68].

X gene and protein

HBx, a protein encoded by HBV ORF X gene, was involved in many intracellular signal pathways which were closely associated with cell proliferation and cell apoptosis [85]. Different HBx isoforms and C-terminal truncated HBx play important roles in HCC development [85, 90]. HBx C-terminal region could interact with intracellular molecules, through phosphorylation/methylation or binding to certain molecules, which directly or indirectly contribute towards tumorigenesis [91,92,93,94]. The C-terminal truncation of HBx plays a role in enhancing cell invasiveness and metastasis in HCC, regulating miRNA transcription and promoting hepatocellular proliferation [95,96,97]. From our previous NGS analysis, HBx deletions occurred only in a minority of patients with HCC (genotype B: 3% (1/40), genotype C: 4% (2/53)) and non-HCC (genotype B: 4% (2/47), genotype C: 3% (2/61)), and were indeed localized in the C-terminal of HBx. However, the proportion of patients with C-terminal truncation of HBx did not differ between HCC and non-HCC patients.

HBx-Ser31, an HBx mutation, has been reported to act as an anti-apoptotic protein, resulting in enhancing tumor growth and suppressing tumorigenesis [90]. Another study showed that the HBV BCP mutations (A1762T/G1764A), which lie within the HBx gene and lead to L130M and V131I substitutions, could enhance S-phase kinase-associated protein 2 transcription, conversely down-regulate cell cycle inhibitors, and provide a potential mechanism for HCC development [91]. The Combo (T1753A/A1762T/G1764A/T1768A) mutations in BCP result in four amino acid substitutions in HBx protein, including I127R/S/T, L130M, V131I, and F132Y, which cause constitutive activation of the Wnt signaling pathway and play a pivotal role in HBV-associated hepatocarcinogenesis [98]. According to our previous NGS results, genotype C HBV bears HCC-associated SNVs in the X gene (G1386A, G1613A, C1653T, T1674C, T1753G, and G1764A), most of which cluster in the C terminal of HBx, while genotype B HBV does not. These mutations changed HBx protein sequences to 5M, 80I, 94Y, 101P, 127S, and 131I, which might affect the regulatory domain to change the self-regulatory mechanism of X gene expression, and impact the transactivation domain to regulate HBV replication and cellular pathways [99,100,101,102].

Precore/core gene and protein

The BCP and its adjacent precore region are crucial for replication of HBV. HBV mutations at the BCP and precore region, such as T1753V, G1896A, G1899A, G1613A, and C1653T, which occur in the core promoter and ORF C gene, have been considered classical risk factors for HBV-related HCC [103,104,105,106]. The BCP A1762T/G1764A double mutations have been indicated to increase the risk of HCC development exclusively in genotype C, but not in genotype B [25]. In our previous NGS results, G1896A and G1899A were HCC-associated variants only in genotype B, while G1613A, C1653T, T1674C, T1753V, G1764A, and A1846T were genotype C-specific HCC-associated variants. A1762T was identified as an HCC-associated SNV by our NGS-based subgroup analysis of HBeAg-positive patients with genotype C HBV infection. Based on our meta-analysis and NGS results, we again confirmed that the mutations T1727A, A1752G, C1773A, and C1799G at the BCP region and T1858C and G1862T at the ORF C gene were not risk variants for HCC development [68]. Mutations in the BCP and core gene were usually considered to have a trans-activating effect on the core promoter, resulting from altered binding affinity with trans-activators [107]. These hotspot mutations would then drive complicated changes in genomic activity for HBeAg expression and HBV DNA replication, which may lead to more active hepatitis and an increased risk of HCC [107, 108].

Polymerase gene and protein

The association between HBV polymerase (P) gene mutations and HCC has rarely been reported. The HBV P gene contains 4 domains, as follows: a terminal protein (TP) region involved in priming the viral template, a spacer (SP) region, a catalytic domain with reverse transcriptase (RT) activity, and a C-terminus that has ribonuclease H (RNase H) activity. Polymerase dysfunction, in the form of an inability to package pre-genomic RNA into core particles, appeared to result from a single missense mutation in the 5′ region of the gene in a single patient with HCC [109]. Focusing on the RT domain, which overlaps with the S gene, Wu et al. characterized spontaneous mutations in the HBV RT region and indicated, using Sanger sequencing, that A799G, A987G, and T1055A were independent risk factors for HCC [110]. Li et al. indicated that rtF221Y (T791A), identified by the Sanger method, was an independent risk factor for the postoperative recurrence of HCC and poor overall survival rates [111]. Among the HCC-associated SNVs from our previous NGS analysis, only rtN134D (nt A529G) and tpK93E (nt A2583G) of genotype B and rtH55R (nt A293G) and rtS106C (nt C446G) of genotype C were nonsynonymous substitutions in the TP and RT domains affecting viral replication fitness. The other SNVs, located in the spacer region and overlapping with the pre-S region, did not affect polymerase activity [68]. The mechanisms by which these HCC-associated SNVs are involved in polymerase activity and hepatocarcinogenesis need to be further explored.

Conclusion

NGS analysis is a powerful and high-throughput method for the detection of HCC-associated HBV mutations. This method is useful for discovering novel HCC-associated HBV SNVs, especially those with low SNV frequency. Although our previous study confirmed the association between hepatocarcinogenesis and some novel HCC-associated HBV SNVs with low SNV frequency in the small S region, the pathologic and clinical significance of these low-frequency SNVs should be investigated further. In addition, the evolution and impact of these quasispecies, including these SNVs, merit further investigation.