Aligning to the sample-specific reference sequence to optimize the accuracy of next-generation sequencing analysis for hepatitis B virus

Liu, Wen-Chun; Lin, Chih-Peng; Cheng, Chun-Pei; Ho, Cheng-Hsun; Lan, Kuo-Lun; Cheng, Ji-Hong; Yen, Chia-Jui; Cheng, Pin-Nan; Wu, I-Chin; Li, I-Chen; Chang, Bill Chia-Han; Tseng, Vincent S.; Chiu, Yen-Cheng; Chang, Ting-Tsung

doi:10.1007/s12072-015-9645-x

Original Article
Open access
Published: 25 July 2015

Volume 10, pages 147–157, (2016)

Hepatology International

Wen-Chun Liu^1,2,
Chih-Peng Lin³,
Chun-Pei Cheng⁴,
Cheng-Hsun Ho^1,2,
Kuo-Lun Lan⁴,
Ji-Hong Cheng⁴,
Chia-Jui Yen^1,2,
Pin-Nan Cheng¹,
I-Chin Wu^1,2,
I-Chen Li¹,
Bill Chia-Han Chang³,
Vincent S. Tseng⁴,
Yen-Cheng Chiu^1,2 &
Ting-Tsung Chang^1,2

Abstract

Background

Hepatitis B virus (HBV) quasispecies are crucial in the pathogenesis of chronic liver disease. Next-generation sequencing (NGS) is powerful for identifying viral quasispecies. To improve mapping quality and single nucleotide variant (SNV) calling accuracy in the NGS analysis of HBV, we compared different mapping references, including the sample-specific reference sequence, same genotype sequences and different genotype sequences, according to the sample.

Methods

Real Illumina HBV datasets from 86 patients, and simulated datasets from 158 HBV strains in the GenBank database, were used to assess mapping quality. SNV calling accuracy was evaluated using different mapping references to align Real Illumina datasets from a single HBV clone.

Results

Using the sample-specific reference sequence as a mapping reference produced the largest number of mappable reads and the highest coverage. With a different-genotype mapping reference, the consensus sequence derived from the real Illumina datasets of the single HBV clone showed 21 false SNV calls in the polymerase and surface genes, the regions most divergent between the mapping reference and this HBV clone. Even with a same-genotype mapping reference, most of these false SNVs were still detected at a coverage percentage of ~6 %, whereas none were detected with the sample-specific reference sequence.

Conclusions

Using sample-specific reference sequences as a mapping reference in NGS analysis optimized mapping quality and the SNV calling accuracy for HBV quasispecies.

Introduction

Next generation sequencing (NGS), also known as ultra-high throughput sequencing, is a powerful tool for discovering diseases with novel mutations and for detecting traces of pathogenic microorganisms [1, 2]. It has also been used for sequencing human and microbial genomes and for identifying species.

Hepatitis B virus (HBV) infection causes a multitude of clinical conditions ranging from acute hepatitis to cirrhosis and hepatocellular carcinoma [3–5]. HBV is classified into genotypes A–J with an inter-genotypic divergence of at least 8 % in the complete genome sequence [6]. HBV genotypes B and C are prevalent in Taiwan [7]. Many studies have suggested that HBV genotypes not only show geographical distribution and ethnic specificity but are also associated with disease progression and responses to interferon treatment [8, 9]. HBV replicates its genome at a high rate and releases more than 10¹¹ virions daily. Because the HBV polymerase reverse transcriptase (RT) has no proofreading activity, HBV has higher mutation rates than other DNA viruses and forms complex quasispecies [10]. Viral quasispecies have been implicated in the development of drug resistance, the exacerbation of chronic hepatitis B (CHB), and the development of liver cancer [10].

NGS has been used to characterize single nucleotide variations (SNVs) and their dynamics in HBV polymerase RT genes in serum from patients who had undergone antiviral treatment [11–13]. HBV sequence reads from NGS were aligned to one reference sequence, either a consensus genotype sequence from public viral databases [12, 13] or a major viral sequence identified using direct sequencing of polymerase chain reaction (PCR) products [11]. However, the optimal mapping reference for NGS analysis has not been identified.

In this study, we used various types of mapping references (genotype sequences identical to and different from the NGS sample, and a sample-specific reference sequence derived from its NGS dataset) to investigate the mapping quality and the accuracy of the SNV calls for the full HBV genome. We also illustrated the effect of divergence between the mapping reference and NGS datasets on mapping quality and the accuracy of the SNV calls.

Materials and methods

Patients

Informed consent was obtained from each patient. Eighty-six patients between the ages of 34 and 75 were enrolled. All patients were treatment-naïve and had serum HBV DNA ≥200,000 IU/mL and detectable hepatitis B surface antigen for at least 6 months. Patients with hepatitis C or D infection, human immunodeficiency virus infection, or autoimmune hepatitis were excluded. Serum samples from all patients were stored at −80 °C until used. Clinical characteristics of the 52 genotype-C patients and 34 genotype-B patients are shown in Table 1.

Table 1 Characteristics of patients with different HBV genotypes

Extracting and amplifying full HBV genomic DNA

HBV DNA from 200 μL of serum was extracted using the Viogene Blood and Tissue Extraction Mini DNA Extractor kit (Viogene BioTek, New Taipei City, Taiwan). Fragments of the HBV DNA full genome were amplified using PCR with nine primer sets (Supplementary Table 1) and High Fidelity DNA polymerase (Thermo Fisher Scientific, Pittsburgh, PA, USA). The PCR condition was 5 min at 94 °C followed by 40 cycles of 1 min at 94 °C, 1 min at 50 °C, and 1.5 min at 72 °C. All amplified PCR products were run in a 1 % agarose gel electrophoresis and were purified using the GEL/PCR Purification Mini Kit (Favorgen Biotech, Ping-Tung, Taiwan). Nine PCR fragments were mixed and were subjected to NGS.

Viral genome sequencing using massively parallel NGS

Massively-parallel NGS with multiplexed tags was carried out using a genome analyzer, as previously described [11]. In brief, HBV DNA was fragmented using sonication and was cut into suitable sizes. These fragments were purified and were then end-repaired and A-tailed using DNA Polymerase I Klenow Fragment (3′ → 5′ exo-) (New England Biolabs, Ipswich, MA, USA). DNA fragments ligated with indexed adapters were amplified using 10–18 cycles of PCR reaction. The DNA library was quantified by Qubit fluorometer (Qubit dsDNA HS assay, Thermo Fisher Scientific, Life Technology) and real time PCR (KAPA Library Quantification Kit Illumina® platforms, KAPA biosystems). Experion Automated Electrophoresis System (Bio-Rad Laboratories, Hercules, CA, USA) was used to validate the size of the library. After it had been validated, the library was sequenced (HiSeq™ 2500; Illumina, San Diego, CA, USA).

Genomic analysis of NGS data

For NGS high-throughput data, low-quality bases of raw reads were first trimmed using Seqtk (https://github.com/lh3/seqtk), which uses a modified Mott trimming algorithm. All parameters were left at their default settings, except that reads were allowed to be trimmed down to a minimum of 1 bp so that as many low-quality bases as possible were removed while paired-end information was retained. At the alignment stage, the trimmed reads were mapped to the mapping reference genome using BWA (BWA-MEM) [14] with the -M parameter and 16 threads to obtain correct and consistent mapping statistics. Because almost all genome mappers, including BWA, were designed for linear genomes, they are not well suited to circular genomes such as that of HBV: reads spanning the ends of the linearized genome map poorly. Therefore, to avoid overhanging reads, we manually appended 600 bases from the beginning of the 5′ end of the reference to its 3′ end. The mapping results were then processed using SAMtools [15] to remove reads that mapped to multiple positions and reads that had poor mapping quality scores.
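The circular-genome workaround described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' script: the function names `extend_circular_reference` and `fold_position` are hypothetical, and folding positions from the appended segment back onto the original coordinates is an assumed post-processing step that the text does not describe.

```python
def extend_circular_reference(genome: str, overlap: int = 600) -> str:
    """Append the first `overlap` bases to the 3' end of a linearized circular genome.

    Reads that span the artificial break point can then align contiguously.
    """
    if overlap < 0 or overlap > len(genome):
        raise ValueError("overlap must be between 0 and the genome length")
    return genome + genome[:overlap]


def fold_position(pos: int, genome_length: int) -> int:
    """Map a 1-based position on the extended reference back to the original genome.

    Assumed post-processing step; not described in the source text.
    """
    if pos < 1:
        raise ValueError("positions are 1-based")
    return (pos - 1) % genome_length + 1


# Toy example with a 10-base genome and a 4-base overlap.
genome = "ACGTTGCAAT"
extended = extend_circular_reference(genome, overlap=4)
print(extended)                         # ACGTTGCAATACGT
print(fold_position(12, len(genome)))   # 2
```

For a full-length HBV genome the default `overlap=600` matches the 600 bases mentioned in the text; the toy values above are used only to keep the output readable.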

Direct Sanger sequencing and NGS of HBV clones

Serum HBV DNA from two patients with CHB was extracted and amplified using PCR. Using modified primers [7], the full-length HBV genome was amplified and then cloned into the yT&A plasmid (Yeastern Biotech, Taipei, Taiwan). The HBV full genomes of Clone_N6 (KJ790199) and Clone_H44 (KJ790200) were sequenced using a direct Sanger sequencer (Applied Biosystems, Life Technologies, Taipei, Taiwan). For NGS of the two HBV clones, the HBV full genomes in plasmids were amplified using primers containing a SapI restriction site [16], self-ligated using T4 ligase to produce a circular HBV genome, and amplified into nine fragments that were subjected to NGS as described in “Viral genome sequencing using massively parallel NGS”.

Mapping references chosen at the alignment stage during NGS data analysis

To optimize the NGS data analysis, five different mapping references were used at the alignment stage to compare the mapping results. Four full-length HBV genome sequences, FJ787477 (genotype B, Asia), JN315779 (genotype C, Asia), KJ790200 (Clone_H44; genotype B, Taiwan), and KJ790199 (Clone_N6; genotype C, Taiwan) in the GenBank database were used. Furthermore, a sample-specific reference sequence, the consensus sequence obtained from the NGS reads of each sample through alignment with its same genotype mapping reference (FJ787477 or JN315779), was also used as a mapping reference sequence for the NGS datasets.
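As a rough illustration of how a sample-specific reference could be derived, the sketch below takes reads already placed on a same-genotype reference and returns the majority base at each position. It is a simplified sketch, not the authors' pipeline: `consensus_from_reads` is a hypothetical name, reads are assumed to be gap-free with 0-based start positions, and falling back to the reference base at uncovered positions is an assumption.

```python
from collections import Counter


def consensus_from_reads(reference: str, reads: list[tuple[int, str]]) -> str:
    """Return the majority base per position from gap-free aligned reads.

    `reads` holds (0-based start, sequence) pairs. Positions without any
    coverage keep the reference base (an assumption of this sketch).
    """
    counts = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                counts[pos][base] += 1

    consensus = []
    for ref_base, column in zip(reference, counts):
        if column:
            consensus.append(column.most_common(1)[0][0])
        else:
            consensus.append(ref_base)
    return "".join(consensus)


# Toy example: every read covering index 3 carries A where the reference has T.
reference = "ACGTACGT"
reads = [(0, "ACGA"), (1, "CGAAC"), (2, "GAACG")]
sample_specific = consensus_from_reads(reference, reads)
print(sample_specific)  # ACGAACGT
```

The resulting string would then serve as the mapping reference for a second alignment of the same reads, which is the role the sample-specific reference sequence plays in this study.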

HBV genotyping

The HBV genotype was determined using a melting curve analysis with LightCycler hybridization probes as previously described [17]. The derived consensus sequence of the NGS reads was aligned to the standard full-length HBV genomes (genotypes A–H) from the GenBank database to confirm the genotyping results. The HBV genotype was identified using phylogenetic analysis software (Mega 6.0) [18].

Evaluating the mapping quality of simulated datasets from 158 HBV strains in the GenBank database with the alignment to different genotype sequences

To evaluate the quality of mapping results with alignment to different genotype references, 158 complete HBV genomes from four common genotypes (34 genotype A, 33 genotype B, 39 genotype C, and 52 genotype D strains) were collected from the GenBank database. Simulated NGS reads were produced from each strain using sequence alignment/mapping software (SAMtools wgsim) [15]. To analyze these 158 simulated NGS datasets, the full genome of each HBV strain was also used as a mapping reference for alignment. The mapping results (mappable reads, properly paired reads, broken paired reads, and singleton reads) were evaluated.
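The four mapping categories can be illustrated by classifying the FLAG field of SAM records. The bit values below come from the SAM format; the category definitions are assumptions, since the text does not define them. In particular, a "broken pair" is taken here to mean that both mates are mapped but not flagged as a proper pair, and `classify_flag` and `summarize_flags` are hypothetical helper names.

```python
from collections import Counter

# Bit values from the SAM format FLAG field.
PAIRED, PROPER_PAIR, UNMAPPED, MATE_UNMAPPED = 0x1, 0x2, 0x4, 0x8
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def classify_flag(flag: int) -> str:
    """Assign one SAM record to a mapping category (assumed definitions)."""
    if flag & (SECONDARY | SUPPLEMENTARY):
        return "skipped"            # count each read once, via its primary line
    if flag & UNMAPPED:
        return "unmapped"
    if (flag & PAIRED) and (flag & MATE_UNMAPPED):
        return "singleton"          # read mapped, mate unmapped
    if (flag & PAIRED) and (flag & PROPER_PAIR):
        return "properly_paired"
    if flag & PAIRED:
        return "broken_pair"        # both mates mapped, not a proper pair
    return "mapped_unpaired"


def summarize_flags(flags: list[int]) -> dict[str, int]:
    """Tally categories and add a combined 'mappable' count."""
    tally = Counter(classify_flag(f) for f in flags)
    mapped_keys = ("properly_paired", "broken_pair", "singleton", "mapped_unpaired")
    summary = dict(tally)
    summary["mappable"] = sum(tally[k] for k in mapped_keys)
    return summary


flags = [99, 147, 65, 73, 133, 2113]
summary = summarize_flags(flags)
print(summary["properly_paired"], summary["broken_pair"],
      summary["singleton"], summary["mappable"])  # 2 1 1 4
```

Dividing each count by the total number of primary records gives percentages comparable in spirit to those reported for the simulated datasets, although the exact definitions used in the study may differ.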

Calculating nucleotide divergence

Nucleotide divergences were calculated using DNA sequence polymorphism software (DNasp 5.10.1) [19]. One hundred sites for window length were set when we calculated the divergences.
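A sliding-window divergence of this kind can be approximated with a simple mismatch proportion. The sketch below is not DnaSP's estimator: it assumes two pre-aligned, equal-length sequences, treats every differing column (including gaps) as a mismatch, and uses a step size chosen for illustration because the text states only the 100-site window length. The function name `window_divergence` is hypothetical.

```python
def window_divergence(seq_a: str, seq_b: str,
                      window: int = 100, step: int = 1) -> list[float]:
    """Percent mismatches between two aligned sequences in each sliding window."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")

    mismatches = [a != b for a, b in zip(seq_a, seq_b)]
    divergences = []
    for start in range(0, len(mismatches) - window + 1, step):
        n_diff = sum(mismatches[start:start + window])
        divergences.append(100.0 * n_diff / window)
    return divergences


# Toy example: one mismatch in the second of two non-overlapping 5-site windows.
profile = window_divergence("AAAAAAAAAA", "AAAAACAAAA", window=5, step=5)
print(profile)  # [0.0, 20.0]
```

With `window=100`, a value above 18 would correspond to the ">18 %" divergence regions discussed in the Results, subject to the simplifications noted above.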

Statistical analysis

Continuous variables were compared using Student’s t test for two independent groups. The changes in mapping quality between different mapping references were compared using paired t tests. The frequencies and distributions of categorical variables were compared using the Chi square tests or Fisher’s exact tests.

Results

The mapping quality of the NGS dataset was optimized when using a sample-specific reference sequence as the mapping reference

We compared the mapping quality of real Illumina datasets of viral genomes from 52 patients with genotype C HBV and 34 patients with genotype B HBV using different mapping references. The sample-specific reference sequence had the best quality, followed by the Taiwanese strain with the same genotype, the Asian strain with the same genotype, the Taiwanese strain with a different genotype, and the Asian strain with a different genotype (Table 2; Supplementary Table 2). In patients with genotype C HBV, the sample-specific reference sequence had the best mapping quality: the largest number of mappable reads, properly paired reads, and broken paired reads, and the lowest number of singletons. In addition, using a sample-specific sequence as a reference yielded the largest average coverages (38,362 ± 75,502), minimum coverages, and maximum coverages per nucleotide, and the smallest number of nucleotides with fewer than 30 coverages. Moreover, when using a sample-specific sequence, 97 % of the nucleotides in the full genome had more than 1000 coverages.
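The per-nucleotide coverage figures reported here can be reproduced in outline from a list of depths, one value per genome position. The sketch below is illustrative only: `coverage_summary` is a hypothetical name, and the thresholds of 30 and 1000 are taken from the text and applied with strict inequalities, which is an assumption.

```python
from statistics import mean


def coverage_summary(depths: list[int], low: int = 30, high: int = 1000) -> dict:
    """Summarize per-position read depth across a genome."""
    if not depths:
        raise ValueError("depths must not be empty")
    return {
        "mean": mean(depths),
        "min": min(depths),
        "max": max(depths),
        "n_below_low": sum(d < low for d in depths),
        "pct_above_high": 100.0 * sum(d > high for d in depths) / len(depths),
    }


# Toy example with four positions.
cov = coverage_summary([10, 1500, 2000, 50000])
print(cov["mean"], cov["min"], cov["max"])        # 13377.5 10 50000
print(cov["n_below_low"], cov["pct_above_high"])  # 1 75.0
```

In practice the depth list would come from a per-position depth report of the alignment; how that report was produced in this study is not stated, so it is left outside the sketch.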

Table 2 Mapping NGS datasets of genotype C patients (n = 52) to different references of HBV full genome

Using a Taiwanese genotype C mapping reference yielded larger mappable reads (p < 0.0001) and average coverage per nucleotide (p < 0.0001) than did using the Asian genotype C mapping reference. A similar pattern of mapping quality was detected in NGS datasets from 34 genotype B patients aligned to different HBV mapping references (Supplementary Table 2).

Simulated NGS datasets for which a sample-specific sequence was used as a reference yielded improved mapping quality

Simulated NGS reads were produced from 158 HBV full genomes (genotypes A–D HBV) obtained from the GenBank database using the Wgsim read simulator [15]. To analyze these simulated NGS datasets, each full genome was also used as a mapping reference for alignment. The mapping qualities categorized by subgenotype are shown in Supplementary Table 3. Simulated NGS datasets aligned to a same-genotype sequence, especially a same-subgenotype sequence in genotype B, showed improved mapping quality: a higher percentage of mappable reads and properly paired reads and a lower percentage of singletons than datasets aligned to different genotypes.

Datasets aligned to different genotypes yielded false SNVs in the consensus sequence derived from NGS reads of a single HBV clone

The mapping results of NGS reads from a Taiwanese genotype C HBV clone, Clone_N6, were assessed using three different mapping references: the sample-specific reference sequence, JN315779 (Genotype C, Asia), and FJ787477 (Genotype B, Asia). When compared with the sequence of Clone_N6 derived from direct sequencing, the consensus sequence contained 21 false SNVs when a different-genotype mapping reference was used (Fig. 1). Consensus sequences derived from the same dataset were identical to the sequence of Clone_N6 when either the same-genotype reference or the sample-specific reference sequence was used as the mapping reference.

Regions of high divergence between single HBV clone and mapping references involved the false SNVs of NGS analysis

The percentage of mappable reads from the Clone_N6 NGS dataset was 98.7 % using a sample-specific reference sequence, 97.0 % using JN315779, and 96.8 % using FJ787477. The coverage and percentage of false SNVs in the consensus sequences derived from Clone_N6 (genotype C) NGS reads when aligned to different HBV mapping references are shown in Table 3. When the NGS reads were aligned to a different genotype strain (FJ787477), the derived consensus sequence showed 21 false SNVs with 2–3 log reductions of coverage and an enormous change in nucleotide percentages. Interestingly, these false SNVs were located exactly at the three highest divergence regions, viz., nt929–942 (P gene/RT domain), nt2733–2741 (P gene/terminal protein domain), and nt2980–3015 (P gene/spacer domain; pre-S1 region), with a divergence of >18 % between Clone_N6 (genotype C) and the mapping reference FJ787477 (Genotype B, Asia) (Fig. 2a).

Table 3 Coverage and percentage of false SNVs of NGS reads (Clone_N6, genotype C) with alignment to different references of HBV full genome

When a mapping reference of the same genotype (JN315779) was used for the Clone_N6 NGS reads, the derived consensus sequence was identical to the Clone_N6 sequence, but dozens of SNVs were still detected, with coverage percentages of up to 6 % (Table 3). The sequence divergences of each sliding window between Clone_N6 and the mapping reference JN315779 were all below 10 % across the HBV full genome (Fig. 2a). Using the sample-specific reference sequence as the mapping reference gave the best NGS analysis results, with a percentage of false SNVs of <0.3 %. This false SNV rate was considered to be at the level of mismatch error and could be ignored in NGS analysis [11]. Furthermore, we found 14 false SNVs in the consensus sequence derived from Clone_H44 (genotype B) NGS reads when mapping to reference JN315779 (Genotype C, Asia). All of these false SNVs, located in nt940–942 (P gene/RT domain), nt2759–2775 (P gene/terminal protein domain), and nt2783–2790 (P gene/terminal protein domain), lay in regions of high divergence (>15 %) between Clone_H44 and JN315779 (Supplementary Fig. 1A).
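Because the true sequence of a single clone is known, the percentage of reads supporting a false SNV at a position can be computed directly from the base counts at that position. The following is a minimal sketch with hypothetical names (`false_snv_percentage`, `exceeds_error_level`); the 0.3 % cut-off is the mismatch-error level quoted in the text, and the base counts are invented toy values, not data from Table 3.

```python
def false_snv_percentage(base_counts: dict[str, int], true_base: str) -> float:
    """Percent of reads at one position whose base differs from the known clone base."""
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("position has no coverage")
    return 100.0 * (total - base_counts.get(true_base, 0)) / total


def exceeds_error_level(base_counts: dict[str, int], true_base: str,
                        error_pct: float = 0.3) -> bool:
    """True if the non-clone bases exceed the assumed mismatch-error level."""
    return false_snv_percentage(base_counts, true_base) >= error_pct


# Toy counts (invented for illustration).
same_genotype_ref = {"A": 9400, "G": 600}
sample_specific_ref = {"A": 9980, "G": 20}
print(false_snv_percentage(same_genotype_ref, "A"))    # 6.0
print(exceeds_error_level(same_genotype_ref, "A"))     # True
print(exceeds_error_level(sample_specific_ref, "A"))   # False
```

For patient samples, where the true sequence is unknown, the same arithmetic would measure minor-variant frequency rather than false calls, which is why the choice of mapping reference matters for quasispecies analysis.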

High prevalence of SNVs in 52 patients associated with a high divergence between their derived HBV consensus sequences and mapping references

We obtained all the sequence divergences of each sliding window by comparing each HBV-derived consensus sequence from 52 patients infected with genotype C HBV with mapping references JN315779 (Genotype C, Asia) and FJ787477 (Genotype B, Asia). Mean sequence divergences of genotype C consensus strains relative to two different genotype mapping references were shown as two demarcating curves (Fig. 2b). The profiles were similar to those in Fig. 2a. When using FJ787477 as a mapping reference, all the mean sequence divergences of each window except those within nt2137–2250 of the core gene were significantly higher than those referenced to JN315779 (p < 0.0001).

Interestingly, the derived consensus sequence at 94 nucleotide positions of the HBV full genome in at least one of 52 patients with genotype C HBV contained inconsistent variants when using different genotype mapping references (Supplementary Table 4). Seventeen inconsistent variants occurred in more than 20 % and ten inconsistent variants in more than 40 % of these patients (Table 4). These inconsistent SNVs were located at high divergence regions (over 13 %): nt939–942, nt1353–1362, and nt2980–3015, and they were probably false SNVs (Fig. 2b). In addition, mean sequence divergences between 34 genotype B-derived consensus sequences from CHB patients and each of two different genotype mapping references are shown as two demarcating curves in Supplementary Fig. 2A, in which the profiles are similar to those in Fig. 2b. When comparing different genotype mapping references, 111 nucleotides in the HBV full genome of derived consensus sequences expressed inconsistent variants in at least one patient (Supplementary Table 5). When using JN315779 as a mapping reference, all the mean sequence divergences except those within nt368–481, nt685–790, nt1207–1313, nt1802–2000, and nt2133–2584 were significantly higher than those referenced to FJ787477 (p < 0.0001) (Supplementary Fig. 1B).

Table 4 Prevalence of probable false SNVs of genotype C patients (n = 52) to different references of HBV full genome

The mean divergence of any consecutive 100-nucleotide segments in genotypes A–D

We collected 158 HBV full genomes (genotypes A–D) from the GenBank database. Within the same genotype, the mean divergence of any consecutive 100-nucleotide segment was ≤8.0 % (Fig. 3). Between any two different genotypes, there were high divergences in most regions of the HBV full genome, except the core gene between genotypes B and C.

Discussion

Various NGS platforms, including the Illumina HiSeq, Illumina Genome Analyzer, Illumina MiSeq, Roche 454, and SOLiD4, differ in engineering configurations and sequencing chemistry. In the present study, we used the Illumina HiSeq 2500 system to analyze the HBV full genome in viral quasispecies; it has the advantages of short run times, long read lengths, and high data quality. With an average read length of 110 bp and an average coverage ≥25,000 after quality trimming, the depth and quality of the sequencing results are adequate for analyzing viral quasispecies.

At the alignment stage, mapping short reads against a reference genome is typically the first step in analyzing such NGS data, and it should be as accurate as possible. The mapping reference is crucial for mapping quality and for the quality of the SNV calls. Previous HBV-related NGS analyses used the consensus genotype sequence [12, 13] or a sequence from direct sequencing of PCR products [11] as mapping references. This is the first study to compare the mapping quality and the accuracy of the SNV calls for NGS analysis of the HBV full genome using different mapping references. When a sample-specific reference sequence was used, the mappable reads were 89.4 % of total reads with highest properly paired reads and broken paired reads and lowest singleton reads, which indicated that mapping quality was substantially improved. Coverage is important for the quality of SNV calls. High coverage regions or bases tend to have a higher call quality. Using a sample-specific reference sequence as mapping reference increased the average coverage for each nucleotide. When simulated NGS datasets were aligned to the same genotype sequence, especially the same subgenotype sequence in genotype B or from the same country, mapping qualities were also improved.

HBV quasispecies are strongly implicated in the pathogenesis of chronic liver diseases. For example, several mutations of the HBV genome are crucial for developing HCC and cirrhosis [20–23]. Therefore, accurate SNV calls are important for analyzing the HBV full genome in NGS. Through verification against cloned sequences, we identified numerous false SNVs in a major strain when the mapping reference was of a different viral genotype from the sample (Table 3). These false SNV calls resulted from fewer mappable reads, a profound reduction in coverage, and a change in the nucleotide composition at the affected sites. In addition, we confirmed that there were probably many false SNVs in derived consensus sequences from patients when a different viral genotype was used as a mapping reference (Table 4). More than 20 % of the patients had numerous false SNVs, particularly in high-divergence regions (Table 4; Supplementary Tables 4, 5), which suggested that a different genotype sequence was an inappropriate mapping reference for HBV NGS analysis.

HBV mutants have been demonstrated in patients with acute fulminant or chronic infection [10]. Mutations in the RT region of viral P gene have been associated with the resistance to antiviral drugs [24, 25]. Mutations on the PreS1 or PreS2 promoter were correlated with the occurrence of HCC [26, 27]. While using different genotype mapping references, we found that false SNV calls of major strains were located in high-divergence regions, such as the P-gene/RT domain, P-gene/terminal protein domain, and preS1 region in the HBV genome for genotypes A–D (Fig. 2). We suggest using a sample-specific reference sequence as a mapping reference for NGS analysis in studies of P-gene variants for drug-resistance [24, 25] and S-gene variants for the pathogenesis of HCC [26, 27]. Emerging evidence supports the notion that certain drug-resistant HBV minor strains are crucial for the progression of liver diseases and are predictors of subsequent treatment failure [23, 28]. They may accumulate and eventually dominate under a long-term selection effect during antiviral treatment [23].

By taking advantage of the abundant genetic information that NGS provides, more accurate and detailed SNVs of HBV quasispecies can be obtained than with previous methods, such as INNO-LiPA and clonal HBV sequencing. However, even when we used a same-genotype sequence as the mapping reference, dozens of false SNVs were still present, with coverage percentages of up to 6 % (Table 3). Using the sample-specific reference sequence as the mapping reference yielded the best NGS analysis results, with a coverage percentage of false SNVs of <0.3 %, which was considered to be at the level of mismatch error and could be ignored for NGS analysis on the Illumina platform [11]. Therefore, using a sample-specific reference sequence to align the NGS dataset is crucial for accurate SNV calls of viral quasispecies.

In conclusion, a sample-specific reference sequence, which provides the highest mapping quality and SNV call accuracy, should be used as the mapping reference in the NGS analysis of emerging HBV variants, especially in studies of the P gene and S gene.

Abbreviations

CHB: Chronic hepatitis B
DNasp: DNA sequence polymorphism
Geno.: Genotype
HBV: Hepatitis B virus
NGS: Next generation sequencing
NT: Nucleotide
PCR: Polymerase chain reaction
SNV: Single nucleotide variant

References

1. Ley TJ, Mardis ER, Ding L, Fulton B, McLellan MD, Chen K, et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 2008;456:66–72
2. Isakov O, Modai S, Shomron N. Pathogen detection using short-RNA deep sequencing subtraction and assembly. Bioinformatics 2011;27:2027–2030
3. Lok AS, McMahon BJ. Chronic hepatitis B. Hepatology 2007;45:507–539
4. EASL clinical practice guidelines. Management of chronic hepatitis B virus infection. J Hepatol 2012;57:167–185
5. Liaw YF, Chu CM. Hepatitis B virus infection. Lancet 2009;373:582–592
6. Liu CJ, Kao JH. Global perspective on the natural history of chronic hepatitis B: role of hepatitis B virus genotypes A–J. Semin Liver Dis 2013;33:97–102
7. Liu WC, Phiet PH, Chiang TY, Sun KT, Hung KH, Young KC, et al. Five subgenotypes of hepatitis B virus genotype B with distinct geographic and virological characteristics. Virus Res 2007;129:212–223
8. Kidd-Ljunggren K, Miyakawa Y, Kidd AH. Genetic variability in hepatitis B viruses. J Gen Virol 2002;83:1267–1280
9. Nie JJ, Sun KX, Li J, Wang J, Jin H, Wang L, et al. A type-specific nested PCR assay established and applied for investigation of HBV genotype and subgenotype in Chinese patients with chronic HBV infection. Virol J 2012;9:121
10. Blum HE. Hepatitis B virus: significance of naturally occurring mutants. Intervirology 1993;35:40–50
11. Nishijima N, Marusawa H, Ueda Y, Takahashi K, Nasu A, Osaki Y, et al. Dynamics of hepatitis B virus quasispecies in association with nucleos(t)ide analogue treatment determined by ultra-deep sequencing. PLoS ONE 2012;7:e35052
12. Margeridon-Thermet S, Shulman NS, Ahmed A, Shahriar R, Liu T, Wang C, et al. Ultra-deep pyrosequencing of hepatitis B virus quasispecies from nucleoside and nucleotide reverse-transcriptase inhibitor (NRTI)-treated patients and NRTI-naive patients. J Infect Dis 2009;199:1275–1285
13. Solmone M, Vincenti D, Prosperi MC, Bruselles A, Ippolito G, Capobianchi MR. Use of massively parallel ultradeep pyrosequencing to characterize the genetic diversity of hepatitis B virus in drug-resistant and drug-naive patients and to detect minor variants in reverse transcriptase and hepatitis B S antigen. J Virol 2009;83:1718–1726
14. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 2009;25:1754–1760
15. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics 2009;25:2078–2079
16. Gunther S, Li BC, Miska S, Kruger DH, Meisel H, Will H. A novel method for efficient amplification of whole hepatitis B virus genomes permits rapid functional analysis and reveals deletion mutants in immunosuppressed patients. J Virol 1995;69:5437–5444
17. Liu WC, Mizokami M, Buti M, Lindh M, Young KC, Sun KT, et al. Simultaneous quantification and genotyping of hepatitis B virus for genotypes A–G by real-time PCR and two-step melting curve analysis. J Clin Microbiol 2006;44:4491–4497
18. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 2013;30:2725–2729
19. Librado P, Rozas J. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 2009;25:1451–1452
20. Chen YM, Wu SH, Qiu CN, Yu DJ, Wang XJ. Hepatitis B virus subgenotype C2- and B2-associated mutation patterns may be responsible for liver cirrhosis and hepatocellular carcinoma, respectively. Braz J Med Biol Res 2013;46:614–622
21. Lin CL, Kao JH. Hepatitis B viral factors and clinical outcomes of chronic hepatitis B. J Biomed Sci 2008;15:137–145
22. Park YM, Jang JW, Yoo SH, Kim SH, Oh IM, Park SJ, et al. Combinations of eight key mutations in the X/preC region and genomic activity of hepatitis B virus are associated with hepatocellular carcinoma. J Viral Hepat 2014;21:171–177
23. Singla B, Chakraborti A, Sharma BK, Kapil S, Chawla YK, Arora SK, et al. Hepatitis B virus reverse transcriptase mutations in treatment Naive chronic hepatitis B patients. J Med Virol 2013;85:1155–1162
24. Song ZL, Cui YJ, Zheng WP, Teng DH, Zheng H. Diagnostic and therapeutic progress of multi-drug resistance with anti-HBV nucleos(t)ide analogues. World J Gastroenterol 2012;18:7149–7157
25. Yim HJ, Hwang SG. Options for the management of antiviral resistance during hepatitis B therapy: reflections on battles over a decade. Clin Mol Hepatol 2013;19:195–209
26. Li GJ, Harrison TJ, Yang JY, Chen QY, Wang XY, Fang ZL. Combined core promoter mutations and pre-S deletion of HBV may not increase the risk of HCC: a geographical epidemiological study in Guangxi, China. Liver Int 2013;33:936–943
27. Qu L, Kuai X, Liu T, Chen T, Ni Z, Shen X. Pre-S deletion and complex mutations of hepatitis B virus related to young age hepatocellular carcinoma in Qidong, China. PLoS ONE 2013;8:e59583
28. Bhattacharya D, Lewis MJ, Lassmann B, Phan T, Knecht G, Bickel M, et al. Combination of allele-specific detection techniques to quantify minority resistance variants in hepatitis B infection: a novel approach. J Virol Methods 2013;190:34–40

Acknowledgements

This work was supported by Grant 102-2628-B-006-008-MY3 from the Ministry of Science and Technology of Taiwan, the Infectious Disease and Signaling Research Center of National Cheng Kung University. We thank the Human Biobank of the Research Center of Clinical Medicine and Cancer Data Bank of the Cancer Center of National Cheng Kung University Hospital Cancer Center for providing HCC samples. We are indebted to Kung-Chia Young for invaluable suggestions and Bill Franke for English editing.

Compliance with ethical requirements and Conflict of interest

All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2008 (5). Informed consent was obtained from all patients for being included in the study. This article does not contain any studies with animal subjects. Wen-Chun Liu, Chih-Peng Lin, Chun-Pei Cheng, Cheng-Hsun Ho, Kuo-Lun Lan, Ji-Hong Cheng, Chia-Jui Yen, Pin-Nan Cheng, I-Chin Wu, I-Chen Li, Bill Chia-Han Chang, Vincent S. Tseng, Yen-Cheng Chiu, and Ting-Tsung Chang declare that they have no conflict of interest. This study was approved by the Institutional Review Board of National Cheng Kung University Hospital (NCKUH) (IRB no.: B-ER-101-385).

Author information

Authors and Affiliations

Department of Internal Medicine, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, 138 Sheng-Li Road, Tainan, 70403, Taiwan
Wen-Chun Liu, Cheng-Hsun Ho, Chia-Jui Yen, Pin-Nan Cheng, I-Chin Wu, I-Chen Li, Yen-Cheng Chiu & Ting-Tsung Chang
Infectious Disease and Signaling Research Center, National Cheng Kung University, Tainan, Taiwan
Wen-Chun Liu, Cheng-Hsun Ho, Chia-Jui Yen, I-Chin Wu, Yen-Cheng Chiu & Ting-Tsung Chang
Yourgene Bioscience, Taipei, Taiwan
Chih-Peng Lin & Bill Chia-Han Chang
Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
Chun-Pei Cheng, Kuo-Lun Lan, Ji-Hong Cheng & Vincent S. Tseng

Corresponding author

Correspondence to Ting-Tsung Chang.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 214 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Liu, WC., Lin, CP., Cheng, CP. et al. Aligning to the sample-specific reference sequence to optimize the accuracy of next-generation sequencing analysis for hepatitis B virus. Hepatol Int 10, 147–157 (2016). https://doi.org/10.1007/s12072-015-9645-x

Received: 10 April 2015
Accepted: 11 June 2015
Published: 25 July 2015
Issue Date: January 2016
DOI: https://doi.org/10.1007/s12072-015-9645-x
