Introduction

In current clinical viral genome sequencing, next-generation sequencing (NGS) is a frequent choice because it provides an unbiased, high-resolution, genome-wide mutation profile [1]. Because human genetic content overwhelmingly dominates clinical specimens, a major limitation of this approach is its low efficiency: viral reads rarely exceed 1% of the NGS output [1]. Among the numerous virus enrichment methods, capture sequencing, which employs a hybridization step after NGS library construction, has emerged as the most efficient strategy to enrich viral sequences [1]. However, this strategy is associated with a dramatic cost increase because it requires the synthesis of expensive biotin-labeled virus-specific probes (baits) as well as streptavidin beads [2]. The inclusion of such a hybridization step after initial library preparation also lengthens the entire NGS pipeline. Most human viruses, such as hepatitis B virus (HBV), hepatitis C virus (HCV), HIV, and coronavirus, have a genome less than 30 kb in size. If the viral on-target rate consistently exceeds 1%, the current NGS approach is powerful enough to satisfy clinical and research needs. For instance, a 1% HCV on-target rate in 5 million 2 × 150 bp paired-end reads gives a depth of 1562×, which already exceeds the saturation point (1100×) for dissecting an HCV viral population at a mutation frequency resolution of 1% [3]. To achieve this goal, we provide an alternative option for viral sequence enrichment that does not require a probe-based hybridization step. Our method, named NGS with target enrichment via enzymatic digestion (TEEDseq), depends on 7-deaza-2′-deoxyguanosine 5′-triphosphate (c7dGTP), an analog of deoxyguanosine triphosphate (dGTP). Because of its ability to relax DNA secondary structure, c7dGTP is widely used in PCR and Sanger sequencing [4, 5].
DNA molecules synthesized with c7dGTP carry a steric alteration that makes them resistant to some restriction enzymes whose recognition motifs contain guanosine [6]. TEEDseq uses this unique characteristic of c7dGTP to enrich the sequencing target.
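
The depth estimate quoted in the Introduction can be reproduced with a short calculation. The sketch below is illustrative only: the function name `mean_depth` is hypothetical, and the HCV genome length of 9.6 kb is an assumption that is not stated in the text. It was chosen because it is a commonly cited approximate size and is consistent with the 1562× figure.

```python
def mean_depth(read_pairs, on_target_rate, read_length, genome_size, paired=True):
    """Mean sequencing depth = on-target bases / genome length."""
    reads_per_fragment = 2 if paired else 1
    on_target_bases = read_pairs * on_target_rate * reads_per_fragment * read_length
    return on_target_bases / genome_size


# Assumption: approximate HCV genome length in bp (not given in the text).
HCV_GENOME_BP = 9600

# 5 million read pairs of 2 x 150 bp with a 1% on-target rate.
depth = mean_depth(5_000_000, 0.01, 150, HCV_GENOME_BP)
print(f"estimated mean depth: {depth:.0f}x")
```

With these inputs the estimate is approximately 1562×, above the 1100× saturation point cited from [3]. A different assumed genome length would shift the result proportionally.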

Main text

Materials and methods

PCR amplification efficiency using c7dGTP was first estimated with an HBV plasmid as the template [7]. A 30-cycle PCR was performed in a 50 µL reaction containing 1× Q5 DNA polymerase buffer, 0.8 mM dNTPs, 0.4 µM each of primers HBVF1 and HBVR1 (Table 1), and 1 unit of Q5 DNA polymerase [New England Biolabs (NEB), Ipswich, MA]. In the parallel reaction, dGTP was completely replaced by c7dGTP (Roche Molecular Systems, Madison, WI). After purification with the QIAquick PCR Purification Kit (Qiagen, Valencia, CA), the PCR product was quantified on a NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific, Waltham, MA).

Table 1 List of the oligonucleotides used in the study

Next, we tested the resistance of c7dGTP-containing DNA to restriction enzymes. The above PCR was repeated with primer HBVR1p, which carries a phosphate at its 5′ end. The PCR product was purified and digested with Lambda exonuclease (NEB), which preferentially digests the 5′-phosphorylated strand [8]. Consequently, both single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA) were generated and used for digestion with three restriction enzymes (AluI, HaeIII, and HpyCH4V) (NEB) that were carefully selected based on their recognition motifs and buffer compatibility. The experiment was repeated with the PCR amplicon generated using c7dGTP. Since DNA containing c7dGTP is difficult to stain with ethidium bromide [9], PCR with a high cycle number (n = 35) was conducted using a 2 µL aliquot of the enzyme reaction with primers HBVF2 and HBVR2 (Table 1).

Finally, TEEDseq was evaluated using a healthy donor serum sample spiked with a 1311-bp HBV PCR fragment at a concentration of 1 × 10⁶ copies/mL, mimicking the viral load of viruses such as HCV and HBV (Table 1) [10]. Total DNA was extracted from 0.5 mL of serum with the QIAamp MinElute ccfDNA Kit (Qiagen, Valencia, CA) and eluted into 20 µL of Tris buffer. The entire 20 µL of extracted DNA was used for a 5-cycle primer extension with 1 unit of Q5 DNA polymerase, 0.4 µM primer HBVR4 (Table 1), and 0.8 mM dNTPs in which dGTP was completely replaced by c7dGTP. The reaction was purified using the MinElute PCR Purification Kit (Qiagen) and eluted into 20 µL of Tris buffer, followed by ligation in a 30 µL reaction containing 10 U of T4 DNA ligase and 0.5 µM of the splinter at 14 °C overnight. After heat inactivation, the enzyme mix (AluI 10 U, HaeIII 10 U, HpyCH4V 5 U, Exonuclease I 50 U, and Exonuclease III 20 U) was added to bring the reaction to a 45 µL volume in 1× CutSmart buffer. After incubation at 37 °C for 3 h, the reaction was heat-inactivated and used for RCA, first with 1 µM target-specific primer HBVR5 (Table 1) at 30 °C for 12 h and then with 80 µM of C28 primer (Table 1) at 28 °C for 4 h. The final product was around 12 kb in size, with an average yield of 1.8 µg after purification using the QIAprep Spin Miniprep Kit (Qiagen). The product was subjected to Illumina sequencing (1 × 250 nt single-end reads), followed by data analysis as we previously described [11, 12]. We tested four options: the full TEEDseq protocol (a), TEEDseq with the three restriction enzymes omitted (b), direct sequencing using the Illumina Nextera Flex for plasma/serum kit (c), and full TEEDseq using the same serum sample without the HBV fragment spike-in (d). Each option was run in three technical replicates.

Results

PCR using c7dGTP showed a weak band on the ethidium bromide (EB)-stained gel (Additional file 1: Figure S1A), consistent with the previous report that c7dGTP-containing DNA is poorly stained by EB [9]. However, quantification of the PCR product revealed only a slightly lower yield with c7dGTP (Additional file 1: Figure S1B). This slight drop may also be attributed to the nature of c7dGTP rather than an authentic decrease in yield. Hence, PCR with c7dGTP had an efficiency similar to that with regular dGTP.

In the estimation of c7dGTP’s resistance to restriction enzymes, the amplicon had three AluI sites, one HaeIII site, and four HpyCH4V sites. All three enzymes completely digested dsDNA, and HpyCH4V additionally cut ssDNA (Additional file 1: Figure S1C). In comparison to dGTP amplicons, c7dGTP amplicons showed strong bands, suggesting resistance to digestion. The combination of all three enzymes resulted in almost complete digestion of both ssDNA and dsDNA, as indicated by a much weaker band (Additional file 1: Figure S1D). These experiments demonstrate that c7dGTP-containing DNA is resistant to individual and combined digestion by AluI, HaeIII, and HpyCH4V.
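
Counting recognition sites in an amplicon, as done for the three enzymes here, can be sketched in a few lines. This is a minimal illustration, not the procedure used in the study. The recognition sequences (AluI AGCT, HaeIII GGCC, HpyCH4V TGCA) are the commonly documented ones and are not listed in the text, the function names are hypothetical, and the toy sequence is invented for the example and is not the HBV amplicon.

```python
import re

# Assumption: commonly documented recognition sequences (not given in the text).
MOTIFS = {"AluI": "AGCT", "HaeIII": "GGCC", "HpyCH4V": "TGCA"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_palindromic(motif):
    """True if the motif equals its own reverse complement."""
    return motif == motif.translate(COMPLEMENT)[::-1]


def count_sites(sequence, motifs=MOTIFS):
    """Count recognition sites per enzyme, including overlapping matches."""
    seq = sequence.upper()
    # A zero-width lookahead lets overlapping occurrences be counted.
    return {name: len(re.findall(f"(?={motif})", seq)) for name, motif in motifs.items()}


# Invented toy sequence, NOT the HBV amplicon from the study.
TOY_SEQUENCE = "AAAGCTTTGGCCAATGCATTAGCTGG"
site_counts = count_sites(TOY_SEQUENCE)
print(site_counts)
```

All three motifs are palindromic and contain guanosine, so scanning one strand is sufficient and each site carries the modified base on both strands once c7dGTP has been incorporated.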

After read quality control [11, 12], HBV-mapped reads per one million total reads numbered 33,153 ± 3900 (3.31 ± 0.39%), 2638 ± 750 (0.26 ± 0.07%), 73 ± 21 (0.007 ± 0.002%), and zero for options a, b, c, and d, respectively (Fig. 1). TEEDseq thus reached a 454-fold enrichment over direct sequencing (option c). The recovery of HBV-mapped reads was 12.6-fold higher in option a than in option b, illustrating the pivotal role of the three restriction enzymes. According to HiCUP [14], these enzymes together have 36,535,384 cut sites (AluI 13,085,321; HaeIII 8,582,925; HpyCH4V 14,867,138) in the human genome (build GRCh38). Their combination with exonucleases efficiently digested non-target background sequences.
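
The on-target rates and fold-enrichment values reported here follow directly from the mean read counts. The sketch below recomputes them from the numbers in the text. The variable and function names are hypothetical, and only the means are used, so the standard deviations are not propagated.

```python
# Mean HBV-mapped reads per one million total reads, as reported in the text.
HBV_READS_PER_MILLION = {"a": 33_153, "b": 2_638, "c": 73, "d": 0}

# In silico cut-site counts on GRCh38, as reported in the text.
CUT_SITES_GRCH38 = {"AluI": 13_085_321, "HaeIII": 8_582_925, "HpyCH4V": 14_867_138}


def on_target_percent(mapped_reads, total_reads=1_000_000):
    """Percentage of reads mapped to the target."""
    return 100.0 * mapped_reads / total_reads


def fold_enrichment(enriched_reads, baseline_reads):
    """Ratio of mapped reads between two options at equal sequencing depth."""
    if baseline_reads == 0:
        raise ValueError("baseline has no mapped reads; fold change is undefined")
    return enriched_reads / baseline_reads


for option, reads in HBV_READS_PER_MILLION.items():
    print(f"option {option}: {on_target_percent(reads):.4f}% on target")

full_vs_direct = fold_enrichment(HBV_READS_PER_MILLION["a"], HBV_READS_PER_MILLION["c"])
full_vs_no_enzymes = fold_enrichment(HBV_READS_PER_MILLION["a"], HBV_READS_PER_MILLION["b"])
total_cut_sites = sum(CUT_SITES_GRCH38.values())
print(f"a vs c: {full_vs_direct:.0f}-fold; a vs b: {full_vs_no_enzymes:.1f}-fold")
print(f"total cut sites: {total_cut_sites:,}")
```

Because every option is normalized to one million total reads, the fold enrichment reduces to a simple ratio of mapped read counts.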

Fig. 1

HBV-specific read mapping among the four options. Read alignment on the 1195-bp HBV genome sequence starting from the HBVR4 priming site was viewed from the BAM file using BamView [13]. Reads with matching start and end positions were collapsed into one line and are shown in green. Options a, b, and c used a serum sample spiked with the 1311-bp HBV fragment. Option d had no HBV fragment spiked into the serum and served as a control. Each option is shown with the numbers (average and standard deviation) of HBV-mapped and total reads from three technical replicates after quality control

Discussion

Our method consists of four steps: primer extension, splinter ligation, enzymatic digestion, and rolling circle amplification (RCA) (Fig. 2). Using a serum sample spiked with a partial HBV genome (1311 bp), TEEDseq achieved a 3.31% mapping rate. Under a probe-based hybridization strategy, genome-wide HBV capture sequencing does not necessarily achieve a high on-target rate; for instance, it was < 1% in a recent report [15]. Off-target reads may arise from non-specific priming, since there is significant micro-homology between HBV and the human genome [16]. A more stringent primer design and more stringent primer extension conditions could further enhance the enrichment.

Fig. 2

The workflow of TEEDseq. Note that ligation, digestion, and RCA (grey-filled circles) are performed sequentially in the same tube. RCA, rolling circle amplification

In addition, TEEDseq has several technical features worthy of attention. First, serum DNA is regarded as a low-biomass sample. Its low DNA concentration, 435 ng/mL in the current study, naturally favors intramolecular ligation, which can be further facilitated using a splinter. Because intermolecular ligation is suppressed only at a low DNA concentration, templates at high concentration, such as DNA extracted from tissue samples, need to be diluted prior to ligation [17]. Second, we applied two-phase RCA, target-specific followed by non-specific. The short incubation of the non-specific RCA suppresses the amplification of contaminating sequences in the reagents, as observed in our recent studies [18, 19]. Third, TEEDseq requires purification only after primer extension. Afterwards, ligation, digestion, and RCA do not need purification because all enzymes have optimal activity in CutSmart buffer (NEB). Therefore, these reactions can be conducted successively in the same tube. Finally, the phi29 DNA polymerase used in RCA has strong strand-displacement activity. This activity results in a hyperbranched final product that is usually larger than 10 kb [20]. Therefore, the final product can be directly fragmented in NGS library preparation without additional procedures, such as concatemerization. Taken together, our experiment, using a partial HBV genome (1311 bp) spiked into a serum sample, provides proof-of-concept evidence that TEEDseq is a simple and cost-effective method for target enrichment in NGS. By using multiple primers to cover target genomes in primer extension, it could be applied to clinical viral sequencing as well as human genomic research.

Limitations

The current study is merely a proof of principle for TEEDseq. The method remains to be simplified; for instance, the time required for the ligation and RCA steps may be shortened. In addition, the efficiency and sensitivity of TEEDseq need to be further evaluated in clinical specimens.