The detection of pathogens in clinical and environmental samples using high-throughput sequencing (HTS) is often hampered by large amounts of background information, which is especially true for viruses with small genomes. Enormous sequencing depth can be necessary to compile sufficient information for identification of a certain pathogen. Generic HTS combining with in-solution capture enrichment can markedly increase the sensitivity for virus detection in complex diagnostic samples.
A virus panel based on the principle of biotinylated RNA baits was developed for specific capture enrichment of epizootic and zoonotic viruses (VirBaits). The VirBaits set was supplemented by a SARS-CoV-2 predesigned bait set for testing recent SARS-CoV-2-positive samples. Libraries generated from complex samples were sequenced via generic HTS (without enrichment) and afterwards enriched with the VirBaits set. For validation, an internal proficiency test for emerging epizootic and zoonotic viruses (African swine fever virus, Ebolavirus, Marburgvirus, Nipah henipavirus, Rift Valley fever virus) was conducted.
The VirBaits set consists of 177,471 RNA baits (80-mer) based on about 18,800 complete viral genomes targeting 35 epizootic and zoonotic viruses. In all tested samples, viruses with both DNA and RNA genomes were clearly enriched ranging from about 10-fold to 10,000-fold for viruses including distantly related viruses with at least 72% overall identity to viruses represented in the bait set. Viruses showing a lower overall identity (38% and 46%) to them were not enriched but could nonetheless be detected based on capturing conserved genome regions. The internal proficiency test supports the improved virus detection using the combination of HTS plus targeted enrichment but also points to the risk of cross-contamination between samples.
The VirBaits approach showed a high diagnostic performance, also for distantly related viruses. The bait set is modular and expandable according to the favored diagnostics, health sector, or research question. The risk of cross-contamination needs to be taken into consideration. The application of the RNA-baits principle turned out to be user friendly, and even non-experts can easily use the VirBaits workflow. The rapid extension of the established VirBaits set adapted to actual outbreak events is possible as shown for SARS-CoV-2.
Disease control is encompassing several fields like powerful diagnostics for early detection, efficient therapy, and disease prophylaxis. The present study focused on broad and powerful diagnostics for viral pathogens including evolved, newly emerging or unrecognized viruses. The detection of the latter can be challenging for conventional virus diagnostics relying on specific quantitative PCR assays until their improvement or development of new assays [1, 2]. Sequence information for such newly emerging pathogens is often rare or not existing. In addition, PCR-based diagnosis of co-infections with several pathogens is complicated and can be time-consuming and expensive, and therefore, co-infections can easily be overseen . Moreover, pathogen detection in cases of immunocompromised patients might be hampered due to a deviating course of the infection or its ectopic location . For all the mentioned challenges, untargeted metagenomics using high-throughput sequencing (HTS) offers in general a swift and broad solution. It also enables the simultaneous detection of the genetic information of pathogens of all taxa including viruses, bacteria, and eukaryotic pathogens like parasites or fungi . The aforementioned benefits of metagenomics HTS (generally a swift and broad solution) are confined by poor sample quality, low pathogen loads, and high levels of background consisting of nucleic acids of the host or accompanying bacteria that often lead to a detrimental pathogen/background nucleic acid ratio of the final sequencing libraries . These confinements can cause analyses of the resulting sequence datasets to get laborious and time consuming and difficult to interpret, even for experts. This is especially true for pathogenic viruses with rather small RNA genomes that can get lost in datasets generated from complex samples. In the worst case (extremely unfavorable pathogen/background ratio), exceedingly large sequence datasets and intense time-consuming analyses would be necessary to compile enough meaningful information to finalize a genome of a certain pathogen . This significantly impedes pathogen detection causing a diagnostic gap and making HTS difficult to implement in the daily routine of diagnostic laboratories.
Prior to sequencing, an enrichment of certain pathogens using capture enrichment methods can help to increase the target sequence information considerably, along with avoiding extensive sequencing efforts, as recently reviewed by Gaudin and Desnues , and consequently, can enhance the sensitivity of diagnostic HTS. Basically, enrichment can be realized by the application of oligonucleotides (DNA probes or RNA baits) that are complementary to the sequence information of the target pathogens to reach a targeted positive separation of them. Target enrichment with complex samples was already applied for viral [7,8,9], bacterial , and eukaryotic [11, 12] pathogens to compile full or nearly full genomes for follow-up analyses. For diagnostic reasons, however, the combination of probes or RNA baits of several pathogens into one set, depending on the scope, for example virus, microbiological or syndromic diagnostics, is beneficial. This approach can save numberless PCR reactions, which are specific for only one pathogen or pathogen groups. Studies implementing this practice mostly using DNA probes for different virus panels were already presented [13,14,15]. However, since DNA:DNA hybrids are less stable and have a weaker hybridization efficiency than RNA:DNA hybrids [16, 17], RNA baits might be preferable.
For the purpose of improved virus diagnostics, we used RNA baits for an in-solution capture assay with subsequent HTS (Fig. 1). In a pilot study, we selected a panel of viral pathogens for notifiable epizootic and zoonotic diseases. A genome dataset for the relevant virus groups was created (containing about 18,800 full-length virus genomes), and a set of RNA baits (VirBaits) was derived from this dataset to facilitate the identification and subsequently diagnosis of viral pathogens in farm and reservoir animals but also humans. The resulting VirBaits set was tested for a set of representatives of pathogenic viruses with diagnostic samples from already examined cases instead of spiked samples. These clinical samples had, to some extent, challenging low virus titers. Due to the recent worldwide SARS-CoV-2 outbreak , we extended the bait panel for this newly emerging virus and successfully tested the set with SARS-CoV-2-positive samples.
Bait design for the targeted virus panel
The aim of the study was to design a diagnostic bait set for epizootic and zoonotic viral pathogens. For this purpose, virus genomes were compiled based on a curated list of complete virus genomes from the viralzone website (list “SEPT_2017” on viralzone website , comprising 70,352 virus genomes) for viruses causing notifiable epizootic and emerging zoonotic diseases. For influenza A (IAV), a virus with RNA genome characterized by a broad genomic variety, all available IAV sequences covering full or nearly full-length segments from influenza viruses collected from avian hosts that were available in FluDB June 2018 were taken (representing 12,608 isolates). Duplicated IAV sequences were removed, and the resulting set of 79,859 segment sequences was used as basis for bait design. The final virus genome dataset comprising about 18,800 complete virus genomes for epizootic and zoonotic diseases (Table 1) was sent to Arbor Biosciences (Ann Arbor, Michigan, USA) for bait design. On this basis, 80-mer oligonucleotides were derived with 1.3× tiling density. They were afterwards checked against 19 host genomes relevant for veterinary and human virus diagnostics (Anas platyrhynchos GCF_000355885.1, Anser cygnoides GCF_000971095.1, Bos taurus GCA_000003055.4, Camelus dromedarius GCF_000767585.1, Canis familiaris GCA_000002285.2, Capra hircus GCF_001704415.1, Cervus elaphus GCA_002197005.1, Chlorocebus sabaeus GCA_000409795.1, Enhydra lutris GCF_002288905.1, Equus caballus GCA_000002305.1, Felis catus GCF_000181335.3, Gallus gallus GCA_000002315.3, Homo sapiens GCA_000001405.15, Mus musculus GCA_000001635.2, Ovis aries GCF_000298735.2, Rattus norvegicus GCA_000001895.4, Sus scrofa GCF_000003025.6, Vicugna pacos GCF_000164845.2, Vulpes vulpes GCA_003160815.1) to remove those oligonucleotides that were likely to hybridize and capture host nucleic acids. Finally, redundancy was decreased by removing oligonucleotides that were > 95% identical to each other. Based on the final set, biotinylated RNA baits were manufactured. The tailored custom myBaits kit (VirBaits) for 48 hybridization reactions including the RNA bait set comprising 177,471 baits and adapter blockers fitting with Ion Torrent sequencing adapters was obtained (Arbor Biosciences). All baits including the GenBank accession number of the respective virus genomes are provided in Additional file 1 (fasta format).
Extension of the VirBaits set: tests with SARS-CoV-2-positive samples
To expand the abovementioned VirBaits set for the emerging SARS-CoV-2 virus, a predesigned bait set for the coronavirus SARS-CoV-2 was designed and available free of charge from Arbor Biosciences. This SARS-CoV-2 bait set was mixed with the VirBaits set in the ratio of 1:40 to simulate an extended VirBaits set, and applied as described above. Different samples of SARS-CoV-2-infected ferrets (Table 2) were used for tests with the extended VirBaits set.
The performance of the VirBaits set was tested for selected viruses including such with RNA and DNA genomes. We mainly used infected sample material of real cases that had already been sequenced for diagnostic purpose but also a spiked salmon sample from a laboratory proficiency test (Table 2). The libraries of those samples were generated using a validated unbiased workflow  without any enrichment or depletion. In the present study, these previously constructed and archived sequencing libraries (stored at − 20 °C) were treated with the bait set as described below. Most tested viruses were nearly identical or closely related to the viruses in the underlying genome set used for the bait design (case 1 samples; Table 2). Besides these viruses, we also tested samples infected with viruses that indeed belong to the virus families included in the bait design but that are only distantly related to the sequences used for the bait design (case 2 samples; Table 3).
Proficiency testing of the VirBaits application
The broad functionality of the VirBaits approach was investigated via an internal proficiency test using five blind samples. The test initiator (M.B.) provided five samples (see Table 4) only declaring sample number and type (tissue sample with Trizol or already extracted RNA) whereas the sample processor (C.W.) did not get any information on the samples like host, virus, kind of organ or tissue, or pre-diagnosis.
Hybridization and enrichment procedure
The VirBaits set was applied according to the manufacturer’s instructions (myBaits manual v.4.01, Arbor Biosciences, April 2018). In brief, hybridization reagents including baits were thawed, collected, and mixed with each other preparing hybridization (HYB) and blocker master mixes. The HYB mix was incubated for 10 min at 60 °C to clear the solution. After adapting to room temperature for 5 min, the HYB mix was aliquoted to 18.5 μl per reaction. The blocker mix was aliquoted to 5 μl each and subsequently combined with up to 7 μl of the sequencing library (resulting in the individual library samples, LIB). The LIB mix of each library was denatured for 5 min at 95 °C using a thermal cycler. Both the LIB and HYB mixes were afterwards incubated for 5 min at 65 °C. Subsequently, 18 μl of the HYB aliquot was mixed with the LIB mix aliquot each (12 μl) using pipetting up and down and spinning down, and then incubated for about 16 h at the suggested hybridization temperature (65 °C). We tested three variations from the protocol recommended in the manual. (i) The rabies virus (RABV)-infected sample L03007 was hybridized with only the half amount of the reaction volume. (ii) For IAV-infected samples L02651 and L03706, the baits in the HYB mix were diluted 1:5 with ultrapure water. (iii) The Kotalahti bat lyssavirus (KBLV)-infected sample L02374 was incubated at 62 °C instead of the default hybridization temperature of 65 °C as recommended in the manual when the expected bait-target divergence exceeds 5%. After hybridization, the samples were purified three times at the hybridization temperature using binding beads contained in the kit and finally eluted with 30 μl 10 mM Tris-Cl, 0.05% Tween-20 solution (pH 8.0–8.5). An aliquot (16 μl) of the enriched samples was amplified using the GeneRead DNA Library L amplification Kit (Qiagen) according to manufacturer’s instructions performing 14 cycles and purified using solid-phase paramagnetic bead technology. Quality check using a Bioanalyzer 2100 (Agilent Technologies) and quantification of the enriched libraries using the KAPA Library Quantification Kit Ion Torrent (Roche) were performed as described . To obtain high quality sequences, we included the Ion S5 calibration standard in the clonal amplification reaction. Samples were sequenced on an Ion Torrent S5XL instrument using Ion 530 chips and chemistry for 400 bp reads.
For direct comparison of datasets that were generated with or without enrichment, sequence datasets were analyzed via reference mapping using the Genome Sequencer software suite (version 2.6; Roche) and a reference genome sequence as indicated in Table S1 in Additional file 2. For these analyses, the default software settings for quality filtering and mapping were used. The number of mapped reads reported by the software was used as the number of reads representing the respective virus in the datasets. The number of unmapped reads was used as the background portion. The enrichment factors were calculated according to Eq. 1.
with ef, enrichment factor; m, number of virus reads in the dataset with VirBaits treatment; n, total number of reads of the dataset with VirBaits treatment; k, number of virus reads in the dataset without VirBaits treatment; l, total number of reads of the dataset without VirBaits treatment.
In addition, to gain insight into the overall composition of the VirBaits metagenomics datasets, all reads of the datasets were taxonomically classified using the RIEMS tool with default settings .
To identify the viruses comprised in the datasets generated from the proficiency test samples, a different strategy for data analyses was applied. In these cases, no prior information on the viruses contained in proficiency test samples was available to the sample processor, and hence no reference mapping of the datasets was possible. Therefore, initially, de novo assemblies of raw sequence data were conducted using the Genome Sequencer software suite (v. 2.6; Roche) with default settings. All contigs obtained from these assemblies were taxonomically assigned using blast version 2.6.0 and the NCBI nt or nr databases . Closest related complete viral genome sequence available in the nucleotide sequence database was downloaded and subsequently used as a reference sequence for mapping analysis as described above to identify all viral reads in the dataset.
Variant analyses were performed to compare within-sample variant frequencies in the RABV genome of sample L03007 and the KBLV genome of sample L02374 with and without enrichment using VirBaits. For these analyses, the KBLV whole genome sequence was assembled from the available dataset as described above. For RABV, sequence NC_001542.1 was used as reference sequence to generate the genome of the RABV comprised in the sample. Subsequently, the obtained whole genome sequences were used as reference sequences to calculate the total number of viral reads as described above. To identify potential single nucleotide variants (SNVs), the Torrent Suite plugin Torrent variantCaller (version 5.12) was used (parameter settings: generic, S5/S5XL(530/540), somatic, low stringency, changed alignment arguments for the TMAP module from map 4 [default] to map1 map2) while the previously determined whole genome sequence was set as reference. In addition, the variant analysis tool integrated in Geneious Prime (2019.2.3) was applied for further confirmation and the acquisition of variant frequencies for each variant (default settings, minimum variant frequency 0.02). Positions of identified potential SNVs were visually inspected in Geneious Prime.
Design of the virus in-solution capture assay (VirBaits)
The VirBaits panel covered 35 viral pathogens of notifiable epizootic and zoonotic diseases (Table 1). The resulting bait set comprises 177,471 biotinylated RNA baits complementary to 18,800 virus genomes (Fig. 2). The final number of baits in the VirBaits set among the target virus groups depends on the diversity of viruses comprised in the underlying genome set, the number of genomes, and the corresponding genome size, respectively. The highest portion of baits belongs to influenza A (IAV) viruses that have small segmented single-strand RNA genomes with a broad variety of genetic information that needs to be covered. Therefore, 79,859 influenza A segment sequences from 12,608 IAV isolates were used for bait design and resulted in 55,649 baits that comprise 31.4% of the entire set. In contrast, the smallest portion of only 11 underlying complete genomes for Schmallenberg virus (SBV) with a small, segmented single-strand RNA genome with limited genomic variety resulted in 198 baits (0.1% of the entire set).
Testing of the VirBaits set by directly comparing generically generated and bait-treated sequencing datasets
The VirBaits performance was assessed via direct comparison of datasets generated without and with enrichment. The portion of virus reads extracted from the metagenomics datasets using reference mapping for both datasets per sample and the enrichment factor are shown (Table 2, Fig. 3). In general, there we observed a significant effect of the VirBaits treatment. Two cases can be distinguished: viruses which are comprised in the bait set (case one) and viruses of which only distant relatives are comprised in the bait set (case two).
Case one, for all tested viruses that are represented in the VirBaits set, we could reach an enrichment between 16-fold for sample L02726, slightly loaded with West Nile virus (WNV), and 9200-fold for the RABV-infected sample L03015 (Table 2, Fig. 3). Also, for viruses with large DNA genomes (i.e., African swine fever virus, ASFV, sample L02762; Bovine herpesvirus 1, BHV, sample L02568), a clear increase of virus reads could be detected after the VirBaits treatment. Despite the lower representation in the bait set, WNV (1.8% of the baits; compare Table 1) and BHV (1.4% of the baits; compare Table 1) were significantly enriched (WNV 16-fold to 421-fold, BHV 133-fold, Fig. 3). The additionally performed tests with the complemented SARS-CoV-2 set resulted in a clear increase of coronavirus reads (9-fold to 745-fold). Nearly complete virus genomes could be assembled from many samples including samples with a high host genome background and low-level viral genome loads, see below.
Case two, the results vary regarding samples infected with viruses distantly related to the viruses included in the underlying genome set, although related taxa of the same virus family are represented in the VirBaits set. On the one hand, we found a clear enrichment for two samples: sample L02374 infected with KBLV (showing an overall identity of 72% to RABV genomes) was enriched 26-fold, and sample L03038 infected with Usutu virus (showing about 74% overall identity to WNV; compare Table 3 and Fig. 3) was enriched 100-fold. On the other hand, two samples did not show any increase after VirBaits treatment but were at least detectable. These are sample L02262 infected with a novel ovine picornavirus (OvPiV, showing about 46% and 38% overall identity to SVDV and FMDV genomes in the VirBaits set, respectively) and sample L01003 infected with a penguin alphaherpesvirus (PeHV, showing about 38% and 36% overall identity to SHV and BHV genomes, respectively; compare Table 3). Besides the targeted viral reads, the datasets mainly contained host-related reads (Mammalia) and some bacterial reads including contaminations as typical for the used sample processing workflow .
Genome coverage of virus genomes generated with the VirBaits approach
Coverage statistics are provided in Additional file 2: Table S1. The coverage of virus genomes resulting from datasets generated with enrichment is basically comparable to those generated without enrichment as shown for viruses with DNA and RNA genomes (ASFV sample L02762 and RABV sample L03007, respectively, Additional file 3: Figure S1). An improvement of coverage was found for low loaded samples (e.g., WNV sample L03451, RABV sample L02838, Additional file 3: Figure S1), whereas case two samples (L01003, L02262) show concentration in certain conserved regions (e.g., OvPiV in L02262, Additional file 3: Figure S1).
Analysis of variants in lyssavirus genomes
For sample L03007, the complete rabies lyssavirus genome assembly generated via HTS without enrichment did show the same single nucleotide variants (SNV) as the genome assembly of the VirBaits dataset. The variants in each genome assembly were found to exhibit nearly unaltered frequencies per SNV (Fig. 4) indicating an equal extraction of RABV reads using the bait treatment. The same is true for the KBLV-infected sample L02374 showing two point mutations with similar variant frequencies.
Testing of the VirBaits set with blind samples—proficiency test
In the proficiency test blind samples, high-impact transboundary or zoonotic viruses detected in the samples X1 to X4 were ASFV, FMDV, RVFV, NiV, EOV, and Marburgvirus (Table 4). In sample X5, no virus but instead reads related to the green monkey cell line was detected. The assignments made by the sample processor (C.W.) were approved as correct by the proficiency test initiator (M.B.). However, in sample X3, a small amount of NiV was detected that was not supposed to be in there indicating a cross-contamination from sample X4 indeed containing NiV. Using the RIEMS software, the full taxonomic content of all proficiency test datasets was analyzed, but no other indication of cross-contamination was found in any other proficiency test dataset.
The aim of the present study was to design and test of a virus enrichment panel, whose application markedly enhances the virus signal in diagnostic metagenomics by targeted capture enrichment. In general, the VirBaits approach with custom-designed kits using 80-mer RNA baits was easy to apply for our purpose by just compiling the relevant virus genomes and providing them to the supplier. In our opinion, it is applicable for diagnosticians who are in need for bait sets to capture specific pathogens but are not proficient in bioinformatics skills to convert the underlying genomes into meaningful oligonucleotides.
As a base for this HTS plus capture enrichment approach, we used a generic HTS workflow without applying rRNA depletion or PCR-amplification. This procedure ensures that the resulting sequencing data reflect the original sample composition in the best way . This generic HTS approach is a powerful tool and led to the detection of novel viruses in many cases [25, 26, 30], although the initial presence of the virus can be low in some cases [21, 25, 31]. Indeed, samples with very low pathogen (virus) content represent a typical diagnostic setting. Therefore, samples with a range of genome loads including very low virus titers were used in the present pilot study to mimic the realistic clinical situation.
The designed VirBaits panel was successfully applied for all tested viruses, especially for viruses that are represented in the underlying genome set (case 1 samples). Altogether, an increase of 9-fold to 9200-fold of the target sequence reads was achieved. The obtained enrichment factors were neither dependent on the bait abundance within the bait set nor on the target genome size (compare Table 1, Fig. 3, and Additional file 2: Table S1). When looking at the less enriched samples (Fig. 3), it is apparent that most of them show a very high percentage of virus amount in the VirBaits dataset indicating a certain saturation of the hybridization. Hence, a much higher increase would not be possible for these samples. Consequently, especially the low loaded samples benefit from the capture enrichment procedure. At the same time, the low loaded samples demonstrate the limit of the system: if too little or no target nucleic acids are present in the library used for capture due to the virus presence in the sample, there will be no enrichment possible. According to the myBaits manual, it is highly recommended to amplify libraries before enrichment. However, since we always use a PCR-free generic HTS workflow, we did not follow this recommendation and still obtained good results. This is in agreement with Mamanova et al. , who determined a severe bias when doing a PCR step before hybridization capture. The authors recommend keeping library amplification to a minimum and only performing 14–18 cycles after the capture hybridization especially with clinical samples exhibiting nucleic acids of lower integrity. The virome capture sequencing approach for vertebrate viruses, VirCapSeq-VERT , relies on sample pre-treatments for virus enrichment like filtration of samples, treatment of samples with ribonucleases to reduce extra-viral RNA and subsequent depletion of ribosomal RNA. Our study clearly shows that an amplification- and depletion-free HTS workflow is well suitable to be combined with capture enrichment. In addition, since RNA:DNA hybrids are more stable and have a higher hybridization efficiency than DNA:DNA hybrids [16, 17], the VirBaits approach leads to a significant increase of the target viruses including even only moderately related viruses (case 2 samples). However, care must be taken when working with RNA baits, which are not as stable as DNA probes, by avoiding contaminations with RNases during handling and assuring the cold chain.
The bait amount per virus group in the entire panel was varying, yet the performance for virus groups that are represented by few baits was satisfying with regard to the aims set out in the introduction, i.e., at least improved detection of a virus, as shown for RVFV, BHV, HeV, and NiV (Fig. 3, Table 4) indicating a robust and efficient method. Regarding samples infected with the OvPiV and the PeHV, the viruses are only distantly related to the viruses in the underlying genome set (case 2 samples). For such cases, an adaption of hybridization temperature and time might increase the output for those distantly related viruses. For the latter modification, evaporation might become a serious problem that needs to be considered. Nonetheless, distantly related viruses can be detected by the standard VirBaits procedure via capturing conserved regions. Hence, they would not get lost during the procedure. Altogether, the results indicate a good applicability for all samples including low-load virus samples if included in the VirBaits set or moderately related to those viruses.
The tiling density is low in the present VirBaits panel (1.3×) compared with panels for single species applying a tiling density of 3×  or 4× . This is owing to the diagnostic approach we were intending to design, improving diagnostic sensitivity but not necessarily leading to the assembly of whole genomes. Typically, one specific read in a sequence dataset indicating a pathogenic virus would be enough to call a suspicion and trigger follow-up analyses. However, as shown here, the used tiling density in many cases allows for the assembly of nearly complete genomes emphasizing the sensitivity of the approach. In addition, the detected SNVs of RABV and KBLV genomes point to equal frequencies (Fig. 4) indicating a consistent recovery of the genomes via capture enrichment as already demonstrated by Metsky et al. .
A general benefit of capture approaches is that unlike for targeted assays like qPCR or serologic assays no prior decision is necessary for which pathogen (virus) is sought after, i.e., which specific test system needs to be applied. Of course, this requires that substantial knowledge and effort be put into the design of the bait set and that the used bait set fits the specific purpose. This saves time and money by avoiding application of many laborious diagnostic techniques successively until a causative agent can be suspected and confirmed. Moreover, a negative result of a PCR approach can mask an undetected viral pathogen when PCR primers do not fit anymore with the genome of evolved pathogens (e.g., [34, 35]). Otherwise, with positive PCR result, co-infections with several different pathogens can be overseen. For cases with atypical clinical signs or deviating viral pathogens, it might be that the relevant specific diagnostics are not considered or not existing, respectively. The synchronous detection of different viruses, as shown for sample L03038 having a co-infection with WNV and USUV, enabled by the generic HTS approach in combination with VirBaits approach is a great benefit. Compared to multiplex PCR approaches, that allow the detection of several viruses or virus variants in parallel [36, 37], the VirBaits approach has a broader applicability and often provides sequence information of full or nearly full genomes instead of small gene fragments. This information can additionally be used for follow-up analyses on genetic diversity, virulence, or functional potential.
The results of the internal proficiency test corroborate the applicability of the VirBaits workflow combined with HTS as all-in-one approach for different pathogenic viruses without the necessity to do laborious diagnostic tests for at least each single virus or small groups of viruses. The datasets generated were relatively small but resulted in all cases in enough reads to determine the respective pathogen without any doubt (compare Tables 2 and 3). This also indicates the potential for reduction of sequencing costs when applying an enrichment method. For sample X1 infected with the DNA virus ASFV, the portion of virus reads in the dataset was only low since the sequencing library was made from RNA according to the standard HTS workflow . In that case, starting from DNA would have delivered much higher read numbers, as seen for sample L02762 yielding a proper amount of African swine fever virus reads (compare Table 2). In sample X3, some reads of the NiV were found while they were not supposed to be in this sample but only in sample X4. This point to the risk of cross-contamination can be an issue for manual sample handling. Cross-contaminations typically occur in laboratories but become especially visible when highly enriched samples are used, like amplicons, or bait-enriched samples. An automated procedure or a cartridge-based (in the ideal case closed) system may avoid cross-contamination in the future by reducing (excluding) opportunities for aerosol contamination between samples.
The rapid inclusion of baits and samples related to the recent SARS-CoV-2 outbreak demonstrate the high dynamics and ad hoc adaptability of the here tested approach. The results for those samples show that the VirBaits workflow can be updated very rapidly and easily for viruses causing newly emerging health threats. In addition, we investigated several samples related to the WNV epidemic in Germany [20, 21] including two horse samples. Horses are dead-end hosts for WNV, and horse samples show generally lower virus titers in contrast to samples of bird being reservoir hosts for WNV infection. As shown here, the VirBaits treatments considerably improved the WNV detection for horse samples (L03451).
The application of myBaits is still expensive, although sequencing costs decrease with this procedure since no large data sets are necessary (compare Tables 2, 3, and 4). The reduction of the reaction volume by 50%, as shown here for L03007, can reduce the cost for the enrichment reagents. Nevertheless, this opportunity needs to be validated in further trials and might depend on the individual sample type and the contained virus.
Resulting from the present pilot study, the VirBaits panel can be improved by optimizing the distribution of baits per virus group. This especially means a significant reduction of IAV baits, as shown here with the dilution of the baits for IAV samples, but also other viruses with small RNA genomes like FMDV, RABV, and NDV (compare Fig. 2). Redundancy of the baits can be decreased by removing oligonucleotides that were > 90% identical to each other (in contrast to 95% as used for the pilot study). This decrease should still allow good hybridization results considering the performance of deviating baits as shown here for KBLV, USUV, OvPiV, and PeHV. Pooling of multiple libraries (for cost reduction) and application of reduced hybridization temperatures or prolonged hybridization times (up to 48 h) need to be investigated for distantly related targets. The latter step has to go along with procedures to avoid strong evaporation holding the reaction volume nearly constant during extended incubation time.
The biotinylated RNA-baits principle for capture enrichment with a large and diverse virus panel as tested here is highly suitable to improve diagnostics for viruses causing notifiable epizootic and zoonotic diseases. It can be dynamically updated with new RNA baits because of the modular concept that allows the addition of new bait sets for emerging diseases. The combination of a generic and unbiased HTS workflow with the capture approach leads to an enhancement of the viral signal in terms of sequence reads in the sequence dataset resulting in higher sensitivity of HTS. The identification of distantly related viruses can be difficult as shown for a deviating avian bornavirus in a proficiency test for HTS bioinformatics . Although the problem in the mentioned study was related to bioinformatics analyses, the accumulation of certain virus reads by capture enrichment can avoid overlooking pathogenic viruses contained in a sample. The application of such an approach is recommendable for cases of unclear clinical pictures and can replace time-consuming screenings using large panel PCR diagnostics for specific single pathogens. This leads to an accelerated diagnosis and an improved detection compared with HTS without prior enrichment. Although not the primary intention, the application of the VirBaits even enables the generation of nearly complete genome sequences in most cases.
African horse sickness virus
African swine fever virus
Bovine leukemia virus
Bovine herpesvirus 1
Borna disease virus BTV, Bluetongue virus
Bovine viral diarrhea virus 1, 2
Crimean-Congo hemorrhagic fever virus
Classical swine fever virus
Eastern, Venezuelan, and Western Equine encephalitis virus
Epizootic hemorrhagic disease virus
Equine infectious anemia virus
Foot-and-mouth disease virus
Influenza A virus
Kotalahti bat lyssavirus
Lumpy skin disease virus
Newcastle disease virus
Small ruminant morbillivirus
Rift Valley fever virus
Severe acute respiratory syndrome coronavirus 2
Suid alphaherpesvirus 1
Swine vesicular disease virus
Variegated squirrel bornavirus-1
Vesicular stomatitis Indiana virus
West Nile virus
Forth LF, Konrath A, Klose K, et al. A novel squirrel respirovirus with putative zoonotic potential. Viruses. 2018;10:373.
Wang Y, Wang Y, Chen Y, Qin Q. Unique epidemiological and clinical features of the emerging 2019 novel coronavirus pneumonia (COVID-19) implicate special control measures. J Med Virol. 2020;92:568–76.
Wylezich C, Papa A, Beer M, Höper D. A versatile sample processing workflow for metagenomic pathogen detection. Sci Rep. 2018;8:13108.
Olson PD, Yoder K, Fajardo LF, et al. Lethal invasive cestodiasis in immunosuppressed patients. J Infect Dis. 2003;187:1962–6.
Gaudin M, Desnues C. Hybrid capture-based next generation sequencing and its application to human infectious diseases. Front Microbiol. 2018;9:2924.
Höper D, Mettenleiter TC, Beer M. Metagenomic approaches to identifying infectious agents. Rev Sci Tech. 2016;35:83–93.
Depledge DP, Palser AL, Watson SJ, et al. Specific capture and whole-genome sequencing of viruses from clinical samples. PLoS One. 2011;6:e27805.
Forth JH, Tignon M, Cay AB, et al. Comparative analysis of whole-genome sequence of African swine fever virus Belgium 2018/1. Emerg Infect Dis. 2019;25:1249–52.
Strubbia S, Schaeffer J, Oude Munnink BB, et al. Metavirome sequencing to evaluate norovirus diversity in sewage and related bioaccumulated oysters. Front Microbiol. 2019;10:2394.
Brown AC, Bryant JM, Einer-Jensen K, et al. Rapid whole-genome sequencing of Mycobacterium tuberculosis isolates directly from clinical samples. J Clin Microbiol. 2015;53:2230–7.
Melnikov A, Galinsky K, Rogov P, et al. Hybrid selection for sequencing pathogen genomes from clinical samples. Genome Biol. 2011;12:R73.
Amorim-Vaz S, Tran VDT, Pradervand S, et al. RNA enrichment method for quantitative transcriptional analysis of pathogens in vivo applied to the fungus Candida albicans. mBio. 2015;6:e00942–15.
Wylie TN, Wylie KM, Herter BN, Storch GA. Enhanced virome sequencing using targeted sequence capture. Genome Res. 2015;25:1910–20.
Briese T, Kapoor A, Mishra N, et al. Virome capture sequencing enables sensitive viral diagnosis and comprehensive virome analysis. mBio. 2015;6:e01491–15.
Metsky HC, Siddle KJ, Gladden-Young A, et al. Capturing sequence diversity in metagenomes with comprehensive and scalable probe design. Nat Biotechnol. 2019;37:160–8.
Chien YH, Davidson N. RNA:DNA hybrids are more stable than DNA:DNA duplexes in concentrated perchlorate and trichloroacetate solutions. Nucleic Acids Res. 1978;5:1627–37.
Lesnik EA, Freier SM. Relative thermodynamic stability of DNA, RNA, and DNA:RNA hybrid duplexes: relationship with base composition and structure. Biochemistry. 1995;34:10807–15.
Zhou P, Yang XL, Wang XG, et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020;579:270–3.
Hulo C, de Castro E, Masson P, et al. ViralZone: a knowledge resource to understand virus diversity. Nucleic Acids Res. 2011;39(Database issue):D576–82.
Ziegler U, Lühken R, Keller M, et al. West Nile virus epizootic in Germany, 2018. Antiviral Res. 2019;162:39–43.
Ziegler U, Santos PD, Groschup MH, et al. West Nile virus epidemic in Germany triggered by epizootic emergence, 2019. Viruses. 2020;12:448.
Pfaff F, Hägglund S, Zoli M, et al. Proteogenomics uncovers critical elements of host response in bovine soft palate epithelial cells following in vitro infection with Foot-And-Mouth Disease Virus. Viruses. 2019;11:E53.
De Cesare A, Oliveri C, Lucchi A, et al. Application of shotgun metagenomics to smoked salmon experimentally spiked: comparison between sequencing and microbiological data using different bioinformatic approaches. Ital J Food Safety. 2019;8(8462):205–8.
Schlottau K, Rissmann M, Graaf A, et al. SARS-CoV-2 in fruit bats, ferrets, pigs, and chickens: an experimental transmission study. Lancet Microbe. https://doi.org/10.1016/S2666-5247(20)30089-6.
Forth LF, Scholes SFE, Pesavento PA, et al. Novel picornavirus in lambs with severe encephalomyelitis. Emerg Infect Dis. 2019;25:963–7.
Pfaff F, Schulze C, König P, et al. A novel alphaherpesvirus associated with fatal diseases in banded penguins. J Gen Virol. 2017;98:89–95.
Calvelage S, Tammiranta N, Nokireki T, Gadd T, Eggerbauer E, Zaeck LM, et al. Genetic and Antigenetic Characterization of the Novel Kotalahti Bat Lyssavirus (KBLV). Viruses. 2021;13:69.
Scheuch M, Höper D, Beer M. RIEMS: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets. BMC Bioinf. 2015;16:69.
Altschul SF, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–402.
Pfaff F, Schlottau K, Scholes S, et al. A novel astrovirus associated with encephalitis and ganglionitis in domestic sheep. Transbound. Emerg Dis. 2017;64:677–82.
Hoffmann B, Tappe D, Höper D. A variegated squirrel bornavirus associated with fatal human encephalitis. N Engl J Med. 2015;373:154–62.
Mamanova L, Coffey AJ, Scott CE, et al. Target-enrichment strategies for next-generation sequencing. Nat Methods. 2010;7:111–8.
Patterson Ross Z, Klunk J, Fornaciari G, et al. The paradox of HBV evolution as revealed from a 16th century mummy. PLoS Pathog. 2018;14:e1006750.
Belbis G, Zientara S, Breard E, et al. Bluetongue virus: from BTV-1 to BTV-27. Adv Virus Res. 2017;99:161–97.
Rubbenstroth D, Peus E, Schramm E, et al. Identification of a novel clade of group A rotaviruses in fatally diseased domestic pigeons in Europe. Transbound Emerg Dis. 2019;66:552–61.
Brinkmann A, Ergünay K, Radonić A, et al. Development and preliminary evaluation of a multiplexed amplification and next generation sequencing method for viral hemorrhagic fever diagnostics. PLoS Negl Trop Dis. 2017;11:e0006075.
Hoffmann B, Hoffmann D, Henritzi D, et al. Riems influenza a typing array (RITA): An RT-qPCR-based low density array for subtyping avian and mammalian influenza a viruses. Sci Rep. 2016;6:27211.
Brinkmann A, Andrusch A, Belka A, et al. Proficiency testing of virus diagnostics based on bioinformatics analysis of simulated in silico high-throughput sequencing datasets. J Clin Microbiol. 2019;57:e00466–19.
We acknowledge Brian Brunelle and Alison Devault (Arbor Biosciences) for professional help during bait design. We thank Andrea Aebischer Sandra Blome, Sandra Diederich, Martin Eiden, and Michael Eschbaumer for providing the proficiency test samples. We are grateful to Dennis Rubbenstroth and Arnt Ebinger for providing genome sequences; Patrick Zitzow for excellent technical support; and Leonie Forth, Jan Forth, Conrad Freuling, Florian Pfaff, Pauline Santos, Jacob Schön, Adi Steinrigl, and Donata Hoffmann for sharing samples and sequencing libraries. Finally, we thank Arbor Biosciences for providing the predesigned SARS-CoV-2 bait set.
This work was financially supported by the German Federal Ministry of Food and Agriculture through the Federal Office for Agriculture and Food, project ZooSeq, grant number 2819114019. This work was in part financed by EU Horizon 2020 program grant agreement VEO, grant number 874735.
Ethics approval and consent to participate
For samples L02912 and L02913, animal experiments were evaluated and approved by the ethics committee of the State Office of Agriculture, Food safety, and Fishery in Mecklenburg–Western Pomerania (LALLF M-V: LVL MV/TSD/7221.3-1-067/18). For samples L02456 and L03827-30 as well as L03862-64, ethics approvals are documented in the publications as cited in Table 2 and references [22, 24], respectively.
Consent for publication
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file 1.
Baits; all baits including the GenBank accession number of the respective virus genomes are provided (fasta format, 26 kb).
Additional file 2: Table S1.
Coverage statistics of genomes of the viruses in all samples generated in this study with enrichment approach (VirBaits) after reference mapping using the Genome Sequencer software suite, including used references, number of baits in the VirBaits panel and enrichment factor of virus reads.
Additional file 3: Figure S1.
Coverage of genomes of selected viruses generated without (generic HTS) or with enrichment (VirBaits) after reference mapping with the Genome Sequencer software suite; displayed using Geneious Prime (2019.2.3).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Wylezich, C., Calvelage, S., Schlottau, K. et al. Next-generation diagnostics: virus capture facilitates a sensitive viral diagnosis for epizootic and zoonotic pathogens including SARS-CoV-2. Microbiome 9, 51 (2021). https://doi.org/10.1186/s40168-020-00973-z
- Metagenomics virus detection
- Capture enrichment
- RNA baits
- VirBaits 1.0
- Diagnostic tool
- Proficiency test