Background

Disease control is encompassing several fields like powerful diagnostics for early detection, efficient therapy, and disease prophylaxis. The present study focused on broad and powerful diagnostics for viral pathogens including evolved, newly emerging or unrecognized viruses. The detection of the latter can be challenging for conventional virus diagnostics relying on specific quantitative PCR assays until their improvement or development of new assays [1, 2]. Sequence information for such newly emerging pathogens is often rare or not existing. In addition, PCR-based diagnosis of co-infections with several pathogens is complicated and can be time-consuming and expensive, and therefore, co-infections can easily be overseen [3]. Moreover, pathogen detection in cases of immunocompromised patients might be hampered due to a deviating course of the infection or its ectopic location [4]. For all the mentioned challenges, untargeted metagenomics using high-throughput sequencing (HTS) offers in general a swift and broad solution. It also enables the simultaneous detection of the genetic information of pathogens of all taxa including viruses, bacteria, and eukaryotic pathogens like parasites or fungi [3]. The aforementioned benefits of metagenomics HTS (generally a swift and broad solution) are confined by poor sample quality, low pathogen loads, and high levels of background consisting of nucleic acids of the host or accompanying bacteria that often lead to a detrimental pathogen/background nucleic acid ratio of the final sequencing libraries [5]. These confinements can cause analyses of the resulting sequence datasets to get laborious and time consuming and difficult to interpret, even for experts. This is especially true for pathogenic viruses with rather small RNA genomes that can get lost in datasets generated from complex samples. In the worst case (extremely unfavorable pathogen/background ratio), exceedingly large sequence datasets and intense time-consuming analyses would be necessary to compile enough meaningful information to finalize a genome of a certain pathogen [6]. This significantly impedes pathogen detection causing a diagnostic gap and making HTS difficult to implement in the daily routine of diagnostic laboratories.

Prior to sequencing, an enrichment of certain pathogens using capture enrichment methods can help to increase the target sequence information considerably, along with avoiding extensive sequencing efforts, as recently reviewed by Gaudin and Desnues [5], and consequently, can enhance the sensitivity of diagnostic HTS. Basically, enrichment can be realized by the application of oligonucleotides (DNA probes or RNA baits) that are complementary to the sequence information of the target pathogens to reach a targeted positive separation of them. Target enrichment with complex samples was already applied for viral [7,8,9], bacterial [10], and eukaryotic [11, 12] pathogens to compile full or nearly full genomes for follow-up analyses. For diagnostic reasons, however, the combination of probes or RNA baits of several pathogens into one set, depending on the scope, for example virus, microbiological or syndromic diagnostics, is beneficial. This approach can save numberless PCR reactions, which are specific for only one pathogen or pathogen groups. Studies implementing this practice mostly using DNA probes for different virus panels were already presented [13,14,15]. However, since DNA:DNA hybrids are less stable and have a weaker hybridization efficiency than RNA:DNA hybrids [16, 17], RNA baits might be preferable.

For the purpose of improved virus diagnostics, we used RNA baits for an in-solution capture assay with subsequent HTS (Fig. 1). In a pilot study, we selected a panel of viral pathogens for notifiable epizootic and zoonotic diseases. A genome dataset for the relevant virus groups was created (containing about 18,800 full-length virus genomes), and a set of RNA baits (VirBaits) was derived from this dataset to facilitate the identification and subsequently diagnosis of viral pathogens in farm and reservoir animals but also humans. The resulting VirBaits set was tested for a set of representatives of pathogenic viruses with diagnostic samples from already examined cases instead of spiked samples. These clinical samples had, to some extent, challenging low virus titers. Due to the recent worldwide SARS-CoV-2 outbreak [18], we extended the bait panel for this newly emerging virus and successfully tested the set with SARS-CoV-2-positive samples.

Fig. 1
figure 1

Sketch of the viral diagnostic HTS workflow combining generic metagenomics [3] with the VirBaits approach. Images used as symbols were obtained from the free website https://pixabay.com/

Methods

Bait design for the targeted virus panel

The aim of the study was to design a diagnostic bait set for epizootic and zoonotic viral pathogens. For this purpose, virus genomes were compiled based on a curated list of complete virus genomes from the viralzone website (list “SEPT_2017” on viralzone website [19], comprising 70,352 virus genomes) for viruses causing notifiable epizootic and emerging zoonotic diseases. For influenza A (IAV), a virus with RNA genome characterized by a broad genomic variety, all available IAV sequences covering full or nearly full-length segments from influenza viruses collected from avian hosts that were available in FluDB June 2018 were taken (representing 12,608 isolates). Duplicated IAV sequences were removed, and the resulting set of 79,859 segment sequences was used as basis for bait design. The final virus genome dataset comprising about 18,800 complete virus genomes for epizootic and zoonotic diseases (Table 1) was sent to Arbor Biosciences (Ann Arbor, Michigan, USA) for bait design. On this basis, 80-mer oligonucleotides were derived with 1.3× tiling density. They were afterwards checked against 19 host genomes relevant for veterinary and human virus diagnostics (Anas platyrhynchos GCF_000355885.1, Anser cygnoides GCF_000971095.1, Bos taurus GCA_000003055.4, Camelus dromedarius GCF_000767585.1, Canis familiaris GCA_000002285.2, Capra hircus GCF_001704415.1, Cervus elaphus GCA_002197005.1, Chlorocebus sabaeus GCA_000409795.1, Enhydra lutris GCF_002288905.1, Equus caballus GCA_000002305.1, Felis catus GCF_000181335.3, Gallus gallus GCA_000002315.3, Homo sapiens GCA_000001405.15, Mus musculus GCA_000001635.2, Ovis aries GCF_000298735.2, Rattus norvegicus GCA_000001895.4, Sus scrofa GCF_000003025.6, Vicugna pacos GCF_000164845.2, Vulpes vulpes GCA_003160815.1) to remove those oligonucleotides that were likely to hybridize and capture host nucleic acids. Finally, redundancy was decreased by removing oligonucleotides that were > 95% identical to each other. Based on the final set, biotinylated RNA baits were manufactured. The tailored custom myBaits kit (VirBaits) for 48 hybridization reactions including the RNA bait set comprising 177,471 baits and adapter blockers fitting with Ion Torrent sequencing adapters was obtained (Arbor Biosciences). All baits including the GenBank accession number of the respective virus genomes are provided in Additional file 1 (fasta format).

Table 1 Viral pathogens causing notifiable or emerging epizootic and zoonotic diseases, their nucleic acids, number of genomes or genome segments, and baits represented on the VirBaits set. The percentage of baits on the VirBaits set is given for each virus (compare Fig. 1)

Extension of the VirBaits set: tests with SARS-CoV-2-positive samples

To expand the abovementioned VirBaits set for the emerging SARS-CoV-2 virus, a predesigned bait set for the coronavirus SARS-CoV-2 was designed and available free of charge from Arbor Biosciences. This SARS-CoV-2 bait set was mixed with the VirBaits set in the ratio of 1:40 to simulate an extended VirBaits set, and applied as described above. Different samples of SARS-CoV-2-infected ferrets (Table 2) were used for tests with the extended VirBaits set.

Table 2 Overview of tested libraries: case 1 samples. For viruses enclosed in the samples, reference genomes were included in the genome set used for VirBait design. Sample matrix, animal species, the detected virus, and the references are given. The amount of virus reads in the sequence datasets found in the originally sequenced libraries and in the VirBaits treated libraries is given as percentage. Libraries were made from RNA except the one marked by an asterisk, which was made from DNA

Sample material

The performance of the VirBaits set was tested for selected viruses including such with RNA and DNA genomes. We mainly used infected sample material of real cases that had already been sequenced for diagnostic purpose but also a spiked salmon sample from a laboratory proficiency test (Table 2). The libraries of those samples were generated using a validated unbiased workflow [3] without any enrichment or depletion. In the present study, these previously constructed and archived sequencing libraries (stored at − 20 °C) were treated with the bait set as described below. Most tested viruses were nearly identical or closely related to the viruses in the underlying genome set used for the bait design (case 1 samples; Table 2). Besides these viruses, we also tested samples infected with viruses that indeed belong to the virus families included in the bait design but that are only distantly related to the sequences used for the bait design (case 2 samples; Table 3).

Table 3 Overview of tested libraries: case 2 samples. For the viruses enclosed in the samples, for that no reference genomes were included in the genome set used for the VirBaits design. Sample matrix, animal species, the detected virus, and the references are given. The amount of virus reads in the sequence datasets found in the originally sequenced libraries and in the VirBaits-treated libraries is given as percentage. Libraries were made from RNA except the one marked by an asterisk, which was made from DNA

Proficiency testing of the VirBaits application

The broad functionality of the VirBaits approach was investigated via an internal proficiency test using five blind samples. The test initiator (M.B.) provided five samples (see Table 4) only declaring sample number and type (tissue sample with Trizol or already extracted RNA) whereas the sample processor (C.W.) did not get any information on the samples like host, virus, kind of organ or tissue, or pre-diagnosis.

Table 4 Proficiency test samples with information on viruses contained and found after VirBaits treatment and dataset size

Hybridization and enrichment procedure

The VirBaits set was applied according to the manufacturer’s instructions (myBaits manual v.4.01, Arbor Biosciences, April 2018). In brief, hybridization reagents including baits were thawed, collected, and mixed with each other preparing hybridization (HYB) and blocker master mixes. The HYB mix was incubated for 10 min at 60 °C to clear the solution. After adapting to room temperature for 5 min, the HYB mix was aliquoted to 18.5 μl per reaction. The blocker mix was aliquoted to 5 μl each and subsequently combined with up to 7 μl of the sequencing library (resulting in the individual library samples, LIB). The LIB mix of each library was denatured for 5 min at 95 °C using a thermal cycler. Both the LIB and HYB mixes were afterwards incubated for 5 min at 65 °C. Subsequently, 18 μl of the HYB aliquot was mixed with the LIB mix aliquot each (12 μl) using pipetting up and down and spinning down, and then incubated for about 16 h at the suggested hybridization temperature (65 °C). We tested three variations from the protocol recommended in the manual. (i) The rabies virus (RABV)-infected sample L03007 was hybridized with only the half amount of the reaction volume. (ii) For IAV-infected samples L02651 and L03706, the baits in the HYB mix were diluted 1:5 with ultrapure water. (iii) The Kotalahti bat lyssavirus (KBLV)-infected sample L02374 was incubated at 62 °C instead of the default hybridization temperature of 65 °C as recommended in the manual when the expected bait-target divergence exceeds 5%. After hybridization, the samples were purified three times at the hybridization temperature using binding beads contained in the kit and finally eluted with 30 μl 10 mM Tris-Cl, 0.05% Tween-20 solution (pH 8.0–8.5). An aliquot (16 μl) of the enriched samples was amplified using the GeneRead DNA Library L amplification Kit (Qiagen) according to manufacturer’s instructions performing 14 cycles and purified using solid-phase paramagnetic bead technology. Quality check using a Bioanalyzer 2100 (Agilent Technologies) and quantification of the enriched libraries using the KAPA Library Quantification Kit Ion Torrent (Roche) were performed as described [3]. To obtain high quality sequences, we included the Ion S5 calibration standard in the clonal amplification reaction. Samples were sequenced on an Ion Torrent S5XL instrument using Ion 530 chips and chemistry for 400 bp reads.

Data analysis

For direct comparison of datasets that were generated with or without enrichment, sequence datasets were analyzed via reference mapping using the Genome Sequencer software suite (version 2.6; Roche) and a reference genome sequence as indicated in Table S1 in Additional file 2. For these analyses, the default software settings for quality filtering and mapping were used. The number of mapped reads reported by the software was used as the number of reads representing the respective virus in the datasets. The number of unmapped reads was used as the background portion. The enrichment factors were calculated according to Eq. 1.

$$ \mathrm{ef}=\frac{\left(\frac{m}{n}\right)}{\left(\frac{k}{l}\right)} $$
(1)

with ef, enrichment factor; m, number of virus reads in the dataset with VirBaits treatment; n, total number of reads of the dataset with VirBaits treatment; k, number of virus reads in the dataset without VirBaits treatment; l, total number of reads of the dataset without VirBaits treatment.

In addition, to gain insight into the overall composition of the VirBaits metagenomics datasets, all reads of the datasets were taxonomically classified using the RIEMS tool with default settings [28].

To identify the viruses comprised in the datasets generated from the proficiency test samples, a different strategy for data analyses was applied. In these cases, no prior information on the viruses contained in proficiency test samples was available to the sample processor, and hence no reference mapping of the datasets was possible. Therefore, initially, de novo assemblies of raw sequence data were conducted using the Genome Sequencer software suite (v. 2.6; Roche) with default settings. All contigs obtained from these assemblies were taxonomically assigned using blast version 2.6.0 and the NCBI nt or nr databases [29]. Closest related complete viral genome sequence available in the nucleotide sequence database was downloaded and subsequently used as a reference sequence for mapping analysis as described above to identify all viral reads in the dataset.

Variant analyses were performed to compare within-sample variant frequencies in the RABV genome of sample L03007 and the KBLV genome of sample L02374 with and without enrichment using VirBaits. For these analyses, the KBLV whole genome sequence was assembled from the available dataset as described above. For RABV, sequence NC_001542.1 was used as reference sequence to generate the genome of the RABV comprised in the sample. Subsequently, the obtained whole genome sequences were used as reference sequences to calculate the total number of viral reads as described above. To identify potential single nucleotide variants (SNVs), the Torrent Suite plugin Torrent variantCaller (version 5.12) was used (parameter settings: generic, S5/S5XL(530/540), somatic, low stringency, changed alignment arguments for the TMAP module from map 4 [default] to map1 map2) while the previously determined whole genome sequence was set as reference. In addition, the variant analysis tool integrated in Geneious Prime (2019.2.3) was applied for further confirmation and the acquisition of variant frequencies for each variant (default settings, minimum variant frequency 0.02). Positions of identified potential SNVs were visually inspected in Geneious Prime.

Results

Design of the virus in-solution capture assay (VirBaits)

The VirBaits panel covered 35 viral pathogens of notifiable epizootic and zoonotic diseases (Table 1). The resulting bait set comprises 177,471 biotinylated RNA baits complementary to 18,800 virus genomes (Fig. 2). The final number of baits in the VirBaits set among the target virus groups depends on the diversity of viruses comprised in the underlying genome set, the number of genomes, and the corresponding genome size, respectively. The highest portion of baits belongs to influenza A (IAV) viruses that have small segmented single-strand RNA genomes with a broad variety of genetic information that needs to be covered. Therefore, 79,859 influenza A segment sequences from 12,608 IAV isolates were used for bait design and resulted in 55,649 baits that comprise 31.4% of the entire set. In contrast, the smallest portion of only 11 underlying complete genomes for Schmallenberg virus (SBV) with a small, segmented single-strand RNA genome with limited genomic variety resulted in 198 baits (0.1% of the entire set).

Fig. 2
figure 2

The distribution of RNA-baits among the virus groups included in the final VirBaits panel is shown. The black-white part of the upper panel is zoomed out in the lower panel

Testing of the VirBaits set by directly comparing generically generated and bait-treated sequencing datasets

The VirBaits performance was assessed via direct comparison of datasets generated without and with enrichment. The portion of virus reads extracted from the metagenomics datasets using reference mapping for both datasets per sample and the enrichment factor are shown (Table 2, Fig. 3). In general, there we observed a significant effect of the VirBaits treatment. Two cases can be distinguished: viruses which are comprised in the bait set (case one) and viruses of which only distant relatives are comprised in the bait set (case two).

Fig. 3
figure 3

Enrichment of virus reads after VirBaits treatment. The numbers refer to the tested samples according to Tables 2 and 3. Black and gray bars indicate viruses that were included or not included in the bait design, respectively (labeling at left axis). White and gray dots indicate the virus amount (percentage) in datasets obtained without and with VirBaits treatment, respectively (labeling at right axis). For samples L02569 and L03828, no enrichment factor was calculated since no virus read was found in datasets obtained without VirBaits treatment (generic HTS). RNA viruses are marked in orange; DNA viruses are marked in blue

Case one, for all tested viruses that are represented in the VirBaits set, we could reach an enrichment between 16-fold for sample L02726, slightly loaded with West Nile virus (WNV), and 9200-fold for the RABV-infected sample L03015 (Table 2, Fig. 3). Also, for viruses with large DNA genomes (i.e., African swine fever virus, ASFV, sample L02762; Bovine herpesvirus 1, BHV, sample L02568), a clear increase of virus reads could be detected after the VirBaits treatment. Despite the lower representation in the bait set, WNV (1.8% of the baits; compare Table 1) and BHV (1.4% of the baits; compare Table 1) were significantly enriched (WNV 16-fold to 421-fold, BHV 133-fold, Fig. 3). The additionally performed tests with the complemented SARS-CoV-2 set resulted in a clear increase of coronavirus reads (9-fold to 745-fold). Nearly complete virus genomes could be assembled from many samples including samples with a high host genome background and low-level viral genome loads, see below.

Case two, the results vary regarding samples infected with viruses distantly related to the viruses included in the underlying genome set, although related taxa of the same virus family are represented in the VirBaits set. On the one hand, we found a clear enrichment for two samples: sample L02374 infected with KBLV (showing an overall identity of 72% to RABV genomes) was enriched 26-fold, and sample L03038 infected with Usutu virus (showing about 74% overall identity to WNV; compare Table 3 and Fig. 3) was enriched 100-fold. On the other hand, two samples did not show any increase after VirBaits treatment but were at least detectable. These are sample L02262 infected with a novel ovine picornavirus (OvPiV, showing about 46% and 38% overall identity to SVDV and FMDV genomes in the VirBaits set, respectively) and sample L01003 infected with a penguin alphaherpesvirus (PeHV, showing about 38% and 36% overall identity to SHV and BHV genomes, respectively; compare Table 3). Besides the targeted viral reads, the datasets mainly contained host-related reads (Mammalia) and some bacterial reads including contaminations as typical for the used sample processing workflow [3].

Genome coverage of virus genomes generated with the VirBaits approach

Coverage statistics are provided in Additional file 2: Table S1. The coverage of virus genomes resulting from datasets generated with enrichment is basically comparable to those generated without enrichment as shown for viruses with DNA and RNA genomes (ASFV sample L02762 and RABV sample L03007, respectively, Additional file 3: Figure S1). An improvement of coverage was found for low loaded samples (e.g., WNV sample L03451, RABV sample L02838, Additional file 3: Figure S1), whereas case two samples (L01003, L02262) show concentration in certain conserved regions (e.g., OvPiV in L02262, Additional file 3: Figure S1).

Analysis of variants in lyssavirus genomes

For sample L03007, the complete rabies lyssavirus genome assembly generated via HTS without enrichment did show the same single nucleotide variants (SNV) as the genome assembly of the VirBaits dataset. The variants in each genome assembly were found to exhibit nearly unaltered frequencies per SNV (Fig. 4) indicating an equal extraction of RABV reads using the bait treatment. The same is true for the KBLV-infected sample L02374 showing two point mutations with similar variant frequencies.

Fig. 4
figure 4

Schematic representation of single nucleotide variants (SNV) for the brain sample L03007 infected with RABV (upper panel) and sample L02374 infected with KBLV (lower panel). Portions of the variants are given for the dataset processed with the untargeted workflow (gray bars) and for the dataset after VirBaits treatment (white bars). Four and two SNVs deviating from the consensus were detected along the RABV genome (at positions 2926; 7419; 11,245; and 11,790) and the KBLV (at positions 2547 and 6172), respectively, by applying a strand bias less than 70%

Testing of the VirBaits set with blind samples—proficiency test

In the proficiency test blind samples, high-impact transboundary or zoonotic viruses detected in the samples X1 to X4 were ASFV, FMDV, RVFV, NiV, EOV, and Marburgvirus (Table 4). In sample X5, no virus but instead reads related to the green monkey cell line was detected. The assignments made by the sample processor (C.W.) were approved as correct by the proficiency test initiator (M.B.). However, in sample X3, a small amount of NiV was detected that was not supposed to be in there indicating a cross-contamination from sample X4 indeed containing NiV. Using the RIEMS software, the full taxonomic content of all proficiency test datasets was analyzed, but no other indication of cross-contamination was found in any other proficiency test dataset.

Discussion

The aim of the present study was to design and test of a virus enrichment panel, whose application markedly enhances the virus signal in diagnostic metagenomics by targeted capture enrichment. In general, the VirBaits approach with custom-designed kits using 80-mer RNA baits was easy to apply for our purpose by just compiling the relevant virus genomes and providing them to the supplier. In our opinion, it is applicable for diagnosticians who are in need for bait sets to capture specific pathogens but are not proficient in bioinformatics skills to convert the underlying genomes into meaningful oligonucleotides.

As a base for this HTS plus capture enrichment approach, we used a generic HTS workflow without applying rRNA depletion or PCR-amplification. This procedure ensures that the resulting sequencing data reflect the original sample composition in the best way [3]. This generic HTS approach is a powerful tool and led to the detection of novel viruses in many cases [25, 26, 30], although the initial presence of the virus can be low in some cases [21, 25, 31]. Indeed, samples with very low pathogen (virus) content represent a typical diagnostic setting. Therefore, samples with a range of genome loads including very low virus titers were used in the present pilot study to mimic the realistic clinical situation.

The designed VirBaits panel was successfully applied for all tested viruses, especially for viruses that are represented in the underlying genome set (case 1 samples). Altogether, an increase of 9-fold to 9200-fold of the target sequence reads was achieved. The obtained enrichment factors were neither dependent on the bait abundance within the bait set nor on the target genome size (compare Table 1, Fig. 3, and Additional file 2: Table S1). When looking at the less enriched samples (Fig. 3), it is apparent that most of them show a very high percentage of virus amount in the VirBaits dataset indicating a certain saturation of the hybridization. Hence, a much higher increase would not be possible for these samples. Consequently, especially the low loaded samples benefit from the capture enrichment procedure. At the same time, the low loaded samples demonstrate the limit of the system: if too little or no target nucleic acids are present in the library used for capture due to the virus presence in the sample, there will be no enrichment possible. According to the myBaits manual, it is highly recommended to amplify libraries before enrichment. However, since we always use a PCR-free generic HTS workflow, we did not follow this recommendation and still obtained good results. This is in agreement with Mamanova et al. [32], who determined a severe bias when doing a PCR step before hybridization capture. The authors recommend keeping library amplification to a minimum and only performing 14–18 cycles after the capture hybridization especially with clinical samples exhibiting nucleic acids of lower integrity. The virome capture sequencing approach for vertebrate viruses, VirCapSeq-VERT [14], relies on sample pre-treatments for virus enrichment like filtration of samples, treatment of samples with ribonucleases to reduce extra-viral RNA and subsequent depletion of ribosomal RNA. Our study clearly shows that an amplification- and depletion-free HTS workflow is well suitable to be combined with capture enrichment. In addition, since RNA:DNA hybrids are more stable and have a higher hybridization efficiency than DNA:DNA hybrids [16, 17], the VirBaits approach leads to a significant increase of the target viruses including even only moderately related viruses (case 2 samples). However, care must be taken when working with RNA baits, which are not as stable as DNA probes, by avoiding contaminations with RNases during handling and assuring the cold chain.

The bait amount per virus group in the entire panel was varying, yet the performance for virus groups that are represented by few baits was satisfying with regard to the aims set out in the introduction, i.e., at least improved detection of a virus, as shown for RVFV, BHV, HeV, and NiV (Fig. 3, Table 4) indicating a robust and efficient method. Regarding samples infected with the OvPiV and the PeHV, the viruses are only distantly related to the viruses in the underlying genome set (case 2 samples). For such cases, an adaption of hybridization temperature and time might increase the output for those distantly related viruses. For the latter modification, evaporation might become a serious problem that needs to be considered. Nonetheless, distantly related viruses can be detected by the standard VirBaits procedure via capturing conserved regions. Hence, they would not get lost during the procedure. Altogether, the results indicate a good applicability for all samples including low-load virus samples if included in the VirBaits set or moderately related to those viruses.

The tiling density is low in the present VirBaits panel (1.3×) compared with panels for single species applying a tiling density of 3× [8] or 4× [33]. This is owing to the diagnostic approach we were intending to design, improving diagnostic sensitivity but not necessarily leading to the assembly of whole genomes. Typically, one specific read in a sequence dataset indicating a pathogenic virus would be enough to call a suspicion and trigger follow-up analyses. However, as shown here, the used tiling density in many cases allows for the assembly of nearly complete genomes emphasizing the sensitivity of the approach. In addition, the detected SNVs of RABV and KBLV genomes point to equal frequencies (Fig. 4) indicating a consistent recovery of the genomes via capture enrichment as already demonstrated by Metsky et al. [15].

A general benefit of capture approaches is that unlike for targeted assays like qPCR or serologic assays no prior decision is necessary for which pathogen (virus) is sought after, i.e., which specific test system needs to be applied. Of course, this requires that substantial knowledge and effort be put into the design of the bait set and that the used bait set fits the specific purpose. This saves time and money by avoiding application of many laborious diagnostic techniques successively until a causative agent can be suspected and confirmed. Moreover, a negative result of a PCR approach can mask an undetected viral pathogen when PCR primers do not fit anymore with the genome of evolved pathogens (e.g., [34, 35]). Otherwise, with positive PCR result, co-infections with several different pathogens can be overseen. For cases with atypical clinical signs or deviating viral pathogens, it might be that the relevant specific diagnostics are not considered or not existing, respectively. The synchronous detection of different viruses, as shown for sample L03038 having a co-infection with WNV and USUV, enabled by the generic HTS approach in combination with VirBaits approach is a great benefit. Compared to multiplex PCR approaches, that allow the detection of several viruses or virus variants in parallel [36, 37], the VirBaits approach has a broader applicability and often provides sequence information of full or nearly full genomes instead of small gene fragments. This information can additionally be used for follow-up analyses on genetic diversity, virulence, or functional potential.

The results of the internal proficiency test corroborate the applicability of the VirBaits workflow combined with HTS as all-in-one approach for different pathogenic viruses without the necessity to do laborious diagnostic tests for at least each single virus or small groups of viruses. The datasets generated were relatively small but resulted in all cases in enough reads to determine the respective pathogen without any doubt (compare Tables 2 and 3). This also indicates the potential for reduction of sequencing costs when applying an enrichment method. For sample X1 infected with the DNA virus ASFV, the portion of virus reads in the dataset was only low since the sequencing library was made from RNA according to the standard HTS workflow [3]. In that case, starting from DNA would have delivered much higher read numbers, as seen for sample L02762 yielding a proper amount of African swine fever virus reads (compare Table 2). In sample X3, some reads of the NiV were found while they were not supposed to be in this sample but only in sample X4. This point to the risk of cross-contamination can be an issue for manual sample handling. Cross-contaminations typically occur in laboratories but become especially visible when highly enriched samples are used, like amplicons, or bait-enriched samples. An automated procedure or a cartridge-based (in the ideal case closed) system may avoid cross-contamination in the future by reducing (excluding) opportunities for aerosol contamination between samples.

The rapid inclusion of baits and samples related to the recent SARS-CoV-2 outbreak demonstrate the high dynamics and ad hoc adaptability of the here tested approach. The results for those samples show that the VirBaits workflow can be updated very rapidly and easily for viruses causing newly emerging health threats. In addition, we investigated several samples related to the WNV epidemic in Germany [20, 21] including two horse samples. Horses are dead-end hosts for WNV, and horse samples show generally lower virus titers in contrast to samples of bird being reservoir hosts for WNV infection. As shown here, the VirBaits treatments considerably improved the WNV detection for horse samples (L03451).

The application of myBaits is still expensive, although sequencing costs decrease with this procedure since no large data sets are necessary (compare Tables 2, 3, and 4). The reduction of the reaction volume by 50%, as shown here for L03007, can reduce the cost for the enrichment reagents. Nevertheless, this opportunity needs to be validated in further trials and might depend on the individual sample type and the contained virus.

Resulting from the present pilot study, the VirBaits panel can be improved by optimizing the distribution of baits per virus group. This especially means a significant reduction of IAV baits, as shown here with the dilution of the baits for IAV samples, but also other viruses with small RNA genomes like FMDV, RABV, and NDV (compare Fig. 2). Redundancy of the baits can be decreased by removing oligonucleotides that were > 90% identical to each other (in contrast to 95% as used for the pilot study). This decrease should still allow good hybridization results considering the performance of deviating baits as shown here for KBLV, USUV, OvPiV, and PeHV. Pooling of multiple libraries (for cost reduction) and application of reduced hybridization temperatures or prolonged hybridization times (up to 48 h) need to be investigated for distantly related targets. The latter step has to go along with procedures to avoid strong evaporation holding the reaction volume nearly constant during extended incubation time.

Conclusions

The biotinylated RNA-baits principle for capture enrichment with a large and diverse virus panel as tested here is highly suitable to improve diagnostics for viruses causing notifiable epizootic and zoonotic diseases. It can be dynamically updated with new RNA baits because of the modular concept that allows the addition of new bait sets for emerging diseases. The combination of a generic and unbiased HTS workflow with the capture approach leads to an enhancement of the viral signal in terms of sequence reads in the sequence dataset resulting in higher sensitivity of HTS. The identification of distantly related viruses can be difficult as shown for a deviating avian bornavirus in a proficiency test for HTS bioinformatics [38]. Although the problem in the mentioned study was related to bioinformatics analyses, the accumulation of certain virus reads by capture enrichment can avoid overlooking pathogenic viruses contained in a sample. The application of such an approach is recommendable for cases of unclear clinical pictures and can replace time-consuming screenings using large panel PCR diagnostics for specific single pathogens. This leads to an accelerated diagnosis and an improved detection compared with HTS without prior enrichment. Although not the primary intention, the application of the VirBaits even enables the generation of nearly complete genome sequences in most cases.