
Metagenomic analysis of nepoviruses: diversity, evolution and identification of a genome region in members of subgroup A that appears to be important for host range

Abstract

Data mining and metagenomic analysis of 277 open reading frame sequences of bipartite RNA viruses of the genus Nepovirus, family Secoviridae, were performed, documenting how challenging it can be to unequivocally assign a virus to a particular species, especially those in subgroups A and C, based on some of the currently adopted taxonomic demarcation criteria. This work suggests a possible need for their amendment to accommodate pangenome information. In addition, we revealed a host-dependent structure of arabis mosaic virus (ArMV) populations at a cladistic level and confirmed a phylogeographic structure of grapevine fanleaf virus (GFLV) populations. We also identified new putative recombination events in members of subgroups A, B and C. The evolutionary specificity of some capsid regions of ArMV and GFLV that were described previously and biologically validated as determinants of nematode transmission was circumscribed in silico. Furthermore, a C-terminal segment of the RNA-dependent RNA polymerase of members of subgroup A was predicted to be a putative host range determinant based on statistically supported higher π (substitutions per site) values for GFLV and ArMV isolates infecting Vitis spp. compared with non-Vitis-infecting ArMV isolates. This study illustrates how sequence information obtained via high-throughput sequencing can increase our understanding of mechanisms that modulate virus diversity and evolution and create new opportunities for advancing studies on the biology of economically important plant viruses.

Introduction

New viral sequences are being discovered at an unprecedented rate since the advent of high-throughput sequencing (HTS). The recovery of numerous complete or almost complete viral genome sequences from different ecosystems (i.e., environmental, human, veterinary, plant) has allowed previously unknown virus genomes to be described and their diversity to be studied. This wealth of information is creating the possibility of using the pangenome for virus taxonomy [17] and increasing our understanding of the mechanisms that modulate virus diversity, evolution, vector and host specificity, and epidemiology [37]. However, new challenges arise, for instance, with regard to virus classification. Taxonomy traditionally relies not only on the genetic relationships among sequences of a few viral coding regions, primarily the replicase and/or coat protein coding domains, but also on biological properties such as vector species and host range, among other features [48]. This type of biological information is critical for the taxonomic classification of currently known plant viruses, but it is generally lacking when only metagenomic data are available.

Nepoviruses are plant picorna-like viruses belonging to the subfamily Comovirinae in the family Secoviridae [46]. Their transmission occurs in a non-persistent and non-circulative manner via ectoparasitic nematodes of the genera Xiphinema, Longidorus, and Paralongidorus [44]. Long-distance dissemination of nepoviruses occurs with the exchange of uncontrolled propagation material and the use of infected cuttings and budwood for grafting. Seed and pollen transmission have been documented for some, but not all, nepoviruses, and transmission by mites has been observed in rare cases. The genus Nepovirus includes 40 species whose members are widely distributed in temperate regions (https://talk.ictvonline.org/ictv-reports/ictv_online_report/positive-sense-rna-viruses/picornavirales/w/secoviridae/591/genus-nepovirus) [22]. Most nepoviruses have a broad natural host range, including annual herbaceous species (e.g., Beta vulgaris, Nicotiana tabacum, and Solanum lycopersicum) and perennial woody species (e.g., Vitis vinifera, Prunus domestica, Rubus idaeus, and Olea europaea), and cause significant crop losses worldwide [11].

The genome of nepoviruses is composed of two single-stranded, positive-sense RNAs (RNA1 and RNA2). Both genomic RNAs are necessary for infection in planta. Each RNA encodes a large polyprotein (P1 for RNA1 and P2 for RNA2), which is cleaved by the viral proteinase into functional proteins [11]. P1 is the precursor of proteins that are necessary for replication, including a helicase with a nucleoside-triphosphate-binding domain, a proteinase (Pro), and an RNA-dependent RNA polymerase (Pol). Depending on the viral species, one (1A) or two (X1 and X2) proteins are located upstream of the helicase domain. The function of these proteins has not yet been fully elucidated. P2 includes the coat protein (CP), multiple units of which form icosahedral virions with a diameter of 26-30 nm. The cell-to-cell movement protein (MP) domain is located immediately upstream of the CP domain. Depending on the nepovirus species, one (2A, which is required for the replication of RNA2) or two (X3 and X4 of unknown function) proteins are located upstream of the MP [13]. Three subgroups of nepoviruses, named A, B, and C, have been recognized based on RNA2 properties, including its organization and size, phylogenetic relationships in the CP coding region, and cleavage sites recognized by the viral proteinase [11].

One of the most important viral diseases of grapevines is infectious degeneration. This disease is caused by members of 15 different Nepovirus species [6, 43]. Most grapevine-infecting nepoviruses are generally restricted to a particular region of the world. For example, arabis mosaic virus (ArMV) is limited to European vineyards, while tobacco ringspot virus (TRSV), tomato ringspot virus (ToRSV), peach rosette mosaic virus (PRMV), and blueberry leaf mottle virus are present in American vineyards. In contrast, grapevine fanleaf virus (GFLV) is present in most vineyards worldwide.

The genetic diversity of nepoviruses has been analyzed extensively, primarily using information collected from RT-PCR-based studies combined with Sanger sequencing, generally in the CP coding region [14, 52]. Similarly, diversity studies and phylogenetic analysis have been reported for members of the family Secoviridae, including nepoviruses [45, 51]. However, several new nepoviruses have been characterized recently, and the number of complete genome sequences of nepovirus isolates has increased exponentially in the past five years [1, 2, 4, 12, 15, 18, 23, 24, 25, 41, 50, 55, 56, 58]. In this study, we built on these latest advancements in nepovirus research and carried out metagenomic analysis. We focused on RNA1 and RNA2 coding sequences to gain new insights into viral diversity and evolution, and we identified a hitherto undescribed conserved region of the genome that is putatively involved in determining the host range of two subgroup A nepoviruses.

Materials and methods

Sequence analysis, genetic diversity, and detection of recombination

Complete nepovirus ORF1 and ORF2 sequences were retrieved from three sources: NCBI (as of January 2020), our own curated nepovirus sequence repository obtained by analysis of high-throughput or Sanger sequencing datasets, and a selection of Sequence Read Archive (SRA) datasets from GenBank [19]. In total, 110 ORF1 sequences and 167 ORF2 sequences were used in this study (Supplementary Tables S2 and S3). In addition, sequences of specific domains were retrieved from NCBI (see Supplementary Table S8).

Codon-based multiple sequence alignments were prepared using MUSCLE [7], as implemented in MEGA7 and MEGA X [26, 27], and maximum-likelihood (ML) phylogenetic trees were inferred with the same software, excluding the viral untranslated regions (UTRs). The best-fit ML model for each sequence alignment was used, and nodes in phylogenetic trees were validated by bootstrap analysis (100 replicates). Trees were visualized using FigTree v. 1.3.1 (http://tree.bio.ed.ac.uk/). The diversity index (π), which is the average number of nucleotide substitutions per site between any two sequences in a multiple sequence alignment, and the variation of π along genome sequences were evaluated by sliding window analysis (length, 80; step size, 20) using DnaSP v. 6.12.03 [29] and MEGA X.
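The sliding window calculation of π can be illustrated with a minimal Python sketch. It is not the DnaSP implementation: it assumes an already aligned set of equal-length sequences, uses the uncorrected proportion of differing sites (no substitution-model correction), and skips positions where either sequence has a gap or N. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_diff_fraction(a, b):
    """Proportion of differing sites between two aligned sequences.

    Assumption: positions with a gap ('-') or 'N' in either sequence are skipped.
    """
    compared = 0
    diffs = 0
    for x, y in zip(a, b):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 0.0


def nucleotide_diversity(seqs):
    """Uncorrected pi: mean pairwise proportion of differing sites."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(pairwise_diff_fraction(a, b) for a, b in pairs) / len(pairs)


def sliding_pi(seqs, window=80, step=20):
    """Return (window start, pi) tuples along the alignment.

    Defaults mirror the window length (80) and step size (20) stated in the text.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start, nucleotide_diversity(chunk)))
    return results


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT"]  # toy alignment, not real data
    print(nucleotide_diversity(toy))
    print(sliding_pi(toy, window=2, step=2))
```

Plotting the second element of each tuple against the window start gives a diversity profile of the kind shown for π in the figures.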

A search for potential recombination signals was performed using all seven algorithms implemented in RDP v4.97 (RDP4) [32]. The default settings were used for each algorithm, and only recombination events detected by five or more methods were considered.

Differences in nucleotide sequence diversity of viral populations defined using different modalities were tested by analysis of molecular variance (AMOVA), as implemented in Arlequin v. 5.3.1.2 [10]. AMOVA calculates the fixation index (FST), which expresses the between-group fraction of total genetic diversity. The significance of these differences was evaluated by performing 1000 sequence permutations.
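The logic of a fixation index with a permutation test can be sketched as follows. This is not the AMOVA computation performed by Arlequin; it swaps in a simple distance-based estimator, FST = 1 − (mean within-group distance)/(mean between-group distance), and assesses significance by shuffling sequences between two groups. All function names and the toy sequences are hypothetical, and each group is assumed to contain aligned, gap-free sequences.

```python
import random
from itertools import combinations


def p_distance(a, b):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(groups):
    """Mean pairwise distance over all pairs drawn from the same group."""
    d = [p_distance(a, b) for g in groups for a, b in combinations(g, 2)]
    return sum(d) / len(d) if d else 0.0


def mean_between(g1, g2):
    """Mean pairwise distance over all pairs drawn from different groups."""
    d = [p_distance(a, b) for a in g1 for b in g2]
    return sum(d) / len(d)


def fst(g1, g2):
    """Simple estimator: 1 - within/between (an assumption, not AMOVA)."""
    between = mean_between(g1, g2)
    if between == 0:
        return 0.0
    return 1 - mean_within([g1, g2]) / between


def permutation_p(g1, g2, n_perm=1000, seed=0):
    """Observed FST and the fraction of permutations with FST >= observed."""
    rng = random.Random(seed)
    observed = fst(g1, g2)
    pooled = list(g1) + list(g2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if fst(pooled[:len(g1)], pooled[len(g1):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    group_a = ["AAAA", "AAAA"]  # toy data only
    group_b = ["TTTT", "TTTT"]
    print(permutation_p(group_a, group_b))
```

With only four toy sequences the permutation P-value cannot become small; the 1000 permutations stated in the text are meaningful only for realistically sized groups.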

Tajima’s D (DT) and sliding window analyses were conducted using DnaSP v. 6.12.03 [29] in order to distinguish the viral populations evolving randomly (per mutation-drift equilibrium; DT = 0) from those evolving under a nonrandom process (DT > 0: balancing selection, sudden population contraction; DT < 0: recent selective sweep, population expansion after a recent bottleneck).
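For illustration, the DT statistic can be computed from an alignment with the standard formula, as in the sketch below. This is a minimal example rather than the DnaSP implementation: it assumes aligned, gap-free sequences, counts a site as segregating when more than one base is observed, and returns None when no site varies. The function name is hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for aligned, gap-free sequences (at least four are required)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least four sequences are needed")

    # S: number of segregating (polymorphic) sites
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        return None  # D is undefined without polymorphism

    # k: mean number of pairwise differences
    pairs = list(combinations(seqs, 2))
    k = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)

    # Constants of the standard formula
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return None
    return (k - s / a1) / sqrt(variance)


if __name__ == "__main__":
    # Toy alignments: only rare variants, then only intermediate-frequency variants
    print(tajimas_d(["AAAA", "TAAA", "ATAA", "AATA"]))
    print(tajimas_d(["AAAA", "AAAA", "TTAA", "TTAA"]))
```

An excess of rare variants (first toy alignment) gives a negative value, and an excess of intermediate-frequency variants (second toy alignment) gives a positive value, matching the interpretation of DT < 0 and DT > 0 given above.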

Results and discussion

Phylogenetic relationships among nepoviruses

Only complete open reading frame (ORF) sequences of RNA1 (ORF1) and RNA2 (ORF2) of nepoviruses were considered in this study. All sequences were retrieved from the NCBI database as of January 2020, our own curated nepovirus sequence repository, and a selection of Sequence Read Archive (SRA) datasets from the GenBank database. Data mining was performed to increase the number of sequences for ArMV, GFLV, and mulberry mosaic leafroll-associated virus (MMLRaV), a novel nepovirus [31], as described previously [19]. These data mining, Sanger, or Illumina sequencing efforts resulted in 46 new sequences (24 for RNA1 and 22 for RNA2) of ArMV, GFLV, and MMLRaV. New sequences were deposited in the GenBank database (Supplementary Table S1). In total, both genomic RNA sequences were recovered from members of 29 nepovirus species, except for olive latent ringspot virus, for which only a single RNA2 sequence but no RNA1 sequence is available (Table 1). Two nepoviruses (GFLV and ArMV) made up the majority of sequences analyzed in this study, while most species were represented by one or a few sequences of either genomic RNA (Table 1). Novel nepoviruses used in this study included MMLRaV [31], caraway yellows virus [12], potato virus B [4], and red clover nepovirus A [25]. A few new viruses and isolates belonging to the genus Nepovirus have been identified since we last consulted NCBI (January 2020). The corresponding sequences were not included in this study (Supplementary Table S10). In addition, a few viruses described in the literature as potential members of new nepoviral species, such as Hobart nepovirus 3 [42] or Zhuye pepper nepovirus 1 [3], were not taken into account in this study because the sequences were incomplete or discrepancies were observed between the datasets available at NCBI and the publications. Furthermore, only a single sequence was chosen from a group of sequences displaying nucleotide sequence identity higher than 99% unless the isolates were from different hosts and/or different countries. Complete lists of the 110 ORF1 and 167 ORF2 sequences selected for this study are provided in Supplementary Tables S2 and S3, respectively.
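The redundancy rule described above (one representative per group of sequences sharing more than 99% nucleotide identity, unless host or country differs) can be sketched as a greedy filter. This is an illustrative assumption about how such a rule could be applied, not the procedure actually used; the record fields (name, seq, host, country) and function names are hypothetical, and sequences are assumed to be aligned to the same length.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def dereplicate(records, threshold=0.99):
    """Keep a record unless an already kept record is >threshold identical
    AND comes from the same host and the same country."""
    kept = []
    for rec in records:
        redundant = any(
            identity(rec["seq"], k["seq"]) > threshold
            and rec["host"] == k["host"]
            and rec["country"] == k["country"]
            for k in kept
        )
        if not redundant:
            kept.append(rec)
    return kept


if __name__ == "__main__":
    base = "ACGT" * 30                 # 120-nt toy sequence
    near = "T" + base[1:]              # one substitution: 119/120 identity
    far = "TGCA" * 30                  # unrelated toy sequence
    records = [
        {"name": "a", "seq": base, "host": "grapevine", "country": "FR"},
        {"name": "b", "seq": near, "host": "grapevine", "country": "FR"},
        {"name": "c", "seq": near, "host": "hop", "country": "FR"},
        {"name": "d", "seq": far, "host": "grapevine", "country": "FR"},
    ]
    print([r["name"] for r in dereplicate(records)])
```

Because the filter is greedy, the representative retained for a group depends on input order; sorting records beforehand (for example by sequence completeness) would make that choice explicit.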

Table 1 List of nepoviruses used in this study

Nucleotide sequence comparisons confirmed the classification of nepoviruses into three subgroups with higher inter-subgroup than intra-subgroup mean distance values (Table 2). Subgroup B sequences displayed the lowest maximum pairwise distance values, which were well below the inter-subgroup mean distance values, suggesting a well-defined group of virus isolates (Table 2). The inter-subgroup mean distance value was lower than the maximum intra-subgroup mean distance value for subgroup A and C sequences, revealing a greater variability and less well-defined groups of virus isolates (Table 2). After performing an alignment of ORF1 and ORF2 sequences, phylogenetic trees were constructed by the maximum-likelihood (ML) method, using the best-fit model (GTR+G+I) (Fig. 1). Interestingly, the members of each subgroup were separated better in the tree based on ORF1 than the one based on ORF2 sequences (Fig. 1 and Supplementary Fig. S1). Indeed, the ORF1 nucleotide sequences of virus isolates of subgroups A, B, and C clustered in separate and well-supported clades in a tanglegram (Fig. 1) and in an unrooted cladogram (Supplementary Fig. S1). Nucleotide sequences of nepovirus isolates of subgroup B were also well defined when using ORF2 sequences, but subgroup A and C ORF2 sequences were scattered in different clades in a tanglegram (Fig. 1) or unrooted cladogram (Supplementary Fig. S1). These results suggest that the classification of nepoviruses into distinct subgroups is more robust when based on ORF1 sequences than when based on ORF2 sequences. This finding should be considered by the International Committee on Taxonomy of Viruses (ICTV) Secoviridae Study Group to eventually define new demarcation criteria for nepoviruses when using pangenome information.

Table 2 Genetic distance within and between subgroups (SubGP) A, B, and C of the genus Nepovirus
Fig. 1

Tanglegram of maximum-likelihood phylogenetic trees inferred from 110 ORF1 and 167 ORF2 nucleotide sequences of nepoviruses. Colors represent the three nepovirus subgroups: red for subgroup A sequences, blue for subgroup B sequences, and green for subgroup C sequences. Clades with several sequences from the same species are collapsed. Numbers at nodes indicate bootstrap values based on 100 replicates. The scale bar corresponds to the number of substitutions per site.

New challenges for species identification within the genus Nepovirus

Species demarcation criteria for nepoviruses have been defined by the ICTV (https://talk.ictvonline.org/ictv-reports/ictv_online_report/positive-sense-rna-viruses/picornavirales/w/secoviridae). These include CP amino acid (aa) sequence identity less than 75% and conserved protease-polymerase (Pro-Pol) region aa sequence identity less than 80%, among other criteria. We assessed whether these two major demarcation criteria are applicable to the corresponding complete ORF1 and ORF2 aa sequences. Some discrepancies with regard to the intra-species aa sequence identity falling outside the species demarcation were observed for PRMV ORF1 sequences (78.99%) and ORF2 sequences of cherry leaf roll virus (CLRV), ToRSV, cycas necrotic stunt virus, and ArMV (below 74.05%) (Supplementary Table S4). These results revealed that analyzing complete ORF sequences may be problematic for the establishment of new virus species and the classification of new genetic variants of members of existing virus species if the current demarcation criteria pertaining to partial genome sequence information were to be applied. The results further suggest that the demarcation criteria for species in the genus Nepovirus should be amended to accommodate pangenome information. In addition, the ORF2 sequence of ArMV isolate Butterbur was a clear outlier among the ArMV isolates with lower identity values at both the nucleotide (Fig. 2, Supplementary Figs. S3 and S5) and amino acid (70.25%, Supplementary Table S4) levels. According to the original report [21], the pathological and serological features of ArMV-Butterbur are unique, and its CP is 504 aa long (as for all GFLV CPs), while all other ArMV CPs are 505 aa long. These features underscore the need for additional work to ascertain the taxonomic position of ArMV-Butterbur and its recognition as an isolate of ArMV, in particular since no RNA1 sequence is available.

Fig. 2

Phylogenetic and diversity analysis of arabis mosaic virus (ArMV) isolates from different plants, using a corpus of 17 ORF1 (left panel) and 21 ORF2 (right panel) nucleotide sequences. Colors correspond to hosts, with red for monocotyledonous plants, green for dicotyledonous plants, and black for all plants. Maximum-likelihood phylogenetic trees are shown. Numbers at each node indicate bootstrap values based on 100 replicates, and scale bars show genetic distance. Graphics represent π (substitutions per site) and Tajima’s D (DT) for evolution along the ORF1 and ORF2 sequences. Colored bars marked with # and * correspond to statistically validated regions at P-values of 0.05 and 0.001, respectively.

Similarly, the ORF1 aa sequence identity between some isolates belonging to different species was higher than 80%, for example, for beet ringspot virus (BRSV) and tomato black ring virus (TBRV), as well as for BRSV and artichoke Italian latent virus (AILV) (Supplementary Table S4). This high level of sequence similarity could also explain the large number of inter-species recombination events identified between members of these particular species (see the dedicated section below). However, inter-species diversity was below the species demarcation level (< 75%) for ORF2 sequences, unambiguously defining BRSV, TBRV, and AILV as members of different species (Supplementary Table S4). One particular case of interest is grapevine deformation virus (GDefV) [20], a subgroup A nepovirus. GDefV ORF2 aa sequences display 73 and 71% identity to those of GFLV and ArMV, respectively [16], and GDefV ORF1 aa sequences have higher identity to those of GFLV (86-89%) than to those of ArMV (73-74%) [8]. According to the species demarcation criteria for nepoviruses, GDefV would be classified as a highly divergent variant of GFLV when focusing on the ORF1 sequences but as a member of a new species based on ORF2 sequences.
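The conflict described for GDefV can be made explicit with a short sketch that applies the two identity thresholds quoted earlier (80% for the Pro-Pol region, 75% for the CP) to complete ORF1 and ORF2 aa identities. Applying these thresholds to whole ORFs is exactly the extrapolation this section questions, so the example illustrates the problem rather than proposing a classification rule. The function names are hypothetical.

```python
PROPOL_THRESHOLD = 80.0  # % aa identity, criterion defined for the Pro-Pol region
CP_THRESHOLD = 75.0      # % aa identity, criterion defined for the CP


def demarcation_calls(orf1_identity, orf2_identity):
    """Apply each threshold to the corresponding complete-ORF aa identity (%)."""
    return {
        "ORF1": "same species" if orf1_identity >= PROPOL_THRESHOLD else "different species",
        "ORF2": "same species" if orf2_identity >= CP_THRESHOLD else "different species",
    }


def is_conflicting(calls):
    """True when the two ORFs lead to different conclusions."""
    return calls["ORF1"] != calls["ORF2"]


if __name__ == "__main__":
    # GDefV vs. GFLV, using identities quoted in the text (86% is the lower bound of 86-89%)
    gdefv_gflv = demarcation_calls(86.0, 73.0)
    print(gdefv_gflv, is_conflicting(gdefv_gflv))
    # GDefV vs. ArMV (73% is the lower bound of 73-74%)
    gdefv_armv = demarcation_calls(73.0, 71.0)
    print(gdefv_armv, is_conflicting(gdefv_armv))
```

For the GDefV and GFLV pair, the ORF1 value exceeds its threshold while the ORF2 value does not, which reproduces the contradictory assignment noted above; the GDefV and ArMV pair is consistently below both thresholds.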

Identification of putative recombination events within and between nepovirus species

Putative intra-species recombination events have been extensively reported for nepoviruses, mostly in the GFLV RNA2-encoded MP and CP domains [34, 35, 38, 49, 52, 53, 54]. Recombination events have also been described for ToRSV [56] and grapevine chrome mosaic virus (GCMV) [5]. In addition, many inter-species recombination events have been detected, mostly between ArMV and GFLV [9, 34, 35, 54], but also between GCMV and TBRV [5, 28]. With the use of HTS and the recovery of complete virus genome sequences, recombination events can be detected all along the two genomic RNAs [18]. Here, we used the same corpus of nepovirus sequences and searched for potential recombination events using the RDP4 program. Recombination events were only considered when predicted by five or more algorithms with P-values < 10⁻³ (Table 3, Supplementary Tables S5 and S6).
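The acceptance rule for recombination events can be expressed as a small filter. The sketch below assumes that per-method P-values have already been exported from RDP4 into a dictionary per event (a hypothetical data layout, not an RDP4 file format), and it interprets the rule as requiring at least five methods that each report P < 10⁻³. The event records are invented for illustration.

```python
# The seven detection methods available in RDP4
RDP4_METHODS = ("RDP", "GENECONV", "BootScan", "MaxChi", "Chimaera", "SiScan", "3Seq")


def supported_methods(event, alpha=1e-3):
    """Methods reporting a P-value below alpha for this event.

    Missing or None entries are treated as 'not detected by that method'.
    """
    supported = []
    for method in RDP4_METHODS:
        p = event["p_values"].get(method)
        if p is not None and p < alpha:
            supported.append(method)
    return supported


def keep_event(event, min_methods=5, alpha=1e-3):
    """Accept an event only when enough methods support it."""
    return len(supported_methods(event, alpha)) >= min_methods


if __name__ == "__main__":
    events = [  # hypothetical events, not results from this study
        {"id": "event_1", "p_values": {"RDP": 1e-6, "GENECONV": 1e-5, "BootScan": 1e-4,
                                       "MaxChi": 1e-7, "Chimaera": 1e-4, "SiScan": 0.2,
                                       "3Seq": None}},
        {"id": "event_2", "p_values": {"RDP": 1e-6, "GENECONV": 1e-5, "MaxChi": 0.01,
                                       "3Seq": 1e-9}},
    ]
    print([e["id"] for e in events if keep_event(e)])
```

Requiring agreement among several methods trades sensitivity for a lower rate of false-positive recombination signals.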

Table 3 Number of putative intra- and inter-species recombination events predicted by RDP4 for members of the three nepovirus subgroups (SubGP A, B, and C)

Potential intra-species recombination events were identified in ORF1 and ORF2 sequences, mostly of subgroup A members (Table 3, Supplementary Table S5 and Supplementary Fig. S2). Almost twice as many putative recombination events were predicted in ORF2 sequences as in ORF1 sequences (Table 3). Both of these observations most likely reflect the number of sequences recovered and used in this study. Many recombination events were predicted in GFLV and ArMV sequences, with some hotspots, i.e., more than one putative recombinant per site (Supplementary Table S5 and Supplementary Fig. S2). In addition, putative recombination events were identified for the first time for AILV, CLRV, and raspberry ringspot virus.

All inter-species recombination events predicted in this study strictly involved members of the same subgroup (Supplementary Table S6). Surprisingly, the number of inter-species recombination events was higher than the number of intra-species recombination events within ORF2 sequences (Table 3). For example, 31 inter-species and two intra-species recombination events were detected for subgroup B ORF2 sequences (Table 3 and Supplementary Fig. S2). It should also be noted that all putative ORF1 recombination events detected for subgroup B involved members of different species (Supplementary Table S6). In contrast, all subgroup A recombination events that were predicted involved only ArMV, GFLV, and GDefV (Fig. 1 and Supplementary Fig. S1), again emphasizing their kinship. Recombination may have been facilitated for these three viruses because they have the potential to co-exist in grapevine, a common host, for long periods of time, thus increasing the likelihood of an encounter in the same host cell.

Genetic diversity and population differentiation of ArMV from mono- and dicotyledonous plants

ArMV is a ‘generalist’ with a very broad natural host range, including winter barley, narcissus, Ligustrum vulgare, weeds, hops, berries, olive trees, apricot trees, and grapevines, among other species [14, 33, 40]. Our data mining efforts resulted in the retrieval of 17 complete ORF1 sequences from seven monocotyledonous plants and 10 dicotyledonous plants, as well as 21 complete ORF2 sequences from eight monocotyledonous plants and 13 dicotyledonous plants (Supplementary Tables S1, S2, S3, and S8).

The overall genetic diversity (π) of ArMV ORF1 and ORF2 was 0.166 ± 0.003 and 0.133 ± 0.003, respectively (Table 4). As observed previously [57], the coding region 2A is the most divergent genomic region, showing the highest diversity at the extreme 5’ end of ORF2 (Fig. 2), mostly due to size differences among isolates. For ArMV ORF1, the extreme 3’ end is the most divergent genomic region. A comparative analysis of ArMV sequences obtained from mono- and dicotyledonous plants revealed a significantly higher diversity in sequences from isolates infecting dicotyledonous plants compared to isolates infecting monocotyledonous plants (0.170 ± 0.003 vs. 0.109 ± 0.003 and 0.145 ± 0.003 vs. 0.093 ± 0.003, for ORF1 and ORF2 sequences, respectively; Table 4). When looking at the evolution pattern (Tajima’s D) of ORF1 sequences (Fig. 2), values were negative but close to 0 (DT = −0.344; P > 0.1), suggesting that the population of ArMV is evolving as per mutation-drift equilibrium with no specific region under selection. On the other hand, two distinct regions of ORF2 sequences were under selection, with an overall DT value of −0.994 (P > 0.1) (Fig. 2). One of these two regions covers an aa stretch between two proline-rich segments of the central coding region of the 2A domain. The other region is a specific segment of the CP coding region that overlaps the previously defined R4 region, which is involved in specific transmission of ArMV by the nematode vector Xiphinema diversicaudatum [47].

Table 4 Genetic diversity for both ORFs of arabis mosaic virus (ArMV) and grapevine fanleaf virus (GFLV) isolates

In a previous study [57], ArMV isolates were separated by the size and aa sequence identity of protein 2A into four groups (I to IV). Here, we recovered 43 ArMV 2A nucleotide sequences from GenBank (Supplementary Table S8) and confirmed the existence of three major clades corresponding to groups II, III, and IV (Supplementary Fig. S3). Group I was composed of a single sequence located within the group II clade. The sequences belonging to each group were genetically different with a high fixation index (FST ≥ 0.530) and strong statistical support (P ≤ 10⁻⁵) (Table 5). However, the size of the 2A domain was not linked to the plant host, with ArMV isolates from grapevine belonging to all four groups. A comparative analysis of 2A coding sequences from mono- and dicotyledonous plants documented a statistically supported genetic differentiation (FST) (Table 5). Genetic differentiation according to mono- and dicotyledonous plants was also observed when looking at other RNA1 or RNA2 coding region sequences or the complete ORF1 and ORF2 sequences (Table 5, Supplementary Tables S7, S8 and Supplementary Fig. S3). Distinct FST values between mono- and dicotyledonous plants were also found at lower cladistic levels, strongly suggesting a genetic bias based on the plant host (Supplementary Table S7). Interestingly, similar results have been reported for CLRV, another generalist virus within the genus Nepovirus for which a host-species-dependent population structure was documented using only a short 375-bp sequence corresponding to the extreme 3’ part of the 3’ untranslated region [39].

Table 5 Genetic differentiation of arabis mosaic virus (ArMV) populations for the complete ORFs and the different coding regions

Genetic diversity and population differentiation of GFLV from different geographic regions

GFLV primarily infects Vitis spp. and is thus highly specialized to this woody host. The overall nucleotide sequence diversity for GFLV ORF1 (π = 0.127 ± 0.002) and ORF2 (π = 0.130 ± 0.005) sequences was very similar (Table 4). Plotting π along ORF1 sequences (Fig. 3) showed a highly divergent region at the 3’ end of the Pol domain. This result is consistent with other analyses of this particular aa stretch of P1, which was predicted to form an α-helix [18, 36]. Another highly polymorphic region was detected at the extreme 5’ end of ORF2 (Fig. 3), corresponding to a region where intra- and inter-species recombination events have been predicted (see above section and [54]). On the other hand, similar to ArMV, a significant drop in nucleotide sequence diversity was observed within a segment of 2A sequences located between two highly conserved proline-rich regions. The evolution of this particular domain of ORF2 sequences is not neutral, with statistically supported DT values well below 0 (Fig. 3), indicating conservative selection relative to the remainder of the ORF2 sequence. As for ArMV, the same trend was observed for the R4 region of the CP domain [47]. Interestingly, these two regions, which display the lowest DT values, suggesting a recent selective sweep, were mostly located in sequences recovered from grapevines from the New World (Supplementary Fig. S4). Regarding the evolution pattern of GFLV ORF1 sequences, values were negative but very close to 0 (DT = −0.758), with two sites under selection (P > 0.1). The first site is located at the extreme 5’ end and the second site is positioned within the Hel domain (Fig. 3). When looking at the evolutionary pattern of P1 and P2 (dN − dS), most of the codons were under negative or neutral pressure (data not shown), as described previously [51].

Fig. 3

Genetic diversity analysis of grapevine fanleaf virus (GFLV) isolates, using a corpus of 40 ORF1 (left panel) and 80 ORF2 (right panel) nucleotide sequences. Graphics represent π (substitutions per site) and Tajima’s D (DT) for evolution along the ORF1 and ORF2 sequences. Bars marked with # and * correspond to statistically validated regions at P-values of 0.05 and 0.001, respectively.

No major differences were observed when separating GFLV sequences by geographic region (France vs. the rest of the world or Old vs. New World), with very similar π and DT values (Table 4 and Supplementary Fig. S4). However, a genetic structuration between geographic regions was observed, although the FST values were extremely low (Supplementary Table S9), with differences in the evolution pattern of GFLV isolates from different parts of the world. This was even more noticeable when separating ORF2 sequences by continent (i.e., Europe, Americas [combining North and South America] and Asia [Far East, Turkey, and Russia]). All FST values were statistically supported (P < 0.001). However, the disparity in FST values indicated that European and American GFLV variants were more closely related to each other than to the Asian variants. This observation was confirmed when grouping sequences into seven countries or specific regions of the world (France, Slovenia, Italy, USA, Chile, Far East and Switzerland). Some FST values were very high, underscoring a strong genetic structuration among regions of the world, as confirmed when comparing sequences from the Far East (Iran and China) with other regions of the world (Supplementary Table S9). This genetic differentiation of Far East GFLV populations was previously described using GFLV MP sequences [49]. Altogether, these observations suggest a specific geographic evolution and genetic structuration of the virus.

Common characteristics and major differences between grapevine-infecting ArMV and GFLV isolates

ArMV and GFLV are very closely related but belong to different species (Fig. 1). They share many characteristics such as hosts (i.e., grapevine), closely related vectors (Xiphinema spp.), similar symptomatology, and many natural inter-species recombinants. While the two viruses are genetically different (Supplementary Table S4, Figs. 4 and 5), similar patterns in their respective genetic diversity were observed along ORF1 sequences, especially when sequences of Vitis-infecting ArMV isolates were analyzed separately from those of non-Vitis-infecting ArMV isolates. As detailed above, one of the hallmarks of GFLV is a higher π value at the C-terminus of Pol. A higher π at the C-terminal end of Pol was also clearly identified in Vitis-infecting ArMV isolates, but not in non-Vitis-infecting ArMV isolates (Fig. 4). Such specific increased genetic diversity in Vitis-infecting ArMV and GFLV isolates was not due to the overlap of a hidden ORF (Supplementary Fig. S6), as described for sobemoviruses [30]. This diversity was also observed at the aa level, with a percent identity higher than 80.41% in the case of non-Vitis-infecting ArMV isolates, but as low as 65.54% and 56.08% for Vitis-infecting ArMV and GFLV isolates, respectively (Fig. 4). Such high divergence was not found when specifically looking at the first 148 aa of the Pol domain, where the sequence identity was above 82%. Although the Pol C-terminus is highly divergent between ArMV and GFLV (Supplementary Fig. S7), two of its amino acids, at positions 683 and 746, are mostly conserved between Vitis-infecting ArMV and GFLV isolates (Supplementary Fig. S8). Could these two residues be implicated in host adaptation mechanisms? More work is needed to address this hypothesis.

Fig. 4

Phylogenetic and diversity analysis of grapevine fanleaf virus (GFLV), arabis mosaic virus (ArMV), and grapevine deformation virus (GDefV), using a corpus of 58 ORF1 nucleotide sequences. GFLV isolates are shown in blue, GDefV isolates in peach, ArMV isolates from grapevines (Vitis-ArMV) in green, and ArMV isolates from other plants (non-Vitis ArMV) in red. The country of origin of the isolate, if known, is indicated in bold at the end of the sequence name by two letters corresponding to the international ISO country code. GFLV sequences from the Old World (Turkey, Iran, France, Hungary, Germany, Slovenia, Russia, Italy, and Switzerland) are indicated by a solid diamond, and those from the New World are indicated by an open rectangle (Canada, USA, Chile, China, and South Africa). Maximum-likelihood phylogenetic trees are shown. Numbers at each node indicate bootstrap values based on 100 replicates, and the scale bar shows genetic distance. Graphics represent π (substitutions per site) and Tajima’s D (DT) for evolution along the ORF1 sequence. Percentages correspond to minimum aa identity for both framed regions within the polymerase domain. The colored bar with # corresponds to a statistically validated region at 0.05

The ORF2 sequences of ArMV and GFLV have many characteristics in common (Fig. 5). For example, higher genetic diversity is detected at the 5’ end of 2A, partly as a result of indels. However, major differences between these viruses were found when focusing specifically on the CP domain, with a clearly different level of genetic diversity in the R4-R5 region between ArMV and GFLV (Fig. 5 and Supplementary Fig. S5). This region is important for vector transmission [47]. When looking at the evolution pattern, most of the ORF2 sequences seem to be evolving randomly, while two regions display a non-random evolution pattern. One of these two regions corresponds to the 2A domain, and the other to the R4-R5 region within the CP domain, both showing strong constraints.

Fig. 5

Phylogenetic and genetic diversity analysis of grapevine fanleaf virus (GFLV), arabis mosaic virus (ArMV), and grapevine deformation virus (GDefV), using a corpus of 102 ORF2 nucleotide sequences. GFLV isolates are shown in blue, GDefV isolates in peach, ArMV isolates from grapevines (Vitis-ArMV) in green, and ArMV isolates from other plants (non-Vitis ArMV) in red. The country of origin of the isolate, if known, is indicated in bold at the end of the sequence name by the two-letter international ISO country code. GFLV sequences from the Old World (Turkey, Iran, France, Hungary, Germany, Slovenia, Russia, Italy, and Switzerland) are indicated by a solid diamond, and those from the New World (Canada, USA, Chile, China, and South Africa) are indicated by an open rectangle. Maximum-likelihood phylogenetic trees are shown. Numbers at each node indicate bootstrap values based on 100 replicates, and the scale bar shows genetic distance. Graphs show π (substitutions per site) and Tajima’s D (DT) along the ORF2 sequence. The boxed area corresponds to the R4-R5 region of the CP domain. Colored bars with # and * correspond to statistically validated regions (P-values at 0.05 and 0.001, respectively)

Conclusion

Data mining and metagenomic analysis of complete ORF sequences have provided new insights into the diversity of viruses in the genus Nepovirus, family Secoviridae, with a special emphasis on GFLV and ArMV, the two most important viruses involved in degeneration disease of grapevine in France. Our results confirmed a probable phylogeographic structure of GFLV populations and revealed a host-dependent structure of ArMV populations at a cladistic level. The C-terminus of the RNA-dependent RNA polymerase of GFLV and ArMV is predicted to be a potential host range determinant, and more work is needed to test this hypothesis biologically. Furthermore, some of the current species demarcation criteria that are applied to limited genomic regions may not hold for all nepoviruses at the ORF sequence level. This suggests the need to adapt some of the taxonomic criteria to pangenome information. Nonetheless, with an ever-increasing amount of sequence data obtained through HTS, there are new opportunities for studying nepovirus biology, characterizing nepoviral communities in plants, improving nepovirus taxonomy, and exploiting pangenomic and population-level information for developing antiviral strategies.

References

  1. Adams IP, Boonham N, Jones RAC (2017) First complete genome sequence of arracacha virus A isolated from a 38-year-old sample from Peru. Genome Announc 5:e00141-e117

  2. Bratsch S, Lockhart B, Mollov D (2017) Characterization of a new nepovirus causing a leaf mottling disease in Petunia hybrida. Plant Dis 101:1017–1021

  3. Cao M, Zhang S, Li M, Liu Y, Dong P, Li S, Kuang M, Li R, Zhou Y (2019) Discovery of four novel viruses associated with flower yellowing disease of green Sichuan pepper (Zanthoxylum armatum) by virome analysis. Viruses 11:696

  4. De Souza J, Müller G, Perez W, Cuellar W, Kreuze J (2017) Complete sequence and variability of a new subgroup B nepovirus infecting potato in central Peru. Arch Virol 162:885–889

  5. Digiaro M, Yahyaoui E, Martelli GP, Elbeaino T (2015) The sequencing of the complete genome of a Tomato black ring virus (TBRV) and of the RNA2 of three Grapevine chrome mosaic virus (GCMV) isolates from grapevine reveals the possible recombinant origin of GCMV. Virus Genes 50:165–171

  6. Digiaro M, Elbeaino T, Martelli GP (2017) Grapevine fanleaf virus and Other Old World Nepoviruses. In: Meng B, Martelli GP, Golino DA, Fuchs M (eds) Grapevine Viruses: Molecular Biology, Diagnostics and Management. Springer International Publishing, Cham, pp 47–82

  7. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797

  8. Elbeaino T, Digiaro M, Ghebremeskel S, Martelli GP (2012) Grapevine deformation virus: Completion of the sequence and evidence on its origin from recombination events between Grapevine fanleaf virus and Arabis mosaic virus. Virus Res 166:136–140

  9. Elbeaino T, Kiyi H, Boutarfa R, Minafra A, Martelli GP, Digiaro M (2014) Phylogenetic and recombination analysis of the homing protein domain of grapevine fanleaf virus (GFLV) isolates associated with ‘yellow mosaic’ and ‘infectious malformation’ syndromes in grapevine. Arch Virol 159:2757–2764

  10. Excoffier L, Laval G, Schneider S (2005) Arlequin (version 3.0): An integrated software package for population genetics data analysis. Evol Bioinform Online 1:47–50

  11. Fuchs M, Schmitt-Keichinger C, Sanfaçon H (2017) Chapter Two - A Renaissance in Nepovirus Research Provides New Insights Into Their Molecular Interface With Hosts and Vectors. In: Kielian M, Mettenleiter TC, Roossinck MJ (eds) Advances in Virus Research. Academic Press, pp 61–105. https://doi.org/10.1016/bs.aivir.2016.08.009

  12. Gaafar YZA, Richert-Pöggeler KR, Sieg-Müller A, Lüddecke P, Herz K, Hartrick J, Maaß C, Ulrich R, Ziebell H (2019) Caraway yellows virus, a novel nepovirus from Carum carvi. Virol J 16:70

  13. Gaire F, Schmitt C, Stussi-Garaud C, Pinck L, Ritzenthaler C (1999) Protein 2A of grapevine fanleaf nepovirus is implicated in RNA2 replication and colocalizes to the replication site. Virology 264:25–36

  14. Gao F, Lin W, Shen J, Liao F (2016) Genetic diversity and molecular evolution of arabis mosaic virus based on the CP gene sequence. Arch Virol 161:1047–1051

  15. Garcia S, Hily J-M, Komar V, Gertz C, Demangeat G, Lemaire O, Vigne E (2019) Detection of multiple variants of grapevine fanleaf virus in single Xiphinema index nematodes. Viruses 11:1139

  16. Ghanem-Sabanadzovic NA, Sabanadzovic S, Digiaro M, Martelli GP (2005) Complete nucleotide sequence of the RNA-2 of grapevine deformation and grapevine Anatolian ringspot viruses. Virus Genes 30:335–340

  17. Gorbalenya AE, Krupovic M, Mushegian A, Kropinski AM, Siddell SG, Varsani A, Adams MJ, Davison AJ, Dutilh BE, Harrach B, Harrison RL, Junglen S, King AMQ, Knowles NJ, Lefkowitz EJ, Nibert ML, Rubino L, Sabanadzovic S, Sanfaçon H, Simmonds P, Walker PJ, Zerbini FM, Kuhn JH, International Committee on Taxonomy of Viruses Executive C (2020) The new scope of virus taxonomy: partitioning the virosphere into 15 hierarchical ranks. Nature Microbiol 5:668–674

  18. Hily J-M, Demanèche S, Poulicard N, Tannières M, Djennane S, Beuve M, Vigne E, Demangeat G, Komar V, Gertz C, Marmonier A, Hemmer C, Vigneron S, Marais A, Candresse T, Simonet P, Lemaire O (2018) Metagenomic-based impact study of transgenic grapevine rootstock on its associated virome and soil bacteriome. Plant Biotechnol J 16:208–220

  19. Hily J-M, Poulicard N, Candresse T, Vigne E, Beuve M, Renault L, Velt A, Spilmont A-S, Lemaire O (2020) Datamining, genetic diversity analyses and phylogeographic reconstructions redefine the worldwide evolutionary history of grapevine Pinot gris virus and grapevine berry inner necrosis virus. Phytobiomes J 4:165–177

  20. Cigsar MD, Gokalp K, Abou Ghanem-Sabanadzovic N, De Stradis A, Boscia D, Martelli GP (2003) Grapevine deformation virus, a novel nepovirus from Turkey. J Plant Pathol 85:183–191

  21. Imura Y, Oka H, Kimata K, Nasu M, Nakahama K, Maeda T (2008) Comparisons of complete RNA-2 sequences, pathological and serological features among three Japanese isolates of Arabis mosaic virus. Virus Genes 37:333–341

  22. International Committee on Taxonomy of Viruses Executive C (2012) Family - Secoviridae. In: King AMQ, Adams MJ, Carstens EB, Lefkowitz EJ (eds) Virus Taxonomy. Elsevier, San Diego, pp 881–899

  23. Isogai M, Tatuto N, Ujiie C, Watanabe M, Yoshikawa N (2012) Identification and characterization of blueberry latent spherical virus, a new member of subgroup C in the genus Nepovirus. Arch Virol 157:297–303

  24. Kis S, Salamon P, Kis V, Szittya G (2017) Molecular characterization of a beet ringspot nepovirus isolated from Begonia ricinifolia in Hungary. Arch Virol 162:3559–3562

  25. Koloniuk I, Přibylová J, Fránová J (2018) Molecular characterization and complete genome of a novel nepovirus from red clover. Arch Virol 163:1387–1389

  26. Kumar S, Stecher G, Tamura K (2016) MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol 33:1870–1874

  27. Kumar S, Stecher G, Li M, Knyaz C, Tamura K (2018) MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol 35:1547–1549

  28. Le Gall O, Lanneau M, Candresse T, Dunez J (1995) The nucleotide sequence of the RNA-2 of an isolate of the English serotype of tomato black ring virus: RNA recombination in the history of nepoviruses. J Gen Virol 76:1279–1283

  29. Librado P, Rozas J (2009) DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25:1451–1452

  30. Ling R, Pate AE, Carr JP, Firth AE (2013) An essential fifth coding ORF in the sobemoviruses. Virology 446:397–408

  31. Lu Q-Y, Wu Z-J, Xia Z-S, Xie L-H (2015) A new nepovirus identified in mulberry (Morus alba L.) in China. Arch Virol 160:851–855

  32. Martin DP, Murrell B, Golden M, Khoosal A, Muhire B (2015) RDP4: Detection and analysis of recombination patterns in virus genomes. Virus Evolution 1:1–5

  33. Mathioudakis M, Saponari M, Hasiow-Jaroszewska B, Elbeaino T, Koubouris G (2020) The detection of viruses in olive cultivars in Greece, using a rapid and effective RNA extraction method, for certification of virus-tested propagation material. Phytopathologia Mediterranea 59:203–211

  34. Mekuria TA, Gutha LR, Martin RR, Naidu RA (2009) Genome diversity and intra- and interspecies recombination events in grapevine fanleaf virus. Phytopathology 99:1394–1402

  35. Oliver JE, Vigne E, Fuchs M (2010) Genetic structure and molecular variability of Grapevine fanleaf virus populations. Virus Res 152:30–40

  36. Osterbaan LJ, Choi J, Kenney J, Flasco M, Vigne E, Schmitt-Keichinger C, Rebelo AR, Cilia M, Fuchs M (2019) The identity of a single residue of the RNA-dependent RNA polymerase of grapevine fanleaf virus modulates vein clearing symptoms in Nicotiana benthamiana. Mol Plant Microbe Interact 32:790–801

  37. Pagán I (2018) The diversity, evolution and epidemiology of plant viruses: A phylogenetic view. Infect Genet Evol 65:187–199

  38. Pompe-Novak M, Gutiérrez-Aguirre I, Vojvoda J, Blas M, Tomažič I, Vigne E, Fuchs M, Ravnikar M, Petrovič N (2007) Genetic variability within RNA2 of Grapevine fanleaf virus. Eur J Plant Pathol 117:307–312

  39. Rebenstorf K, Candresse T, Dulucq MJ, Büttner C, Obermeier C (2006) Host species-dependent population structure of a pollen-borne plant virus, Cherry leaf roll virus. J Virol 80:2453–2462

  40. Rezk AA, Amal AA, Farag A G, M SA (2009) Biological assay and molecular characterization of apricot isolate of Arabis mosaic virus. Arab J Biotechnol 12:237–250

  41. Rivera L, Zamorano A, Fiore N (2016) Genetic divergence of tomato ringspot virus. Arch Virol 161:1395–1399

  42. Roberts JMK, Anderson DL, Durr PA (2018) Metagenomic analysis of Varroa-free Australian honey bees (Apis mellifera) shows a diverse Picornavirales virome. J Gen Virol 99:818–826

  43. Rowhani A, Daubert SD, Uyemoto JK, Al Rwahnih M, Fuchs M (2017) American Nepoviruses. In: Meng B, Martelli GP, Golino DA, Fuchs M (eds) Grapevine Viruses: Molecular Biology, Diagnostics and Management. Springer International Publishing, Cham, pp 109–126

  44. Sanfaçon H (2008) Nepovirus. In: Mahy BWJ, Van Regenmortel MHV (eds) Encyclopedia of Virology, 3rd edn. Academic Press, Oxford, pp 405–413

  45. Sanfaçon H (2015) Secoviridae: A family of plant picorna-like viruses with monopartite or bipartite genomes. eLS. John Wiley & Sons, Ltd. https://doi.org/10.1002/9780470015902.a0000764.pub3

  46. Sanfaçon H, Dasgupta I, Fuchs M, Karasev AV, Petrzik K, Thompson JR, Tzanetakis I, van der Vlugt R, Wetzel T, Yoshikawa N (2020) Proposed revision of the family Secoviridae taxonomy to create three subgenera, “Satsumavirus”, “Stramovirus” and “Cholivirus”, in the genus Sadwavirus. Arch Virol 165:527–533

  47. Schellenberger P, Andret-Link P, Schmitt-Keichinger C, Bergdoll M, Marmonier A, Vigne E, Lemaire O, Fuchs M, Demangeat G, Ritzenthaler C (2010) A stretch of 11 amino acids in the betaB-betaC loop of the coat protein of grapevine fanleaf virus is essential for transmission by the nematode Xiphinema index. J Virol 84:7924–7933

  48. Simmonds P, Aiewsakun P (2018) Virus classification – where do you draw the line? Arch Virol 163:2037–2046

  49. Sokhandan-Bashir N, Melcher U (2012) Population genetic analysis of grapevine fanleaf virus. Arch Virol 157:1919–1929

  50. Sorrentino R, De Stradis A, Russo M, Alioto D, Rubino L (2013) Characterization of a putative novel nepovirus from Aeonium sp. Virus Res 177:217–221

  51. Thompson JR, Kamath N, Perry KL (2014) An evolutionary analysis of the Secoviridae family of viruses. PLoS ONE 9:e106305

  52. Vigne E, Bergdoll M, Guyader S, Fuchs M (2004) Population structure and genetic variability within isolates of Grapevine fanleaf virus from a naturally infected vineyard in France: evidence for mixed infection and recombination. J Gen Virol 85:2435–2445

  53. Vigne E, Demangeat G, Komar V, Fuchs M (2005) Characterization of a naturally occurring recombinant isolate of Grapevine fanleaf virus. Arch Virol 150:2241–2255

  54. Vigne E, Marmonier A, Fuchs M (2008) Multiple interspecies recombination events within RNA2 of Grapevine fanleaf virus and Arabis mosaic virus. Arch Virol 153:1771–1776

  55. Vigne E, Garcia S, Komar V, Lemaire O, Hily J-M (2018) Comparison of serological and molecular methods with high-throughput sequencing for the detection and quantification of grapevine fanleaf virus in vineyard samples. Front Microbiol 22:2726

  56. Walker M, Chisholm J, Wei T, Ghoshal B, Saeed H, Rott M, Sanfaçon H (2015) Complete genome sequence of three tomato ringspot virus isolates: evidence for reassortment and recombination. Arch Virol 160:543–547

  57. Wetzel T, Fuchs M, Bobko M, Krczal G (2002) Size and sequence variability of the Arabis mosaic virus protein 2A. Arch Virol 147:1643–1653

  58. Yasmin T, Nelson BD, Hobbs HA, McCoppin NK, Lambert KN, Domier LL (2017) Molecular characterization of a new soybean-infecting member of the genus Nepovirus identified by high-throughput sequencing. Arch Virol 162:1089–1092

Acknowledgements

This study was funded by the projects VACCIVINE and GPGV of the ‘Plan National Dépérissement du vignoble’ (French Ministry of Agriculture, FranceAgrimer and CNIV) and by URIVir project (ANR-20-CE20-0010).

Author information

Corresponding authors

Correspondence to J. M. Hily or E. Vigne.

Ethics declarations

Conflict of interest

The authors declare that there are no conflicts of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Handling Editor: Ioannis E. Tzanetakis.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 863 KB)

Supplementary file2 (XLSX 80 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Hily, J.M., Poulicard, N., Kubina, J. et al. Metagenomic analysis of nepoviruses: diversity, evolution and identification of a genome region in members of subgroup A that appears to be important for host range. Arch Virol 166, 2789–2801 (2021). https://doi.org/10.1007/s00705-021-05111-0
