Introduction

The family Geminiviridae includes plant-infecting viruses with circular ssDNA genomes that are encapsidated in virions comprised of 22 pentameric capsomeres arranged in a ~18- to 30-nm twinned icosahedral (or geminate) configuration [7, 41]. Geminiviruses have a near-global distribution and infect both monocotyledonous and dicotyledonous plants. Geminiviruses are often responsible for serious yield losses in economically important crops, including tomato, maize, cotton, chickpea and cassava [25]. Symptoms of geminivirus infections can include foliar crinkling, curling, yellowing, stunting, mosaic and/or striations.

The Geminiviridae Study Group of the International Committee on Taxonomy of Viruses (ICTV) has provided a series of guidelines for the classification of viruses within the seven currently recognised genera of the family: Becurtovirus, Begomovirus, Curtovirus, Eragrovirus, Mastrevirus, Topocuvirus and Turncurtovirus. The criteria include host range, insect vector, genome organisation and genome-wide pairwise sequence identities [1, 11, 12, 26, 38, 39].

Over the last decade, the development of new molecular tools such as rolling-circle amplification (RCA) and high-throughput sequencing has greatly facilitated the discovery of novel geminiviruses in a range of cultivated and non-cultivated host species [29]. Accordingly, these techniques have accelerated the discovery of highly divergent geminivirus-like viruses. For some of these viruses, a large number of closely related full genome sequences have been determined. Analyses of these sequences (pairwise identities and phylogenetics), coupled with the identification of insect vectors and particle morphology, have allowed us to determine with great confidence that some of these divergent viruses should be classified as members of two new genera in the family Geminiviridae, a classification that has been approved by the Executive Committee of the ICTV:

  1) Capulavirus, which will for now include four species, Alfalfa leaf curl virus, Euphorbia caput-medusae latent virus, French bean severe leaf curl virus and Plantago lanceolata latent virus [5, 6, 30, 36]

  2) Grablovirus, which will for now include one species, Grapevine red blotch virus [18, 28, 35]

In addition, two other species of highly divergent viruses that are closely related to geminiviruses have been accepted by the Executive Committee of the ICTV for inclusion in the family Geminiviridae and will remain unassigned to a genus, awaiting identification of the vector species and confirmation of particle morphology. These new species are:

  1) Citrus chlorotic dwarf associated virus [17, 21]

  2) Mulberry mosaic dwarf associated virus [23, 24]

Furthermore, a single virus isolated from an apple tree in China (tentatively referred to as apple geminivirus; AGmV) [20] and 13 virus isolates from grapevine in Hungary, Israel, Japan and South Korea (tentatively referred to as grapevine geminivirus A; GGVA) sharing >97% pairwise identity have recently been discovered [2]. These have distinctly geminivirus-like genome organisations and are more closely related to members of some established geminivirus species than they are to citrus chlorotic dwarf associated virus (CCDaV) and mulberry mosaic dwarf associated virus (MMDaV). The taxonomic assignment of AGmV and GGVA still needs to be discussed by the ICTV.

Whereas the arrangements of genes and intergenic regions in the genomes of curtoviruses, topocuviruses, turncurtoviruses, AGmV, GGVA and genomes/DNA A components of begomoviruses are all very similar, the becurtoviruses, capulaviruses, eragroviruses, grabloviruses and mastreviruses all have distinctly unique genome organisations (Fig. 1). Furthermore, the genome organisation of CCDaV is also apparently unique [21]. Particularly noteworthy in the CCDaV genome (3640 nt) is the ~921-nt-long virion-sense ORF, V3, which results in the CCDaV genome being 600-800 nt larger than those of the other known monopartite geminiviruses (Fig. 1).

Fig. 1

Genome organisation of viruses in the genera of the family Geminiviridae. Citrus chlorotic dwarf associated virus and Mulberry mosaic dwarf associated virus have been accepted by the Executive Committee of the International Committee on Taxonomy of Viruses (ICTV) as species within the family Geminiviridae but have not been assigned to a genus. AGmV and GGVA have not yet been considered as representatives of a species by the ICTV.

Although the genomes of the known geminiviruses can have up to eight genes, only two of them, the replication-associated protein gene (rep) and the coat protein gene (cp) are obviously conserved across viruses in all of the genera, whereas the replication enhancer gene (ren), symptom determinant gene (sd) and silencing suppressor gene (ss) homologues are found in begomoviruses, curtoviruses, topocuviruses and turncurtoviruses (Fig. 1), and transcription activator gene (trap) homologues are additionally found in eragroviruses (Fig. 1). The genomes of viruses in all of the established genera also have either proven or putative movement protein genes (mp), which, in the monopartite members of the family, are immediately upstream of cp. It is currently unknown, however, if these genes in viruses from different genera have a common evolutionary origin, because the amino acid sequences that they encode have such low degrees of similarity that it is not possible to conclude that they are homologous.

The genomes of becurtoviruses, capulaviruses, grabloviruses, eragroviruses, mastreviruses and CCDaV all have two intergenic regions (Fig. 1). In general, the longer of these is referred to as the long intergenic region (LIR) and contains a conserved nonanucleotide within a hairpin structure that forms the origin of virion-strand replication [3, 22]. Similarly, the shorter intergenic region is referred to as the short intergenic region (SIR) and contains transcription termination signals and, for mastreviruses at least, the origin of complementary-strand replication.

Introns are present in several geminivirus genes. Rep and RepA proteins in becurtoviruses, capulaviruses, grabloviruses, mastreviruses, MMDaV and CCDaV are either proven to be, or inferred to be, expressed from alternatively spliced complementary-sense transcripts [8, 13, 40] (Fig. 1). Further, an intron occurs in the mp of mastreviruses [40], and an intron has been inferred to occur in the mp of the capulavirus EcmLV (Fig. 1) [5].

Recombination has played a significant role in the evolution of geminiviruses, and it can therefore be difficult to accurately infer the evolutionary relationships of viruses in different genera. Whereas non-homologous recombination has potentially contributed to the loss or gain of large genome regions in particular genera, homologous recombination has also likely contributed to the exchange of genome regions between viruses in different genera [6, 9, 31, 33].

Nonetheless, we inferred the approximate genome-wide evolutionary relationships between representative members of the various established geminivirus genera, CCDaV, MMDaV, AGmV and GGVA (Fig. 1). The full genome sequences of these representative viruses were aligned using MUSCLE [15] and used to infer a neighbor-joining phylogenetic tree with the Jukes-Cantor nucleotide substitution model and 1000 bootstrap replicates. Branches with <60% support were collapsed using TreeGraph2 [34], and the tree was rooted at the midpoint (Fig. 2). From this phylogenetic tree, it is apparent that the capulaviruses and grabloviruses are closely related to one another and that CCDaV and MMDaV are closely related to one another.
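As an illustration of the Jukes-Cantor correction named above, the following minimal sketch converts an observed proportion of differing sites, p, into a corrected distance using d = -(3/4) ln(1 - (4/3) p). The function name `jukes_cantor_distance` and the example value are assumptions made for illustration; the software used to compute the distances for the neighbor-joining tree is not stated in the text, and this is not a reproduction of it.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion p of differing sites."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    if p == 0.0:
        return 0.0
    if p >= 0.75:
        # The correction is undefined at or beyond saturation (p = 3/4).
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Illustrative value only (not taken from the data analysed here).
print(round(jukes_cantor_distance(0.25), 4))
```

The corrected distance is always at least as large as p, because the model accounts for multiple substitutions at the same site that a raw count of differences cannot see.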

Fig. 2

A. Unrooted neighbour-joining tree inferred from aligned full-genome sequences of representative isolates from various genera in the family Geminiviridae. Numbers associated with branches indicate percentage bootstrap support for these branches (determined with 1000 bootstrap replicates). B. Maximum-likelihood phylogenetic tree (applying the WAG+I+G amino acid substitution model) inferred from aligned CP sequences of representative isolates from various genera in the family Geminiviridae. Numbers associated with branches indicate percentage aLRT support for these branches. The CP phylogenetic tree is rooted at the midpoint. Branches with less than 80% aLRT support have been collapsed. C. Maximum-likelihood phylogenetic tree (applying the LG+I+G amino acid substitution model) inferred from aligned Rep sequences of representative isolates from various genera in the family Geminiviridae. Numbers associated with branches indicate percentage aLRT support for these branches. The Rep phylogenetic tree is rooted with the Rep sequences of members of the family Genomoviridae (not included here). Branches with less than 80% aLRT support have been collapsed.

We also analysed the evolutionary relationships of the inferred CP and Rep amino acid sequences of the same set of representative geminiviruses. The CP and Rep sequence datasets were aligned using MUSCLE [15] and used to infer maximum-likelihood phylogenetic trees using PHYML3.0 [16], applying the WAG+I+G and LG+I+G substitution models, respectively (selected as the best-fit models by ProtTest [14]; Fig. 2). Branch support in these trees was determined using approximate likelihood ratio tests (aLRT). The Rep phylogenetic tree was rooted with sequences from the family Genomoviridae [19], whereas the CP tree was rooted at the midpoint. Branches with <80% aLRT support were collapsed using TreeGraph2 [34]. Pairwise CP and Rep amino acid sequence identities were determined using SDT v1.2 [27] (Fig. 3).

Fig. 3

Pairwise identities of the Rep and CP sequences of representative isolates from various genera in the family Geminiviridae as determined using SDT v1.2 [27].

Discordance in the phylogenetic placement of curtoviruses, becurtoviruses, topocuviruses, eragroviruses, turncurtoviruses and CCDaV in the CP and Rep phylogenetic trees likely reflects the fact that the most recent common ancestors of the known viruses in these genera are possibly the products of inter-genus recombination events [10, 37]. For example, the becurtovirus CPs are most closely related to those of curtoviruses, whereas the becurtovirus Reps are most closely related to those of CCDaV and MMDaV (Fig. 2 and Fig. 3).

Capulavirus

A group of closely related viruses isolated from Euphorbia caput-medusae (South Africa), Medicago sativa (France and Spain), Phaseolus vulgaris (India) and Plantago lanceolata (Finland) have been assigned to the new genus Capulavirus. The genus name Capulavirus was derived from the type member of the genus: euphorbia caput-medusae latent virus. As with viruses in the genera Mastrevirus and Becurtovirus, capulaviruses have two intergenic regions and express Rep from a spliced complementary-strand transcript. In common with begomoviruses and curtoviruses, they have a large complementary-sense ORF (C3) that is completely embedded within rep. A unique feature of capulavirus genomes is a complex arrangement of possible MP-encoding ORFs located in the 5′ direction from cp (Fig. 1). Two or more of these ORFs may constitute an intron-containing mp (Fig. 1). Geminate particles have been observed in purified preparations of EcmLV by transmission electron microscopy [30]. Aphids of the species Aphis craccivora have been shown to transmit ALCV [30]. All known capulaviruses have the nonanucleotide motif “TAATATTAC” at their presumed origins of virion-strand replication.
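Because geminivirus genomes are circular, a linearised sequence record may begin anywhere, so the nonanucleotide can span the arbitrary start/end of the record. The sketch below shows one way to locate the motif on the deposited (virion-sense) strand while allowing for this wrap-around. The function name `find_motif_circular` and the toy sequences are hypothetical; the check does not verify that the motif lies within a hairpin structure, and it does not search the complementary strand.

```python
from typing import List

NONANUCLEOTIDE = "TAATATTAC"


def find_motif_circular(genome: str, motif: str = NONANUCLEOTIDE) -> List[int]:
    """Return 0-based start positions of motif in a circular genome sequence."""
    genome = genome.upper()
    motif = motif.upper()
    n = len(genome)
    if n == 0 or len(motif) == 0 or len(motif) > n:
        return []
    # Append the first len(motif) - 1 bases so matches spanning the junction are found.
    extended = genome + genome[: len(motif) - 1]
    positions = []
    start = extended.find(motif)
    while start != -1:
        positions.append(start)
        start = extended.find(motif, start + 1)
    return positions


# Toy sequences only (not real genomes): the motif sits inside the first
# record and wraps around the end of the second.
print(find_motif_circular("GGTAATATTACGG"))
print(find_motif_circular("ATTACGGGGTAAT"))
```

Each start position is reported relative to the linearised record, so a value close to the record length indicates a motif that continues at the beginning of the sequence.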

An analysis of the distribution of pairwise identities (one minus Hamming distances of pairwise aligned sequences with pairwise deletion of gaps) of known capulavirus genomes using SDT v1.2 [27] (n = 47; Table 1) indicates that a pairwise-identity-based species demarcation criterion that would minimize conflicts (i.e., possible assignments of individual isolates to two or more species) could be placed either in the 72-84% identity interval, or in the 87-94% interval (Fig. 4). Whereas a species demarcation threshold within the 87-94% range would yield five species, placing it within the 72-84% range would yield four species. In the case of a threshold within the 87-94% range, the alfalfa leaf curl virus (ALCV) sequences, which have all been isolated from alfalfa plants, would be split into two species. Given that there seems to be no good biological reason to split the ALCVs into two species, we have opted to tentatively place the species demarcation threshold in the 72-84% range. To bring the capulavirus species demarcation threshold in line with that of the genus Mastrevirus (the recognized genus of geminiviruses that the capulaviruses are most closely related to; Fig. 4) we have opted to tentatively select 78% as the pairwise identity value above which two sequences should be considered isolates of the same species. The four proposed capulavirus species to which the 47 known capulavirus full genome sequences (Table 1) have been assigned are Alfalfa leaf curl virus (n = 26), Euphorbia caput-medusae latent virus (n = 17), French bean severe leaf curl virus (n = 2) and Plantago lanceolata latent virus (n = 2). Based on pairwise identity comparisons, all isolates (ALCV; EcmLV; french bean severe leaf curl virus, FbSLCV; plantago lanceolata latent virus, PlLV) within each of these species share between 63% and 73% genome-wide sequence identity with all isolates that have been assigned to other proposed capulavirus species (Fig. 2).
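The identity measure described above (one minus the Hamming distance of a pairwise alignment, with pairwise deletion of gaps) can be sketched as follows. This is an illustrative calculation on an already aligned pair, not the SDT v1.2 implementation; the function name `pairwise_identity` and the toy sequences are assumptions.

```python
def pairwise_identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of identical sites, ignoring columns where either sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same pairwise alignment")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # pairwise deletion: gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return matches / compared


# Toy alignment (hypothetical): 5 ungapped columns, 4 of them identical.
identity = pairwise_identity("ACGT-A", "ACCTTA")
print(identity)          # 0.8
print(identity > 0.78)   # True: above the tentative capulavirus threshold
```

Excluding gapped columns from both the numerator and the denominator means that insertions and deletions do not lower the identity score, which is why the measure is described as using pairwise deletion of gaps.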

Table 1 Details of virus isolates that have been assigned to the four proposed species within the genus Capulavirus

Fig. 4

A. Distribution of pairwise nucleotide sequence identities of 47 known capulavirus genomes. B. Genome-wide pairwise identities determined using SDT v1.2 [27] and a neighbor-joining phylogenetic tree of capulavirus isolates. The tree is rooted with mastrevirus genome sequences (not shown)

Regardless of whether the full-genome nucleotide sequence, the inferred Rep amino acid sequence, or the inferred CP amino acid sequence is considered, all of the viruses belonging to the proposed capulavirus species group with 100% aLRT support with other proposed capulaviruses (Fig. 2). The viruses assigned to the genus Capulavirus share <22% CP amino acid sequence identity and <45% Rep amino acid sequence identity with viruses in other established genera of the family Geminiviridae (Fig. 3).

Grablovirus

A group of closely related viruses discovered infecting grapevines in Canada, South Korea and the USA have been assigned to the genus Grablovirus. The genus name is based on the name of the type member: grapevine red blotch virus (GRBV), the first and currently the only member of this genus [18]. These viruses have the virion-strand origin of replication nonanucleotide motif TAATATTAC and a unique genome arrangement (Fig. 1; Table 2). The natural vector is likely the three-cornered alfalfa treehopper (Spissistilus festinus Say) [4].

Table 2 Details of virus isolates that have been assigned to the proposed species Grapevine red blotch virus within the genus Grablovirus

Analysis of the 27 known grablovirus full-genome sequences indicates that they all share >91% genome-wide nucleotide sequence identity (Fig. 5) and are therefore all assigned to the species Grapevine red blotch virus. All of the GRBV isolates share <45% Rep amino acid sequence identity with all viruses in other geminivirus genera (Fig. 3). Regardless of whether the full-genome nucleotide sequence, Rep amino acid sequence or CP amino acid sequence is considered, the GRBV isolates cluster together with 100% aLRT support (Fig. 2).

Fig. 5

A. Distribution of pairwise nucleotide sequence identities of 27 grablovirus genomes. B. Genome-wide pairwise identities determined using SDT v1.2 [27] and a neighbor-joining phylogenetic tree of grablovirus isolates. The tree is rooted with mastrevirus genome sequences (not shown)

In the absence of additional related sequences in the genus, we suggest that, based on the species demarcation criteria used for most other geminivirus genera, grablovirus isolates sharing <80% genome-wide pairwise identity with members of accepted grablovirus species should be classified as members of distinct grablovirus species. This is a tentative species demarcation criterion that will need to be refined as more full-genome sequences of grabloviruses are determined.
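The tentative criterion above can be expressed as a simple decision rule. The sketch below assumes that a new isolate belongs to an accepted species if it shares at least the threshold identity with any member of that species; both the "any member" reading and the treatment of a value of exactly 80% are assumptions, since the text only specifies that isolates sharing <80% identity are distinct. The function name `candidate_species` and all identity values are hypothetical.

```python
from typing import Dict, List


def candidate_species(
    identities: Dict[str, List[float]], threshold: float = 0.80
) -> List[str]:
    """Return accepted species with at least one member at or above the threshold.

    identities maps a species name to the genome-wide pairwise identities
    (as fractions) between the new isolate and each member of that species.
    """
    return sorted(
        species
        for species, values in identities.items()
        if values and max(values) >= threshold
    )


# Hypothetical identity values, for illustration only.
known = {"Grapevine red blotch virus": [0.93, 0.95, 0.92]}
divergent = {"Grapevine red blotch virus": [0.62, 0.70, 0.66]}

print(candidate_species(known))      # ['Grapevine red blotch virus']
print(candidate_species(divergent))  # []
```

An empty result corresponds to an isolate that would be classified in a distinct species, a single name to an unambiguous assignment, and more than one name to the kind of conflict that a well-placed demarcation threshold is meant to minimise.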

New geminivirus species that remain unassigned to a genus

Citrus chlorotic dwarf associated virus and Mulberry mosaic dwarf associated virus

Two complete CCDaV genomes, isolated from citrus trees in Turkey (JQ920490) [21] and China (KF561253), and eight complete MMDaV genomes, isolated from mulberry bushes in China (KP303687, KP699128-KP699132, KP728254, KR131749) [23, 24], are clearly closely related to those of geminiviruses, based on both their geminivirus-like genome organization and the fact that their inferred Rep and CP amino acid sequences cluster phylogenetically with those of the known geminiviruses (Fig. 1 and Fig. 2). All known CCDaV and MMDaV isolates also have the nonanucleotide TAATATTAC, which is characteristic of the virion-strand origin of replication of the majority of geminiviruses.

The two CCDaV sequences share 99.5% genome-wide pairwise identity, whereas the eight MMDaV sequences share >97% identity. The predicted Rep proteins of these CCDaV and MMDaV isolates share ~58% amino acid sequence similarity with one another, but only 33-42% similarity with those of other geminiviruses (Fig. 3). Regardless of whether full genome nucleotide sequences, inferred Rep amino acid sequences, or inferred CP amino acid sequences are considered, the two proposed species group separately from all other geminiviruses, with 100% aLRT branch support (Fig. 2).

CCDaV and MMDaV will need to remain unassigned to a genus until their respective vector species are identified and/or it is confirmed that they form geminate particles.

New geminiviruses identified recently that need to be reviewed by ICTV

Apple geminivirus and grapevine geminivirus A

One AGmV complete genome isolated from an apple tree in China (accession no. KM386645) [20] and thirteen GGVA complete genomes isolated from grapevine in China (KX570611), Hungary (KX570618), Israel (KX618694), Japan (KX570610, KX570612, KX570613, KX570615, KX570616, KX570617) and South Korea (KX570607, KX570609, KX570614) group together, with 94% bootstrap support (Fig. 2). These viruses isolated from woody plants contain the typical geminivirus-like nonanucleotide motif TAATATTAC and display a distinctly geminivirus-like genome organisation. In addition, these viruses are clearly recombinant, with a begomovirus-like Rep (sharing 59-68% amino acid sequence identity) and a divergent CP that is unlike that of any known geminivirus.

The thirteen GGVA sequences share >97% identity. The unique AGmV sequence and the thirteen GGVA sequences share ~63% genome-wide pairwise identity. The Rep and CP sequences of the AGmV and GGVA isolates share ~60% and 48% amino acid sequence identity, respectively, with one another. The Rep proteins are most closely related to those of begomoviruses, curtoviruses, topocuviruses and turncurtoviruses, sharing 50-68% identity (Fig. 3). Although AGmV and GGVA branch with begomoviruses based on the inferred Rep amino acid sequences, both viruses group separately from all other geminiviruses, with 100% aLRT branch support (Fig. 2), based on the inferred CP amino acid sequences.

AGmV and GGVA must also remain unassigned to a genus until the vectors are identified and/or geminate particles are observed.

Concluding remarks

The past ten years have seen the establishment of five new geminivirus genera, and the next ten will likely see many more being established as metagenomics-based approaches begin enabling the convenient and cost-effective discovery of viruses in greater numbers from cultivated and non-cultivated plant host species [29, 32]. As well as illuminating the actual diversity of geminiviruses and the role played by recombination in their evolutionary history, such approaches also have the potential to elucidate the full range of geminivirus host and vector species. Knowing where geminiviruses are found (especially in non-cultivated hosts where they may cause no obvious disease symptoms) and how they are transmitted will not only enable prediction of which geminiviruses could pose serious threats to agriculture but also provide a detailed view of the actual role of geminiviruses in natural and disturbed terrestrial ecosystems.