Introduction

Over the past 50 years, plant-infecting geminiviruses (family Geminiviridae) have been amongst the most important pathogens of cultivated plant species in most tropical and subtropical regions of the world [28]. Geminiviruses infect both monocotyledonous and dicotyledonous plants, in some cases without visible symptoms and in others causing symptoms that include different degrees of foliar crinkling, curling, yellowing, distortion, stunting, mosaic and/or striations [28]. The family Geminiviridae includes viruses with circular single-stranded DNA (ssDNA) genomes that are individually encapsidated into 22 × 38 nm twinned quasi-icosahedral virions made up of 110 capsid protein subunits organized as 22 pentameric capsomers [2, 16, 17, 41]. The genomes of geminiviruses can be monopartite, with a single DNA molecule of ~ 2,600–3,600 nucleotides (nt), or bipartite, with two DNA molecules of ~ 2,600 nt (termed DNA-A and DNA-B) for a total genome size of ~ 5,200 nt [38].

As of 2020, there were nine genera (Becurtovirus, Begomovirus, Capulavirus, Curtovirus, Eragrovirus, Grablovirus, Mastrevirus, Topocuvirus, and Turncurtovirus) [36, 38] and two unassigned species, Citrus chlorotic dwarf associated virus [19] and Mulberry mosaic dwarf associated virus [20, 22] in the family Geminiviridae. A framework for species classification based on pairwise genome sequence identity between member viruses, coupled with phylogenetic support, has been established for becurtoviruses [35], begomoviruses [3], capulaviruses [36], curtoviruses [34], eragroviruses [35], grabloviruses [36], mastreviruses [24], topocuviruses [26], and turncurtoviruses [35].

Over the last several years, a number of new geminiviruses have been identified that do not fit the previously established nine genera. These include geminiviruses identified infecting apple trees (Malus domestica Borkh.) [18], camellia (Camellia japonica L.) [40], grapevine (Vitis vinifera L.) [1], sea rush (Juncus maritimus Lam.) [5], passion fruit (Passiflora edulis Sims.) [8], tomato (Solanum lycopersicum L.) [7, 33], cleome (Cleome sp.) [7], cactus plants (subfamilies Cactoideae and Opuntioideae in the Cactaceae) [9], white mulberry (Morus alba L.) [20, 22], and paper mulberry (Broussonetia papyrifera (L.) L'Hér. ex Vent.) [27]. Based on the inferred genome organizations of these viruses (Fig. 1) coupled with phylogenetic analysis (Fig. 2), these diverse geminiviruses are now assigned to five new genera: Citlodavirus, Maldovirus, Mulcrilevirus, Opunvirus, and Topilevirus (Figs. 1 and 2, Table 1). Within these genera, twelve species have been established, and species demarcation criteria have been defined.

Fig. 1
Illustration of the genome organization of various geminiviruses. LIR, long intergenic region; SIR, short intergenic region; CR, common region; cp, capsid protein; mp, movement protein; nsp, nuclear shuttle protein; reg, regulatory gene; ren, replication enhancer; rep, replication-associated protein; sd, symptom determinant; ss, silencing suppressor; trap, transactivator protein. The genomes of representative members of the five new genera are indicated by asterisks.

Fig. 2
Unrooted neighbor-joining tree inferred from aligned complete genome sequences of representative isolates from the various geminivirus genera. The five new genera are indicated by asterisks. Branches with less than 60% bootstrap support have been collapsed with TreeGraph2 [32].

Table 1 Summary of the five new genera and 12 new species in the family Geminiviridae

Genus Citlodavirus

A group of closely related viruses isolated from citrus in China, Thailand, and Turkey [19, 37, 42], Camellia japonica and Camellia sinensis in China [40], Passiflora edulis in Brazil [8], and Broussonetia papyrifera in China [27] has been assigned to the new genus Citlodavirus (Table 1). The genus name Citlodavirus is derived from the “type member” of the genus: citrus chlorotic dwarf associated virus [19]. The genome size of these viruses is approximately 20% larger than that of other known monopartite geminiviruses and ranges from 3639 to 3763 nt (Fig. 1). This greater length of their genomes is due to their predicted mp gene (891–921 nt in size), which likely encodes a protein product homologous to that encoded by the DNA-B component of bipartite begomoviruses (genus Begomovirus) [8].

These viruses have the virion-strand origin of replication nonanucleotide motif ‘TAATATTAC’ and a unique genome arrangement. The virion-sense strand potentially encodes a capsid protein (CP), a movement protein (MP), and two other small hypothetical proteins, referred to as V2 and V3 (Fig. 1). The complementary strand potentially encodes a RepA protein and expresses a replication initiator protein (Rep) from an alternatively spliced complementary-sense transcript (Fig. 1). While the natural vector has not been formally identified yet, it has been proposed that the vector of citrus chlorotic dwarf associated virus (CCDaV) could be a whitefly (Parabemisia myricae) that thrives on woody plants [11, 19]. Phylogenetic analysis of the predicted CP amino acid sequence (Fig. 3) and pairwise sequence comparisons (Fig. 4) show that the citlodavirus CPs are most similar to those of begomoviruses that are transmitted by Bemisia tabaci. The citlodavirus capsid morphology is currently unknown, and further studies are needed to ascertain whether the > 3200-nt-long genomes of these viruses are encapsidated within geminate particles, or whether the particles form alternative structures [30].

Fig. 3
Maximum-likelihood phylogenetic trees inferred using PhyML 3 [13], based on the Rep and CP amino acid sequences of representative members of the various genera in the family Geminiviridae. rtREV+G+I (Rep) and rtREV+G+F+I (CP) were used as best-fit substitution models as determined using ProtTest 3 [6]. Branches with < 60% bootstrap support have been collapsed with TreeGraph2 [32].

Fig. 4
Pairwise identity matrix of Rep and CP amino acid sequences of representative members of the genera in the family Geminiviridae and four unassigned species determined using SDT v1.2 [25].

An analysis of the distribution of pairwise identities (one minus Hamming distances of pairwise aligned sequences with pairwise deletion of gaps) of known citlodavirus genome sequences (n = 20; Table 1) using SDT v1.2 [25] indicates that a pairwise-identity-based species demarcation threshold that would minimize conflicts (i.e., possible assignments of individual isolates to two or more species) could be placed in the 66-to-90% interval (Fig. 5, Supplementary Data 1). To align the citlodavirus species demarcation threshold with that of the majority of genera of the family Geminiviridae, we have opted to tentatively use 78% as the pairwise identity species demarcation threshold. Citlodaviruses whose genome sequences share less than 78% pairwise identity with those of other citlodaviruses, coupled with phylogenetic support, would be considered members of new species. Based on these demarcation criteria, four citlodavirus species were established to accommodate the 20 citlodavirus isolates (Table 1). These species are Citrus chlorotic dwarf associated virus (n = 15), Camellia chlorotic dwarf-associated virus (n = 2), Paper mulberry leaf curl virus 2 (n = 2), and Passion fruit chlorotic mottle virus (n = 1). Viruses assigned to a citlodavirus species share between 61% and 66% genome-wide nucleotide sequence identity with members of other species within the genus (Fig. 2).
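The pairwise identity measure used here (one minus the Hamming distance, with pairwise deletion of gaps) can be illustrated with a minimal Python sketch. It assumes that the two sequences have already been aligned to equal length; the alignment step that precedes the identity calculation is not reproduced. The function name `pairwise_identity` and the short example sequences are hypothetical and serve only as an illustration, not as a description of how SDT v1.2 is implemented.

```python
def pairwise_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Return the fraction of identical positions between two aligned sequences.

    Columns in which either sequence has a gap are skipped (pairwise deletion),
    so the result equals one minus the Hamming distance over the remaining columns.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0  # columns without a gap in either sequence
    matches = 0   # compared columns with identical residues
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # pairwise deletion of gapped columns
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return matches / compared


# Hypothetical toy alignment: the gapped column is ignored,
# and the remaining columns are all identical.
print(pairwise_identity("TAATATTAC-GG", "TAATATTACAGG"))  # 1.0
```

Multiplying the returned value by 100 gives a percentage that can be compared with a demarcation threshold such as 78%.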

Fig. 5
Distribution of pairwise identity values for the genomes of viruses in the genera Citlodavirus, Maldovirus, Mulcrilevirus, Opunvirus, and Topilevirus, determined using SDT v1.2 [25]. The cyan line represents the species demarcation threshold for each genus.

Regardless of whether the full-genome nucleotide sequence, the inferred Rep amino acid sequence, or the inferred CP amino acid sequence is considered, all viruses belonging to the proposed citlodavirus species group, with 60-100% bootstrap support, with other proposed citlodaviruses (Figs. 2 and 3). The Reps and CPs of the viruses assigned to the genus Citlodavirus share 32-44% and 16-40% amino acid sequence identity, respectively, with those of other geminiviruses (Fig. 4).

Genus Maldovirus

A group of closely related viruses isolated from two dicotyledonous plants, apple tree in China [18] and grapevine in China, Israel, Japan, Hungary, and South Korea [1], and one monocotyledonous plant, sea rush in France [5], has been assigned to the new genus Maldovirus (Table 1). The genus name Maldovirus (Malus domestica virus) is derived from the scientific name of the host plant (Malus domestica Borkh.) of the “type member” of the genus, apple geminivirus 1. All 15 known maldovirus isolates have the same ‘TAATATTAC’ virion-strand origin of replication nonanucleotide sequence motif and share a similar arrangement of five to six open reading frames (ORFs) within their genomes (Fig. 1). While the virion-sense strand potentially encodes a CP and one other small hypothetical protein, referred to as V2 (Fig. 1), the complementary strand potentially encodes a Rep and two or three other small hypothetical proteins, referred to as C2, C3, and C4 (Fig. 1). It is noteworthy that a naturally occurring 1559-nt subgenomic/defective form of grapevine geminivirus A has been isolated from various grapevines housed in germplasm collections in the USA [1]. Grapevine geminivirus A and its defective molecule were both shown to be graft-transmissible [1]. The natural vector of maldoviruses has not been identified to date.

An analysis of the distribution of pairwise identity values for known maldovirus genomes (n = 15; Table 1) indicates that a pairwise-identity-based species demarcation threshold could be placed in the 67-to-97% interval (Fig. 5, Supplementary Data 1). To align the maldovirus species demarcation threshold with that of the majority of genera of the family Geminiviridae, a 78% pairwise identity species demarcation threshold has tentatively been adopted. Therefore, groups of maldoviruses with less than 78% pairwise identity to other maldoviruses but >78% identity to one another, coupled with phylogenetic support for their branching within a separate clade, would be considered members of the same new species.

The 15 maldoviruses can therefore be assigned to three species (Table 1), i.e., Apple geminivirus 1 (n = 1), Grapevine geminivirus A (n = 13), and Juncus maritimus geminivirus 1 (n = 1). Genome sequences within each species share between 62% and 67% genome-wide identity with members of other maldovirus species (Fig. 2). Furthermore, regardless of whether the full-genome nucleotide sequence, the inferred Rep amino acid sequence, or the inferred CP amino acid sequence is considered, all of the viruses belonging to the proposed maldovirus species group, with 88-100% bootstrap support, with other proposed maldoviruses (Figs. 2 and 3). The Reps and CPs of the viruses assigned to the genus Maldovirus share 28-69% and 14-29% amino acid sequence identity, respectively, with those of other geminiviruses (Fig. 4).
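The way a pairwise identity threshold partitions isolates into candidate species, and what a "conflict" means in this context, can be sketched in Python as follows. This is only an illustration under stated assumptions: isolates are linked whenever their pairwise identity is at or above the threshold (single-linkage grouping, chosen here for simplicity), the isolate names and identity values are hypothetical, and phylogenetic support, which is also required for species demarcation, is not modelled.

```python
def group_by_threshold(names, identity, threshold=78.0):
    """Group isolates linked by pairwise identity >= threshold (single linkage)."""
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), value in identity.items():
        if value >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


def conflicts(groups, identity, threshold=78.0):
    """Return pairs placed in one group although their own identity is below the threshold."""
    group_of = {name: i for i, members in enumerate(groups) for name in members}
    return sorted(
        pair for pair, value in identity.items()
        if group_of[pair[0]] == group_of[pair[1]] and value < threshold
    )


# Hypothetical isolates and percent identities (not taken from Supplementary Data 1).
names = ["iso1", "iso2", "iso3", "iso4"]
identity = {
    ("iso1", "iso2"): 97.0,
    ("iso1", "iso3"): 65.0,
    ("iso2", "iso3"): 66.0,
    ("iso1", "iso4"): 62.0,
    ("iso2", "iso4"): 63.0,
    ("iso3", "iso4"): 64.0,
}

groups = group_by_threshold(names, identity)
print(groups)                       # [['iso1', 'iso2'], ['iso3'], ['iso4']]
print(conflicts(groups, identity))  # []
```

In this toy data set, any threshold above 66% and at or below 97% yields the same three groups without conflicts, which mirrors the idea of a conflict-free interval within which the 78% threshold is placed.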

Genus Mulcrilevirus

A group of closely related viruses isolated from white mulberry and paper mulberry in China [20, 22, 27] has been assigned to the new genus Mulcrilevirus (Table 1). Specifically, eight mulberry mosaic dwarf associated virus (MMDaV) isolates and one mulberry crinkle leaf virus (MCLV) isolate (Table 1) have been characterized from diseased white mulberry plants displaying crinkle leaf, mosaic and/or dwarfing symptoms [20, 22]. The genus name Mulcrilevirus is derived from the “type member” of the genus: mulberry crinkle leaf virus. All 11 known mulcrilevirus isolates have the same ‘TAATATTAC’ virion-strand origin of replication nonanucleotide sequence motif and an arrangement of ORFs similar to those described previously for other geminiviruses (Fig. 1). The virion-sense strand encodes a putative MP, a CP, and two additional small hypothetical proteins, referred to as V2 and V4 (Fig. 1). The complementary strand of the genome potentially encodes a RepA protein and expresses a Rep protein from a spliced complementary-strand transcript (Fig. 1). It has been proposed recently that MCLV could be transmitted by the leafhopper Tautoneura mori Matsumura [21].

An analysis of the distribution of pairwise identity values for known mulcrilevirus genomes (n = 11; Table 1) indicates that a pairwise-identity-based species demarcation threshold could be placed in the 61-to-97% interval (Fig. 5, Supplementary Data 1). To align the mulcrilevirus species demarcation threshold with that of the majority of genera of the family Geminiviridae, we have tentatively adopted a 78% pairwise identity species demarcation threshold. The MCLV and MMDaV isolates share > 96.9% genome-wide pairwise identity with each other, indicating that they belong to the same species (Fig. 2). Based on these criteria, two species in the genus Mulcrilevirus were established: Mulberry crinkle leaf virus (including isolates of MMDaVs and MCLVs) and Paper mulberry leaf curl virus 1. Isolates of a given mulcrilevirus species share between 60.5% and 60.8% genome-wide sequence identity with isolates of the other species (Fig. 2), and all mulcrilevirus isolates cluster together in phylogenetic trees with 63-91% bootstrap support, regardless of whether the full-genome nucleotide sequence, the inferred Rep amino acid sequence, or the inferred CP amino acid sequence is used for the analysis (Figs. 2 and 3). The Reps and CPs of the viruses assigned to the genus Mulcrilevirus share 32-53% and 17-40% amino acid sequence identity, respectively, with those of other geminiviruses (Fig. 4).

Genus Opunvirus

A group of closely related viruses has been identified in asymptomatic New World Cactaceae plants [9] and has been assigned to the new genus Opunvirus (Table 1). Specifically, 79 opuntia virus 1 (OpV1) genome sequences have been determined from cactus plants belonging to 20 different cactus species from both the Cactoideae and Opuntioideae subfamilies and from nine cactus-feeding cochineal insects (Dactylopius sp.) sampled in the USA and Mexico [9] (Table 1). The genus name Opunvirus is derived from the “type member” of the genus: Opuntia virus 1. All 79 known opunvirus genomes have the same ‘TAATATTAC’ virion-strand origin of replication nonanucleotide sequence motif and an arrangement of the six ORFs similar to those described previously for other geminiviruses (Fig. 1). The genome organization of opunviruses resembles that of monopartite begomoviruses. On the complementary strand, the opunvirus sequences encode a Rep, a putative replication enhancer protein (Ren), a putative transactivation protein (TrAP), and a putative symptom determinant protein (C4) (Fig. 1). A CP and a possible MP are encoded on the virion strand. Interestingly, OpV1 sequences were isolated directly from cochineal insects that were associated with the cactus plants from which OpV1 sequences were isolated [9]. However, the transmission of OpV1 has not been unequivocally demonstrated, and controlled insect transmission experiments will be needed to determine if cochineal insects are vectors for opunviruses.

An analysis of the distribution of pairwise identity values for known OpV1 sequences (n = 79; Table 1) indicates that their genomes share > 78.4% genome-wide pairwise identity with each other (Fig. 5, Supplementary Data 1), and thus, for the moment, they have been assigned to a single species, Opuntia virus 1. We suggest that a 78% species demarcation threshold should also be used for this genus. In addition, OpV1 genome sequences share less than 64.9% identity with all other known geminiviruses within currently established species (Fig. 2).

Regardless of whether the full-genome nucleotide sequence, the inferred Rep amino acid sequence, or the inferred CP amino acid sequence is used for the analysis, all OpV1 isolates cluster together in phylogenetic trees with 100% bootstrap support (Figs. 2 and 3). The Reps and CPs of the viruses assigned to the genus Opunvirus share 30-68% and 15-29% amino acid sequence identity, respectively, with those of other geminiviruses (Fig. 4).

Genus Topilevirus

From tomato and cleome plants sampled in Argentina and Brazil [7, 33], a group of closely related viruses has been identified that has been assigned to the new genus Topilevirus (Table 1). The genus name Topilevirus was derived from the “type member” of the genus: tomato apical leaf curl virus. The genome of topileviruses has a ‘TAATATTAC’ virion-strand origin of replication nonanucleotide motif. A CP, a possible MP, and a protein that possibly regulates relative ssDNA and dsDNA levels (Reg) are encoded on the virion strand (Fig. 1). The complementary strand of the genome potentially encodes a RepA protein, one small hypothetical protein, referred to as C3, and a Rep protein from a spliced complementary-strand transcript (Fig. 1).

Similar to viruses in the genera Becurtovirus, Capulavirus, Citlodavirus, Eragrovirus, Grablovirus, and Mastrevirus, topileviruses have two intergenic regions (large intergenic region [LIR] and small intergenic region [SIR]; Fig. 1). The natural vector of topileviruses is not known. However, using in silico prediction based on capsid protein sequences, it has been proposed that treehoppers in the family Membracidae could be the vector of this virus [33]. Nonetheless, this has not been experimentally confirmed yet.

An analysis of the distribution of pairwise identity values for known topilevirus genomes (n = 5; Table 1) indicates that a pairwise-identity-based species demarcation threshold could be placed in the 65-to-99% interval (Fig. 5, Supplementary Data 1). A 78% pairwise identity species demarcation threshold has tentatively been adopted for topileviruses, and thus the five viruses are assigned to two species, i.e., Tomato apical leaf curl virus (n = 3) and Tomato geminivirus 1 (n = 2) (Table 1). Based on pairwise identity comparisons, all isolates (tomato apical leaf curl virus and tomato geminivirus 1) within each of these species share between 64.2% and 64.9% genome-wide sequence identity with all isolates that have been assigned to other proposed topilevirus species (Fig. 2).

Regardless of whether the full-genome nucleotide sequence, the inferred Rep amino acid sequence, or the inferred CP amino acid sequence is used for the analysis, all topilevirus isolates cluster together in phylogenetic trees with 75-99% bootstrap support (Figs. 2 and 3). The Reps and CPs of the viruses assigned to the genus Topilevirus share 32-70% and 17-35% amino acid sequence identity, respectively, with those of other geminiviruses (Fig. 4).

Concluding remarks

With this report, a total of 10 new geminivirus genera have been established in the last decade to classify diverse members of the family Geminiviridae. These genera have been created to accommodate sequences that have predominantly been identified as a result of improved molecular techniques, including rolling-circle amplification [14] and high-throughput sequencing approaches [23, 29]. Furthermore, a recent study has identified previously unappreciated coding regions within the genome of what is arguably one of the best-studied geminiviruses, the begomovirus tomato yellow leaf curl virus. These coding regions encode small proteins that appear to play key roles in the subcellular localization of viral components and in virulence [12]. What this study indicates is that the genome organizations of all other geminiviruses, including those described here, could also be considerably more complex than can be determined by a cursory accounting of conserved ORFs.

At the time of writing, there remain four geminivirus species that have not been assigned to any established genera: Common bean curly stunt virus, Limeum africanum associated virus, Parsley yellow leaf curl virus, and Polygala garcinii associated virus [5, 15, 39]. These four species are either singletons or do not group with other known geminiviruses (Figs. 2 and 3). Further, we note that there have also been two recent reports of novel geminiviruses that have been identified in olive trees [4] and cacti [10].

Finally, we would like to point out that virus taxonomy is dynamic. As we survey ever-increasing swathes of the geminivirus sequence space, representatives of novel species will be discovered that will demand that we change some of the criteria that we use to classify these viruses. We would like to note that although species names may change within this dynamic taxonomy, the virus names do not change. For example, bean golden mosaic virus will always be bean golden mosaic virus, even though the species name may change. We also would like to remind the reader that a standardized binomial species nomenclature, consisting of the genus name and a free-form species epithet, has been ratified recently by the International Committee on Taxonomy of Viruses (ICTV) [31]. A summary of the current species in the family Geminiviridae is provided in Supplementary Data 2. We therefore encourage the community to engage with the ICTV Geminiviridae and Tolecusatellitidae Study Group to help smooth the transition from the more free-form virus nomenclature of the past to its new standardized version.