Introduction

Viruses of the family Geminiviridae are distinct in having genomes of circular, single-stranded (ss) DNA that are packaged within twinned quasi-isometric (“geminate”) virions. Geminiviruses are divided into four genera based on genome organisation and biological properties, most important of which are the type of insect vector (either whitefly, leafhopper or treehopper) and host range (either mono- or dicotyledonous hosts) [12]. Those having monopartite genomes that are transmitted by leafhopper vectors, primarily to monocotyledonous plants, are included in the genus Mastrevirus, of which Maize streak virus is the type species. Viruses that have monopartite genomes distinct from those of the mastreviruses and that are transmitted by leafhopper vectors to dicotyledonous plants are placed in the genus Curtovirus, with Beet curly top virus as the type species. The genus Topocuvirus, the most recently established genus, has only one member (also the type species), Tomato pseudo-curly top virus, which has a monopartite genome and is transmitted by a treehopper vector to dicotyledonous plants. The genus Begomovirus contains the majority of the identified geminivirus species and these are transmitted exclusively by the whitefly Bemisia tabaci (Gennadius) to dicotyledonous plants, with Bean golden yellow mosaic virus (originally considered an isolate of Bean golden mosaic virus) as the type species. Many begomoviruses have bipartite genomes (known as the DNA-A and DNA-B components), although numerous begomoviruses with monopartite genomes occur in the Old World. Recently, the monopartite begomovirus tomato yellow leaf curl virus has been inadvertently introduced to the New World from the Middle East/Mediterranean region [21, 22], and the New World virus, squash leaf curl virus from the southwestern U.S.A., has been introduced into the Middle East [2, 14], most likely due to the global trade in agricultural products.

A somewhat unexpected development occurred in 1999–2000, when additional components were shown to be associated with some begomoviruses. The first indication of this came with the report by Dry et al. [9] of a ssDNA satellite associated with tomato leaf curl virus (ToLCV) occurring in Australia. Later, this molecule was shown to be a defective (truncated) version of a much larger group of subviral components associated with begomoviruses.

A number of the apparently monopartite begomoviruses were shown to be incapable of inducing bona fide disease symptoms when introduced as infectious clones to the host species from which they were isolated. These included Ageratum yellow vein virus (AYVV) [31] and cotton leaf curl Multan virus (CLCuMV). Clones of these viruses were either not infectious or poorly infectious to the hosts in which they occur naturally: Ageratum conyzoides and Gossypium hirsutum (cotton), respectively [4, 26]. Upon closer inspection, plants naturally infected with AYVV and CLCuMV were shown to contain additional ssDNA components. The first, now known as DNA-1, was interesting because it has similarity to, and likely evolved from, components of another group of ssDNA viruses, the nanoviruses [18, 25]. DNA-1, however, was shown to play no part in symptom induction or infectivity of its helper virus, and its function remains unclear.

The second group of components, collectively known as DNA-β, were shown to be approximately half the size of a begomovirus component (∼1,360 nucleotides) and required for efficient infection of some hosts [5, 26]. Since they were first recognized, DNA-β components have been shown to be associated with an increasing number of diseases caused by begomoviruses, including many of the most significant, economically damaging diseases occurring in the Old World. Probably the most important of these is cotton leaf curl disease (CLCuD). CLCuD was epidemic during the 1990s across Pakistan and continues to be so in northern India. During 2002, a second epidemic of a resistance-breaking strain of the virus initiated in central Pakistan and is threatening major yield losses for future cotton harvests [17]. The disease is caused by a complex consisting of multiple begomoviruses and a specific DNA-β component. Members of at least seven distinct begomovirus species, but only one type of DNA-β component, were shown to have caused the disease epidemic which occurred during the 1990s [19]. DNA-β has also been associated with the earliest written description of a plant virus disease [29].

Satellites are defined as subviral agents composed of nucleic acid that depend on co-infection with a helper virus for their productive multiplication. Satellite nucleic acids have substantially distinct nucleotide sequences from those of the genomes of their helper viruses. RNA satellites that are associated with many RNA-containing viruses vary greatly in size, from less than 200 nucleotides to greater than 1,500 nucleotides. Although the larger satellites may encode functional open reading frames, the smaller satellites do not, but are highly structured. Despite their small size and the apparent absence of potential gene products, satellites may have a dramatic effect on the symptoms induced by their helper viruses [7, 30]. These effects range from amelioration to severe exacerbation of symptoms, and vary with the helper virus, host plant and satellite combination.

At this time, plant virus-associated satellites are not classified by the International Committee on Taxonomy of Viruses (ICTV). They are grouped under the term “subviral agents” and divided into satellite viruses (those satellites that encode their own coat protein) and satellite nucleic acids, and subsequently according to the type of nucleic acid (either DNA or RNA) [12].

DNA-β satellites associated with begomoviruses

In the short time since they were first identified, over 260 full-length DNA-β sequences have been deposited in the databases. This number is in no small part due to the relative ease with which these components can be isolated and characterised using modern PCR-based cloning and automated sequencing procedures. However, it is also an indication of the importance and widespread nature of these components, at least in the Old World. Although the functions that they provide to their helper begomoviruses are far from fully understood, it is clear that a large number of viruses are associated with DNA-β components and that, at least in Asia, the DNA-β-requiring begomoviruses likely outnumber the bipartite and truly monopartite begomoviruses. This explosion in available DNA-β sequences has highlighted the need for a clear set of guidelines for naming and classifying newly characterised DNA-β components using a standardised approach.

The DNA-β satellites are typically half the length of their helper begomoviruses (∼1,360 nucleotides) and share no significant sequence homology with their helper viruses other than the presence of a potential stem–loop structure containing the ubiquitous nonanucleotide sequence TAATATTAC that marks, for geminiviruses, the origin of virion-strand DNA replication. They have a highly conserved structure consisting of a sequence of approx. 100 nucleotides conserved between all DNA-βs [known as the satellite conserved region (SCR)] and a region of sequence rich in adenine (A-rich), and they encode a single gene, the product of which is known as βC1 [20]. βC1 is a pathogenicity determinant and a suppressor of post-transcriptional gene silencing; it up-regulates viral DNA levels in planta, binds DNA and may be involved in virus movement [5, 8, 23, 24, 26, 32]. DNA-β satellites depend upon their helper viruses for replication, movement in plants and transmission between plants, presumably by trans-encapsidation in the helper virus’ coat protein. DNA-β components have been cloned from a diverse range of hosts, and DNA sequence comparison shows them to be highly diverse [3, 6, 32].

The nature of begomoviruses associated with DNA-β

AYVV was the first begomovirus shown to be associated with a DNA-β component [26]. Its peculiar behaviour, in comparison to begomoviruses identified prior to this, in being poorly infectious to the host from which it was isolated (A. conyzoides) and inducing atypical symptoms, initiated the search for additional components. This is a common property of the majority of begomoviruses which associate with a DNA-β component, as is the observation that the virus usually does not naturally infect plants in the absence of the satellite. This includes viruses such as those associated with cotton leaf curl disease [19] and bhendi yellow vein mosaic disease [15].

More recently, some begomoviruses have been identified which have a facultative association with their DNA-β component. A recent study [16] has shown that some isolates of Tobacco curly shoot virus are associated with a satellite, whereas others are not. Viruses cloned from these isolates were shown to behave similarly, each being able to associate with DNA-β (yielding a more severe infection but not elevated viral DNA levels) as well as inducing asymptomatic infection in the absence of the satellite. It is possible that tobacco curly shoot virus (TbCSV) represents an evolutionary intermediate, although it is unclear whether it is evolving to gain or lose the requirement for a DNA-β component.

Phylogenetic comparisons show that begomoviruses associated with a DNA-β component are not monophyletic, thus are not likely to have a single recent common progenitor. This suggests that begomoviruses not associated with a DNA-β component (either monopartite or bipartite) may subsequently become associated with one or that begomoviruses may lose their DNA-β component, becoming monopartite or bipartite if associated with a DNA-B component. There is evidence to support all of these possibilities. The absence of both DNA-β and DNA-1 components in the New World is strong circumstantial evidence supporting the evolution of the DNA-β complex after the divergence of Old and New World begomoviruses [20]. The half unit-length DNA-β component associated with ToLCV in Australia is evidence for a begomovirus which has dispensed with the need for a symptom-modulating satellite [9], although it retains the ability to interact with an intact DNA-β component [1, 23]. Sri Lankan cassava mosaic virus may have been a satellite-requiring begomovirus which has exchanged the satellite for a DNA-B component, most likely captured from Indian cassava mosaic virus [28], but nevertheless is able to productively interact with DNA-β. The range of begomoviruses (seven species identified to date) associated with CLCuD on the Indian subcontinent [19], all associated with a single DNA-β component, indicates that the satellites can be highly promiscuous.

Proposed nomenclature for DNA-β satellites

Virus nomenclature is important for researchers since it provides labels with which to identify viruses in a concise and unambiguous manner. To be practical, however, any system of nomenclature requires uniformity based on a simple set of rules that everybody can apply in a similar manner. The explosion in the number of available DNA-β sequences during the last few years has highlighted the need for such a standardised nomenclature for begomovirus satellites. For geminiviruses, the ICTV has accepted that English vernacular names will be used to describe begomovirus species with, if necessary, a geographical descriptor. Thus, species names take the form “host-disease symptom-[origin]-virus”, such as Cotton leaf curl Multan virus, although a few historical names, such as African cassava mosaic virus, are maintained [11]. The use of such descriptive names for the DNA-β satellites is desirable, as is the maintenance of the association with both the virus and the disease symptoms, since most DNA-β satellites play a major role in symptom determination. It is thus a simple process to use the virus name to derive the satellite name. For example, the DNA-β component associated with Cotton leaf curl Multan virus would become Cotton leaf curl Multan beta (CLCuMB) (the ICTV requirement that only ASCII characters be used in virus names entails the use of the term “beta” rather than the Greek character “β”). Further details, including strain and isolate descriptors, are appended to geminivirus species names [13], a convention that can similarly be applied to DNA-β. Thus, the DNA-β component associated with the first description of CLCuMV infectivity becomes Cotton leaf curl Multan beta-[Pakistan:Faisalabad1:1996], which is abbreviated to CLCuMB-[PK:Fai1:96].
However, it should be noted that, just because a virus belongs to a particular species, it does not necessarily follow that the associated DNA-β component will belong solely to that species or that members of that species will necessarily be associated only with the same DNA-β component. Thus, although the isolate information for a DNA-β component should mirror that of the helper virus, the strain information may not (different strains/isolates of a single virus species being able to interact with different strains/isolates of a DNA-β).

Clearly the diversity of the satellites and the complexity of their interactions means that the simple association between disease symptoms, host plant, virus name and corresponding DNA-β names will not necessarily be maintained in all cases. As with the naming of new begomovirus species, the naming of new satellites should follow the “grandfather principle”, with the name given at first description being adopted.

Proposed taxonomy for DNA-β satellites

Sequence data are now the major criterion for assigning geminiviruses to species, although biological characteristics are used in some cases to distinguish strains. The threshold cut-off value for distinguishing species from strains (89% for begomoviruses) was determined by pairwise sequence comparisons [10]. This criterion for species demarcation is proving very robust, although some begomovirus species are beginning to overlap as more sequences become available, necessitating merging/synonymization of these taxa.

A study of 261 full-length DNA-β sequences available in the databases as of August 2007 (Table 1) was conducted. Nine recombinant DNA-β components, in which the SCR has been replaced by the origin of replication of the helper begomovirus, were not included in the analysis; plants infected with such recombinant components have been shown to accumulate reduced viral DNA levels, and the recombinants will probably not be maintained in nature, being at a selective disadvantage with respect to intact DNA-β [27]. An exhaustive pairwise comparison of the DNA-β sequences showed a multimodal distribution (Fig. 1) consisting of three major peaks between 25 and 78% identity made up of recombinant DNA-β components, a major peak between 33 and 61%, and a succession of smaller peaks above this.
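The pairwise comparison described above can be illustrated with a minimal Python sketch. It assumes the sequences have already been aligned to equal length by some external tool (the alignment method is not specified in the text), it treats a base opposite a gap as a mismatch, and the names `percent_identity`, `identity_histogram` and the toy sequences are hypothetical, not real DNA-β data.

```python
from collections import Counter
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns in which both sequences carry a gap ('-') are ignored;
    a base opposite a gap counts as a mismatch (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def identity_histogram(aligned: dict[str, str]) -> Counter:
    """Count sequence pairs per whole-percent identity bin (exhaustive pairs)."""
    bins = Counter()
    for (_, s1), (_, s2) in combinations(aligned.items(), 2):
        bins[int(percent_identity(s1, s2))] += 1
    return bins


# Hypothetical toy alignment used only to exercise the functions.
toy = {
    "sat1": "TAATATTACCGGA-TT",
    "sat2": "TAATATTACCGGAATT",
    "sat3": "TAATATTACTTTA-CC",
}
hist = identity_histogram(toy)
```

Plotting the counts in `hist` against identity percentage gives a distribution of the kind shown in Fig. 1; with n sequences the histogram contains n(n-1)/2 pairs.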

Table 1 List of DNA-β components for which full-length sequences are available in the databases grouped according to species using the criteria detailed in the text
Fig. 1
Distribution of pairwise nucleotide sequence identities of 261 DNA-β full-length sequences (listed in Table 1)

Of these DNA-β components, 44 are associated with begomoviruses for which full-length DNA sequences are available (Table 2). Separate pairwise nucleotide sequence comparison studies were conducted for each. The correlation between these two studies is highly significant (R = 0.72), indicating a direct evolutionary relationship (co-evolution) between a helper begomovirus and its associated DNA-β component (Fig. 2). Since trans-replication of these satellites is not confined to a single virus and they can infect diverse hosts, it might be anticipated that frequent exchanges between distinct helper begomoviruses should have resulted in their independent evolution from the latter, in which case such a strong correlation would not be expected. However, DNA-β satellites depend on the begomovirus for their replication, movement in plants and insect transmission between plants. In turn, the helper begomovirus depends on DNA-β for efficient infection of hosts, possibly by βC1-mediated post-transcriptional gene silencing suppression of a host defense response [8]. These findings suggest that the association between the begomovirus and its cognate DNA-β satellite is subject to subtle interactions that confer a selective advantage, and that component exchange is (or at least has been) relatively infrequent. Consequently, evolutionary pressures have acted on the disease complex as a whole rather than on each component of the complex independently. This means that, for taxonomic purposes, we can deal with the complex as a whole and derive a “species” demarcation value for DNA-β from that determined for the helper begomoviruses. The demarcation threshold for distinguishing begomovirus species from isolates currently is set at 89% nucleotide sequence identity for full-length genomic (DNA-A component) sequences [12]. This corresponds to 77% nucleotide sequence identity for DNA-β components (Fig. 2).
However, if the available 261 full-length DNA-β sequences are considered, the percentage identity having the minimum number of pairs is 78% (Fig. 1). There is a very high correlation between the distribution of full-length DNA-β component sequences and the nucleotide sequences of the βC1 gene (91%). However, the use of full-length sequences for species demarcation is more meaningful, since it corresponds to a biological entity and eliminates recombinant DNA-β components containing helper virus sequences. Consequently this proposal favours the 78% demarcation threshold provided by analysis of full-length sequences, in line with the recommendations used for geminivirus species demarcation, and avoids the complexity of variation in the coding capacity (length) of βC1, which could create confusion.
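The two ways of arriving at a threshold discussed above (reading the DNA-β value that corresponds to the 89% begomovirus threshold off the correlation, and locating the identity percentage with the fewest pairs) can be sketched in Python as follows. This is an illustration under stated assumptions: an ordinary least-squares line is assumed for the correlation, which the text does not specify, and the function names and all numbers in the example data are hypothetical rather than values from Table 2 or Fig. 1.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, with Pearson r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


def least_populated_bin(counts, lo, hi):
    """Identity percentage in [lo, hi] with the fewest pairs (ties: lowest)."""
    return min(range(lo, hi + 1), key=lambda p: (counts.get(p, 0), p))


# Hypothetical pairwise identities (%): helper begomovirus vs. cognate DNA-beta.
helper_identity = [70, 80, 90, 100]
beta_identity = [52, 62, 80, 90]

slope, intercept, r = fit_line(helper_identity, beta_identity)
# DNA-beta identity corresponding to the 89% begomovirus species threshold.
beta_threshold = slope * 89 + intercept

# Hypothetical pair counts per identity percentage around the trough.
pair_counts = {75: 12, 76: 7, 77: 3, 78: 1, 79: 4, 80: 9}
trough = least_populated_bin(pair_counts, 75, 80)
```

With real data, `helper_identity` and `beta_identity` would hold one value per pair of cognate complexes, and `pair_counts` would come from the exhaustive pairwise comparison of all full-length sequences.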

Table 2 List of cognate begomovirus-DNA-β pairs for which full-length sequences are available
Fig. 2
Correlation between pairwise nucleotide sequence comparison identities of the complete genome sequences of 44 begomoviruses (listed in Table 2) and their cognate DNA-β satellites. The species demarcation threshold defined by the co-evolution study is indicated by the red arrows

Application of the DNA-β species demarcation criteria

Application of the 78% nucleotide sequence identity demarcation threshold to the 261 available sequences leads to their division into 51 distinct DNA-β satellite species. These are indicated in Table 1 using the proposed nomenclature for DNA-β components.
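One way to apply such a threshold to a set of pairwise identities is sketched below in Python. The grouping rule is an assumption: the text does not state how borderline or chained cases were resolved, so this sketch uses single-linkage grouping (two sequences fall in the same species if they are connected by a chain of pairs at or above the threshold) and treats the threshold as inclusive. The function name and the toy identities are hypothetical.

```python
def group_into_species(names, identity, threshold=78.0):
    """Single-linkage grouping of sequences whose identity is >= threshold.

    `identity` maps (name_a, name_b) tuples to percent identity.
    Returns a sorted list of sorted groups.
    """
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the group representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), pct in identity.items():
        if pct >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical identities (%) between four toy sequences.
names = ["A", "B", "C", "D"]
identity = {
    ("A", "B"): 91.0,
    ("A", "C"): 60.0,
    ("B", "C"): 58.5,
    ("A", "D"): 45.0,
    ("B", "D"): 47.0,
    ("C", "D"): 80.0,
}
species = group_into_species(names, identity)
```

Single linkage is the most permissive choice; a stricter rule (for example, requiring every member of a group to meet the threshold against every other member) could split groups that this sketch merges.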

Conclusions

Due to the rapidly increasing number of recognised DNA-β satellites, there is an urgent need for a robust and workable system of nomenclature and classification of these components. Since these satellites are not independent entities, relying on a helper virus for their replication and spread, the biological data useful for their classification that can unequivocally be attributed to the satellite are limited at this time. It is thus not unreasonable to base the classification system entirely on their nucleotide sequence. It is possible that in future we will recognize distinct satellite strains, based on sequence differences or distinct biological properties, and will want to classify them below the family level into distinct genera.

Whenever possible, the proposed satellite naming system correlates with the original helper virus and host. Although it can accommodate an unlimited number of new satellite isolates, this system currently predicts a manageable number of satellite species. It provides guidelines for assigning future isolates and an accepted system of nomenclature covering species and isolates that will greatly benefit the research community investigating the diversity and function of these widespread and economically important subviral components.

The following criteria should be used as guidelines to establish taxonomic status of a DNA-β component.

  1. Species status should only be conferred once a full-length sequence is available (this is in line with the established convention for geminiviruses). Comparison of component parts of the satellite (SCR, βC1 coding sequence and A-rich sequence) broadly yields the same result as analysis of the full-length sequence. Nevertheless, their propensity for recombination means that species status may only be bestowed when the entire sequence has been established.

  2. Pairwise nucleotide sequence identity comparisons between 261 full-length DNA-β components indicate that a value of 78% nucleotide sequence identity is appropriate for distinguishing species from isolates. Application of this criterion to 261 full-length sequences currently available in the databases leads to 51 species.

The naming of satellite species follows the convention set down for geminiviruses and, wherever possible, mirrors that of the helper begomovirus. For example, the two satellites associated with cotton leaf curl disease are called Cotton leaf curl Multan beta and Cotton leaf curl Gezira beta, indicating their association with cotton leaf curl disease but also their association with helper viruses belonging to the species Cotton leaf curl Multan virus and Cotton leaf curl Gezira virus, respectively. Isolate names similarly follow the convention set down for geminiviruses. These indicate country of origin, isolate descriptor [preferably place (town/region) of origin] and year of isolation. For example, Cotton leaf curl Multan beta-[India:Dabwali1:1995] originates from India, in the vicinity of the city of Dabwali, and was collected from the field in 1995. The number 1 indicates that this is the first isolate from this region (if more than one isolate was collected). This name is abbreviated to CLCuMB-[IN:Dab1:95].
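The naming convention can be expressed as a short Python sketch. The helper functions are hypothetical and encode assumptions inferred only from the examples in the text: the satellite name is obtained by replacing the final word “virus” with “beta”, the place descriptor is shortened to its first three letters, and the year to its last two digits. The species acronym (for example CLCuMB) and the two-letter country code are supplied by the caller rather than derived.

```python
def satellite_species_name(helper_species: str) -> str:
    """Derive a satellite name from a helper virus species name."""
    if not helper_species.endswith(" virus"):
        raise ValueError("expected a species name ending in ' virus'")
    name = helper_species[: -len("virus")] + "beta"
    if not name.isascii():
        # Only ASCII characters are permitted in names, hence 'beta'.
        raise ValueError("names must contain ASCII characters only")
    return name


def isolate_name(species: str, country: str, place: str, number: int, year: int) -> str:
    """Full isolate name: species-[country:place+number:year]."""
    return f"{species}-[{country}:{place}{number}:{year}]"


def abbreviated_isolate_name(acronym: str, country_code: str, place: str,
                             number: int, year: int) -> str:
    """Abbreviated form; three-letter place and two-digit year are assumptions."""
    return f"{acronym}-[{country_code}:{place[:3]}{number}:{year % 100:02d}]"


species = satellite_species_name("Cotton leaf curl Multan virus")
full = isolate_name(species, "India", "Dabwali", 1, 1995)
short = abbreviated_isolate_name("CLCuMB", "IN", "Dabwali", 1, 1995)
```

Because the strain information of a satellite need not mirror that of its helper virus, a sketch like this is only a starting point; names already given at first description take precedence under the “grandfather principle”.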