Background

Begomoviruses are an important group of whitefly (Bemisia tabaci)-transmitted viruses in the family Geminiviridae. They inflict significant economic losses in many dicotyledonous crops including beans, cassava, cotton, melon, pepper, potato and tomato [1–7]. Tomato yellow leaf curl virus (TYLCV) and Tomato leaf curl virus (ToLCV) are begomoviruses that severely constrain tomato production in many tomato-growing regions of the world.

Begomovirus genomes are composed of either one (monopartite) or two (bipartite) single-stranded DNA molecules ranging in size between 2500 and 2800 nucleotides [8]. Most TYLCV of the old world and almost all known new world begomoviruses are bipartite, with genomes comprising DNA-A and DNA-B molecules. Monopartite old world begomoviruses, which are now believed to be the predominant begomovirus form, have only a DNA-A-like genome component. The virion-sense strand of DNA-A encodes the viral coat protein (AV1, V1 or cp) and, in old world begomoviruses [9], an AV2 or V2 gene that is necessary for virus accumulation and symptom development [10]. The complementary-sense strand of DNA-A encodes genes responsible for viral replication (AC1, C1 or rep), replication enhancement (AC3, C3 or ren) and regulation of gene expression (AC2, C2 or trap), as well as AC4 or C4, which is involved in host range determination, symptom determination, symptom severity and virus movement [11–13]. The DNA-B of bipartite begomoviruses encodes two proteins, BV1 (a nuclear shuttle protein or NSP) and BC1 (a movement protein or MP), involved in intra- and inter-cellular movement within the plant [14].

Begomoviruses exhibit a great deal of geography-dependent but host-independent genomic variation [15–17]. Recombination, especially interspecific homologous recombination, is a key contributor to the genomic diversification and evolution of begomoviruses [17]. To date, many natural begomovirus recombinants have been reported [17–20]. Although the biological significance of begomovirus recombination is not clearly understood, epidemics associated with the emergence of recombinant begomoviruses have been reported in many parts of the world. These include the devastating cassava mosaic disease epidemic caused by recombinant East African cassava mosaic viruses in Uganda and neighbouring countries [18, 21], the currently emerging pathogenic recombinant, tomato yellow leaf curl Malaga virus, in Spain [22] and the cotton leaf curl disease epidemic in Pakistan caused by a species complex including a variety of mostly recombinant begomovirus species [23]. Besides the apparent importance of recombination in begomovirus evolution, the marks that it has left on currently sampled begomovirus genome sequences also have major implications when we attempt to use these sequences to infer the evolutionary histories of begomoviruses [24, 25]. Consequently, the detailed characterization of recombination amongst tomato-infecting begomoviruses is a prerequisite for understanding how these important pathogens are evolving.

Although a few specific recombination events have been described so far in tomato-infecting begomoviruses [26–29], a full accounting of recombinants, recombination breakpoints and recombination hotspots in tomato begomovirus species and strains is lacking. For example, it is currently unknown whether there are differences in the number and quality of recombination events occurring amongst different tomato-infecting begomovirus species. It is also unknown whether sequences in particular parts of begomovirus genomes are more or less exchangeable between different species than sequences in other parts of these genomes. Such variations in recombination frequencies and patterns have been clearly observed in RNA viruses [30]. In this study we employ a variety of recombination analysis methods to characterize recombination in South and Southeast Asian tomato-infecting begomoviruses. We map recombination hotspots and provide evidence that not all tomato-infecting begomoviruses are equally prone to recombination, and that specific characteristics of particular recombination events depend on both the relatedness of the recombining viruses and the genomic region involved in sequence exchanges.

Results and discussion

In this study, we sought to characterise recombination in South and Southeast Asian viruses using an approach that differs from those used previously by: (1) studying a different set of viruses; (2) combining recombination analysis methods that are both powerful and have low false positive rates; and (3) mapping and estimating the frequency of recombination events in begomoviruses.

The Neighbor-Net analysis revealed clear evidence of phylogenetic conflicts within the analysed sequences (Fig. 1). Notably, every sequence represented within the network was implicated as a potential recipient of horizontally acquired sequences at some time in its evolutionary past. Unsurprisingly, the PHI test strongly supported the presence of recombination in these sequences (p < 0.0001).

Figure 1

Neighbor-Net generated for the tomato-infecting begomoviruses of South and Southeast Asia. Evidence for reticulate evolution is based on pairwise Hamming distances calculated using only parsimonious sites. Networked relationships among the viral species, shown as boxes rather than as a bifurcating evolutionary tree, indicate the presence of recombination.

The different methods used for recombination breakpoint analysis also provided strong evidence for the presence of past recombination events in most of the sequences analysed. For each of the 32 potential recombinant sequences identified, possible breakpoint positions, sequence fragments and parental genotypes are listed in Table 1. Tomato leaf curl virus from the Philippines, and ToLCBV, ToLCBV-[Ban4] and ToLCBV-[Ban5] from Bangalore, south India, appeared to be the most complex recombinants, carrying evidence of seven and six recombination events respectively. At the opposite end of the spectrum, Tomato leaf curl virus strains including ToLCNDV-Mld and ToLCNDV-[Luc] from New Delhi, ToLCNDV-Svr [Jes] from Bangladesh, and TYLCCNV-Tb [Y38] from China each carried evidence of only a single recombination event. In addition, viruses from geographically well separated regions appeared to have recombined at some time in the past. For example, tomato leaf curl virus strains from Bangalore and Gujarat in India contained sequences closely resembling those found in a ToLCTWV isolate from Taiwan. Chinese viruses also contained sequence fragments closely resembling those found in sequences sampled in Thailand, Taiwan, Bangladesh and South India.

Further, we used the TreeOrder Scan method [31] to investigate the phylogenetic evidence for recombination in the sequence alignment. This analysis revealed major deviations in the branching order of sequences within trees constructed from different portions of the multiple sequence alignment (Fig. 2). Frequent tree order changes were observed in the rep and AC4 region. Importantly, most of the viruses detected as recombinants in the breakpoint analysis exhibited deviations in their branching order, indicating that they were most likely correctly identified as recombinants. The TreeOrder Scan analysis also provided evidence for gene flow amongst viruses in geographically separated regions. For instance, sequences found in southern Indian viruses grouped with those found in Thai and Bangladeshi viruses at positions 2335–2652. Thai viruses contained sequences resembling those of Chinese viruses at positions 300–490 and 590–2372, but those of Indian viruses at positions 2472–2743. The recombination observed between geographically separated species/strains probably represents older events that occurred before their present separation [19]. Movement of vectors and/or infected plant materials may also have contributed to the gene flow observed between these widely separated locations [32]. Alternatively, it is possible that current sampling of Asian begomovirus diversity is so sparse that we do not yet fully appreciate the geographical range of many of the species studied here.

Table 1 Breakpoint analysis of tomato-infecting begomoviruses and their putative parental sequences.

Figure 2

TreeOrder Scan of tomato-infecting begomovirus sequences. Changes in tree order (Y axis) resulting from changes in phylogenetic relationships at the 70% bootstrap level are shown for sequential 300-base sequence fragments at 100-base intervals (X axis). Sequences are assigned to groups based on geographical location and groups are color coded as indicated by the labels. The genome map, drawn to scale, has been superimposed to indicate the positions of genes in DNA-A sequences. Positions were drawn relative to the ToLCGV-[Var] strain.

Interestingly, our breakpoint analysis indicated that three north Indian viruses (ToLCNDV-[PkT1/8], ToLCNDV-Svr and ToLCNDV-[PkT5/6]) were not detectably recombinant and that three other north Indian viruses, namely ToLCNDV-Mld, ToLCNDV-[Luc] and ToLCNDV-[Luf], were simple recombinants with evidence of only a single detectable recombination event involving a virus resembling ToLCPV sampled in the Philippines. TreeOrder Scan analysis also revealed an absence of recombination in two north Indian viruses, ToLCNDV-[PkT1/8] and ToLCNDV-[Luf] (indicated by a horizontal line across the graph in Fig. 2), and there was no phylogenetic support for the inter-group recombination event reported for ToLCNDV-[Luc]. Thus there appear to be few or no recombination events in viruses reported from North India, suggesting that certain begomovirus species may not recombine as readily as others. There are a number of prerequisites for recombination between begomoviruses. These include shared host ranges (possibly influenced by the emergence of the B biotype of the whitefly), the ability to co-infect the same cells [33–35], high levels of viral replication [36], and overlapping geographical ranges. If all of these prerequisites were met for the tomato-infecting begomoviruses in South and Southeast Asia, one would expect frequent and invariable recombination amongst all of these viruses. However, fitness disadvantages may be associated with some sequence exchanges, which would lead to the selective elimination of many newly produced recombinants.

The recombination sites were distributed non-randomly along the genome. Recombination breakpoints were detected in all six open reading frames of south Indian viruses and of viruses from eastern and western India. The breakpoints in the Chinese and Thai viruses were located in AV1, AV2, AC1 and AC4, whereas the AV1 and AV2 ORFs were identified as cold spots in the Bangladeshi viruses.

The frequency and locations of recombination events, measured as topological differences between trees constructed from different parts of the alignment, were visualised as a half-diagonal compatibility matrix (Fig. 3). The value at each pair of X and Y coordinates in the matrix is a gross estimate of the number of topological modifications needed to convert the tree constructed using sequences at position X into that constructed using sequences at position Y [31, 37]. It was apparent from this matrix that recombination events are probably not randomly distributed throughout begomovirus genomes. The highest frequency of recombination apparently occurs in the portion of the C1/AC1 ORF encoding the N-terminal portion of Rep. For example, the matrix indicates that there is an excess of 0.16 phylogeny violations per clade when trees constructed using sequences between alignment positions 351 and 1251 are compared with those constructed using sequences between alignment positions 2451 and 2951. This analysis also indicated the probable absence, in certain regions of begomovirus genomes, of recombination events that had any substantial phylogenetic effect. For example, phylogenetic trees constructed using coat protein gene sequences were all in good agreement with one another, indicating a relative absence of recombination breakpoints within the CP gene.

Figure 3

Phylogenetic compatibility matrix of tomato-infecting begomovirus sequences, showing frequencies of phylogeny violations for each pairwise comparison of sequence fragments. For this analysis, sequence fragments of 300 bases at 100-base intervals were used. Phylogeny violations above the threshold bootstrap value of 70% are shown. Frequencies are color coded to indicate the number of phylogeny violations per sequence. The genome map, drawn to scale, has been superimposed to indicate the positions of genes in DNA-A sequences. Positions were drawn relative to the ToLCGV-[Var] strain.

We examined phylogeny violations and the number of recombination events in our data set from the perspective of parental sequence relatedness. We noted that, in general, phylogeny violations clustered around a genetic distance of 0.30. The observed frequency of phylogeny violations was inversely correlated (r = -0.36, p < 0.05) with the pairwise distances of the fragments involved in exchange (Fig. 4A). In addition, the number of recombination events was also inversely correlated (r = -0.35, p < 0.05) with the diversity between the exchanged fragments (Fig. 4B). For this second comparison, we used only identified parental sequences to estimate the genetic distance between horizontally transferred fragments and the sequences that they replaced. Overall, the frequency of phylogenetic violations and the number of recombination events decreased with increasing parental sequence diversity. In a study with artificial and natural geminivirus recombinants, Martin and co-workers [38] demonstrated that the degree of similarity between a horizontally inherited sequence and the sequence it replaces is an important determinant of recombinant fitness. Rather than being due to higher recombination rates in some genome regions than in others [39], the non-random distribution of breakpoints observed here seems to have been created by natural selection allowing the survival only of recombinants with high fitness. In the more diverse genome regions, where recombination events are not detected, it is possible that sequences would not function properly when transferred into foreign genetic backgrounds.
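The correlation coefficients quoted above are of the kind produced by a Pearson product-moment calculation. The following is a minimal sketch of that calculation, assuming two equal-length lists of paired observations (for instance fragment distances and violation counts); the function name `pearson_r` is hypothetical, and the significance test behind the reported p values is not reproduced here.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)
```

A negative value of r, as reported here, means that the counts tend to fall as the distance between the exchanged fragments rises.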

Figure 4

(A) Relationship between the number of phylogeny violations and fragment diversity. The Jukes-Cantor distance was calculated for each pairwise comparison used in the TreeOrder Scan analysis, and the corresponding violations were counted and plotted. (B) Relationship between the number of recombination events and fragment diversity. Only fragments involved in exchanges with identified parental sequences were used, and the number of recombination events detected was counted and plotted.
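The Jukes-Cantor distance named in the caption corrects the observed proportion of differing sites, p, for multiple substitutions at the same site: d = -(3/4) ln(1 - 4p/3). A minimal sketch follows, assuming two aligned fragments of equal length; the function name and the decision to skip columns containing gaps or ambiguity codes are assumptions of this sketch, not details taken from the analysis itself.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance between two aligned nucleotide fragments.

    Columns in which either sequence has a character other than A, C, G
    or T (gaps, ambiguity codes) are skipped; this is an assumption.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    if mismatches == 0:
        return 0.0
    p = mismatches / compared
    if p >= 0.75:
        # The correction is undefined once p reaches saturation.
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)
```

For small p the corrected distance is close to p itself; the correction grows as the fragments become more divergent.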

Conclusion

The variable recombination and diversity-dependent distribution of recombination hotspots in tomato-infecting begomoviruses are valuable new findings of this study. This is perhaps the first report of variable recombination among tomato-infecting begomoviruses found in the same region. Further, the recombinant forms, recombination hotspots and recombination frequencies documented in this study provide new information for understanding the diversity and evolution of tomato-infecting begomoviruses in Asia. In addition to evolutionary considerations, it is important to understand the implications of the recombination observed in these viruses for efforts to develop resistant tomatoes through conventional breeding and genetic engineering, and attention should be focused on these issues when developing effective disease management strategies. Given that the N-terminal portion of rep is highly recombinogenic, it is perhaps worrying that so many virus-derived transgenic resistance strategies focus on this portion of the geminivirus genome [40–43]. It may be wiser to develop virus-derived resistance strategies using genome regions that are less recombinogenic, as this will make it more difficult for viruses to overcome resistance by simply replacing targeted genome regions with variants that are not targeted.

Methods

Sequence data

The study sequences comprised 35 publicly available (as of June 2006) complete Indian, Pakistani, Chinese, Bangladeshi, Sri Lankan, Malaysian, Thai, Philippine and Taiwanese tomato-infecting begomovirus DNA-A and DNA-A-like components (Table 2). These sequences were aligned using CLUSTAL W [44] with gap open and extension penalties of 10.

Table 2 List of species/strains of tomato-infecting begomoviruses used in the present study.

Phylogenetic network and pairwise homoplasy test

Phylogenetic evidence for recombination was detected with SplitsTree version 4.3 [45] using the Neighbor-Net method [46]. Neighbor-Net depicts conflicting phylogenetic signals in the data that are caused by recombination as cycles within unrooted bifurcating trees. Although we report evidence for reticulate evolution in phylogenetic graphs obtained using parsimonious sites, pairwise Hamming distances and no gaps, we obtained similar results with other distance measures and settings.
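The distance setting described above can be illustrated with a short sketch. It assumes that "parsimonious sites" means parsimony-informative columns (at least two different bases, each present in at least two sequences), that columns containing a gap are dropped, and that the Hamming distance is expressed as the proportion of differing sites. These readings, and the function names, are assumptions of the sketch and are not taken from the SplitsTree implementation.

```python
from collections import Counter
from itertools import combinations


def informative_columns(alignment: list[str]) -> list[int]:
    """Indices of gap-free, parsimony-informative alignment columns."""
    columns = []
    for i in range(len(alignment[0])):
        column = [seq[i] for seq in alignment]
        if "-" in column:
            continue  # the 'no gaps' setting: drop columns with any gap
        counts = Counter(column)
        # Informative: at least two bases that each occur at least twice.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            columns.append(i)
    return columns


def hamming_distances(alignment: list[str]) -> dict[tuple[int, int], float]:
    """Pairwise proportion of differing sites over informative columns."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    cols = informative_columns(alignment)
    if not cols:
        raise ValueError("no informative columns in the alignment")
    distances = {}
    for i, j in combinations(range(len(alignment)), 2):
        diffs = sum(alignment[i][c] != alignment[j][c] for c in cols)
        distances[(i, j)] = diffs / len(cols)
    return distances
```

The resulting dictionary is keyed by pairs of sequence indices and could be written out as a distance matrix for a network method.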

We statistically verified the presence of recombination identified visually in the phylogenetic graphs using the pairwise homoplasy index (PHI) test implemented in SplitsTree 4.3. The PHI test has been shown to powerfully identify the presence/absence of recombination within a wide range of sequence samples with a low false positive rate [47].

Detection of recombination breakpoints

The recombination breakpoint analysis was carried out using the Recombination Detection Program (RDP) [48], GENECONV [19] and MAXIMUM CHI SQUARE [49] methods, selected following the conclusions of studies evaluating different methods of recombination detection [50, 51]. All these methods are implemented in RDP2 [52, 53]. Default RDP2 settings were used throughout (P-value cut-off = 0.05 with the standard Bonferroni correction), except that sequences were considered circular, consensus daughters were found and breakpoints were polished. We relied principally on information inferred by more than one method, as evaluation of the performance of these recombination detection methods using simulated and empirical data indicated that one should not rely too heavily on the results of a single method (Posada, 2002). In the RDP analysis, the length of the window was set to 10 variable sites and the step size was set to one nucleotide. P values were estimated by randomizing the alignment 1,000 times. For the GENECONV analysis, the g-scale parameter was set to 1 and the number of permutations was set to 10,000.
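Two of the statistical ideas in this paragraph, a Bonferroni-corrected cut-off and a P value estimated from randomized alignments, can be sketched as follows. The function names are hypothetical, the number of tests is supplied by the caller because how RDP2 counts comparisons is not stated here, and the (k + 1)/(n + 1) permutation estimator is a common convention assumed for illustration rather than RDP2's documented formula.

```python
def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def permutation_p_value(observed: float, permuted: list[float]) -> float:
    """Estimate a P value from scores obtained on randomized alignments.

    Counts how many randomized scores are at least as large as the
    observed score, using the (k + 1) / (n + 1) estimator (an assumption).
    """
    if not permuted:
        raise ValueError("at least one permuted score is required")
    k = sum(1 for score in permuted if score >= observed)
    return (k + 1) / (len(permuted) + 1)
```

With 1,000 randomizations, the smallest P value this estimator can return is 1/1001, which is one reason the number of permutations limits the resolution of the test.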

Phylogenetic congruence

To examine phylogenetic support for each recombination event identified in the breakpoint analysis, we used the retained sequence position version of the TreeOrder Scan method [31] implemented in the Simmonics2005 (Version 1.4) package. TreeOrder Scan records the position of each sequence in a series of phylogenetic trees produced from sets of overlapping fragments across the genomes. Deviations in the tree order of individual sequences, or of groups of sequences, between fragments of defined length indicate conflicting phylogenetic relationships. Conversely, individual non-recombinant sequences show a constant tree order (position) across the genome. In the present analysis, we recorded the changes in the phylogenetic relationships of clades supported by 70 per cent bootstrap values for sequential 300-base sequence fragments at 100-nucleotide intervals.
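The fragment scheme described above (300-base fragments taken every 100 nucleotides) amounts to a sliding window over the alignment. A minimal sketch of the window coordinates follows; the function name, the 0-based half-open coordinates, and the choice not to wrap around the circular genome are assumptions of this sketch rather than documented behaviour of the TreeOrder Scan program.

```python
def sliding_windows(
    alignment_length: int, window: int = 300, step: int = 100
) -> list[tuple[int, int]]:
    """Return (start, end) pairs for overlapping fragments.

    Coordinates are 0-based and half-open, so alignment[start:end] yields
    one fragment. Only complete windows are returned; trailing positions
    that do not fill a whole window are left out.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = alignment_length - window
    return [(s, s + window) for s in range(0, last_start + 1, step)]
```

Each returned pair can be used to slice every sequence in the alignment before a tree is built for that fragment.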

Frequency and mapping of recombination

Estimation of the frequency and mapping of the locations of recombination events was achieved by phylogenetic compatibility analysis using the TreeOrder Scan method. First, the TreeOrder Scan program produces optimally ordered neighbor-joining trees for fragments of defined length along an alignment. Next, a pairwise comparison is made between the trees constructed from each sequence fragment along the alignment. A phylogenetic compatibility value is then computed as the number of times the phylogeny of one tree has to be violated to match the tree order observed in the other trees constructed along the length of the alignment. In our case, we assigned sequences to predefined groups based on their geographical origin, and a bootstrap value of 70 per cent was used as the threshold for scoring phylogeny violations. All pairwise compatibility values were calculated using trees constructed for 300-nucleotide sequence fragments separated by 100 nucleotides across the length of the analysed alignment. These compatibility values were then plotted on a phylogenetic compatibility matrix.
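The pairwise structure of this analysis can be sketched in a simplified form. The example below does not reproduce TreeOrder Scan's own scoring; as a stand-in, it summarizes each fragment's tree as the set of clades that passed the bootstrap threshold and counts the clades of one tree that conflict with any clade of the other. The clade representation and all function names are assumptions made for illustration.

```python
from itertools import combinations


def clades_conflict(a: frozenset[str], b: frozenset[str]) -> bool:
    """Two clades conflict if they overlap but neither contains the other."""
    return bool(a & b) and not (a <= b or b <= a)


def count_violations(
    tree_x: list[frozenset[str]], tree_y: list[frozenset[str]]
) -> int:
    """Number of supported clades in tree_x contradicted by tree_y."""
    return sum(
        1 for cx in tree_x if any(clades_conflict(cx, cy) for cy in tree_y)
    )


def compatibility_matrix(
    trees: list[list[frozenset[str]]],
) -> dict[tuple[int, int], int]:
    """Half-diagonal matrix of violation counts for all fragment pairs."""
    return {
        (i, j): count_violations(trees[i], trees[j])
        for i, j in combinations(range(len(trees)), 2)
    }
```

Note that this stand-in count is directional (tree i scored against tree j), and a per-clade frequency like the one reported in the Results would additionally require dividing by the number of supported clades.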