Journal of Molecular Evolution, Volume 71, Issue 4, pp 298–307

Phylogenetic Analysis Reveals Rapid Evolutionary Dynamics in the Plant RNA Virus Genus Tobamovirus


  • I. Pagán
    • Department of Biology, Center for Infectious Disease Dynamics, The Pennsylvania State University
  • Cadhla Firth
    • Department of Biology, Center for Infectious Disease Dynamics, The Pennsylvania State University
    • Center for Infection and Immunity, Columbia University
  • Edward C. Holmes
    • Department of Biology, Center for Infectious Disease Dynamics, The Pennsylvania State University
    • Fogarty International Center, National Institutes of Health

DOI: 10.1007/s00239-010-9385-4

Cite this article as:
Pagán, I., Firth, C. & Holmes, E.C. J Mol Evol (2010) 71: 298. doi:10.1007/s00239-010-9385-4


Early studies on the evolutionary dynamics of plant RNA viruses suggested that they may evolve more slowly than their animal counterparts, sometimes dramatically so. However, these estimates were often based on an assumption of virus–host codivergence over time-scales of many millions of years that is difficult to verify. An important example is the genus Tobamovirus, where the assumption of host–virus codivergence over 100 million years has led to rate estimates in the range of ~1 × 10−8 nucleotide substitutions per site, per year. Such a low evolutionary rate is in apparent contradiction with the ability of some tobamoviruses to quickly overcome inbred genetic resistance. To resolve how rapidly molecular evolution proceeds in the tobamoviruses, we estimated rates of nucleotide substitution, times to common ancestry, and the extent of congruence between virus and host phylogenies. Using Bayesian coalescent methods applied to time-stamped sequences, we estimated mean evolutionary rates at the nucleotide and amino acid levels of between 1 × 10−5 and 1.3 × 10−3 substitutions per site, per year, and hence similar to those seen in a broad range of animal and plant RNA viruses. Under these rates, a conservative estimate for the time of origin of the sampled tobamoviruses is within the last 100,000 years, and hence far more recent than proposed assuming codivergence. This is supported by our cophylogeny analysis, which revealed significantly discordant evolutionary histories between the tobamoviruses and the plant families they infect.


Keywords: Tobamoviruses, Evolutionary dynamics, Virus–host codivergence


Along with fungi, RNA viruses are the most abundant plant pathogens (Hadidi et al. 1998). Due to their detrimental effects on the host, plant RNA viruses are of major economic importance, causing nearly one-third of all crop losses globally (Agrios 2005). Many RNA viruses have the ability to spread quickly in new host species, a property determined in part by their rapid evolutionary dynamics, itself a consequence of replication with an error-prone RNA polymerase (Domingo and Holland 1997; Drake et al. 1998). However, some studies have suggested that the RNA viruses that infect plants may evolve more slowly than those that infect animals, an observation of importance for the design of effective disease control strategies. For instance, rapid evolution may have an important bearing on the durability of genetic resistance to infection in crops (García-Arenal and McDonald 2003).

The tobamoviruses (single-stranded positive-sense RNA, genus Tobamovirus) are an important example of supposedly slowly evolving plant RNA viruses. Analyses based on an assumption of codivergence between tobamoviruses and their plant host families have resulted in evolutionary rate estimates in the range of ~1 × 10−8 nucleotide substitutions per site, per year (subs/site/year; Gibbs 1980; Gibbs et al. 2008a), and so markedly lower than those of 1 × 10−3 to 1 × 10−4 subs/site/year more commonly reported in animal RNA viruses (Duffy et al. 2008). Low substitution rate estimates are supported by the observation of limited genetic diversity in some Tobamovirus species (Fraile et al. 1996, 1997). For example, low levels of haplotype divergence have been recorded in natural populations of Tobacco mild green mosaic virus (TMGMV) (Rodriguez-Cerezo and García-Arenal 1991). However, these low evolutionary rates are in apparent contradiction with the ability of some tobamoviruses to quickly overcome genetic resistance to infection bred in pepper crops, for which tobamoviruses are important pathogens (Antignus et al. 2008; Rast 1988). In addition, more recent estimates of nucleotide substitution rates in a variety of plant RNA viruses using Bayesian coalescent methods applied to time-stamped sequence data have yielded far higher values, comparable to those seen in a broad range of animal viruses (Gibbs et al. 2010; Pagán and Holmes 2010; Roossinck and Ali 2007; Simmons et al. 2008).

There are a variety of explanations for this striking variation in estimates of evolutionary rate in plant RNA viruses. For example, it is possible that the assumption of host–virus codivergence does not reflect the true evolutionary history of the tobamoviruses, such that these species have acquired their current host distributions more recently. Alternatively, the high substitution rates inferred using Bayesian methodologies could be artificially inflated due to either the effect of erroneous prior values or the presence of transient deleterious mutations (i.e., polymorphisms) in sequences sampled over very short time periods (Duffy et al. 2008).

The genus Tobamovirus contains 22 species with genomes of approximately 6,400 nt, arranged in four open reading frames (ORFs). The two 5′ ORFs form the RNA-dependent RNA polymerase. The shorter ORF encodes the 126 kDa protein, while the longer ORF, formed by suppression of the 126 kDa protein gene stop codon, encodes the 183 kDa protein. The two 3′ proximal ORFs encode the movement protein (MP, 30 kDa) and the coat protein (CP, 17 kDa) (Fauquet et al. 2005). Tobamoviruses are divided into three subgroups according to their genomic organization and host range within the core Eudicot plant species (Fukuda et al. 1981; Lartey et al. 1996). Members of subgroup 1, which infect Solanaceae species, have non-overlapping MP and CP genes. Species in subgroup 2 have been isolated from cucurbits and legumes, and have an MP that overlaps the CP by 23 nt. Finally, subgroup 3 species infect brassica and asterid plants, and their MP and CP genes overlap by 77 nt. Phylogenetic analyses of the genus Tobamovirus also support this division into subgroups (Gibbs et al. 2008a; Lartey et al. 1996).

The phylogeny of the tobamoviruses has been reported to generally mirror that of their plant hosts, which has led to the hypothesis that tobamoviruses diverged at approximately the same time as the Eudicot plants, around 100 million years ago (Gibbs 1986, 1999; Lartey et al. 1996). Several additional pieces of evidence have been cited in support of this hypothesis. In particular, Nicotiana species (Solanaceae) that develop hypersensitive responses against Tobacco mosaic virus (TMV) infection are thought to have originated in the Americas (Holmes 1951) at least 65 million years ago (D’Arcy 1991). These species fall at the basal nodes of Solanaceae phylogenies (Olmstead and Palmer 1991), suggesting that hypersensitivity is the ancestral condition in this plant family. Since hypersensitivity is a host defense response based on a gene-for-gene interaction (Flor 1955), and therefore very specific, tobamoviruses and their hosts might have coexisted for an extended time period. Conversely, it has been argued that the topological concordance between the tobamovirus and host phylogenies might reflect more recent virus adaptation to specific plant species rather than long-term codivergence, and that hypersensitivity might be acquired over far shorter evolutionary time-scales by hybridization among plant species (Gibbs et al. 2008a).

In sum, there is still considerable uncertainty regarding the pace of evolutionary change and age of viruses from the genus Tobamovirus. To infer these rates and dates as rigorously as possible, we applied Bayesian and other phylogeny-based methods to all available tobamovirus sequences. In addition, we used cophylogenetic reconciliation methods to examine the extent of codivergence between the tobamoviruses and their host species.

Materials and Methods

Sequence Data

Sequences of the 21 classified and 10 unclassified species of the genus Tobamovirus were compiled from GenBank. Those from recombinant isolates—detected within each data set using the RDP, GENECONV, and Bootscan methods implemented in the RDP3 package (Martin et al. 2005)—were excluded, as were those from extensively passaged isolates in non-natural hosts. The year of isolation for each sequence was obtained either from GenBank, from the associated publications, or was kindly provided by the relevant authors. Accession numbers, origins, and years of isolation of the sequences used are listed as supplementary material (Supplementary Tables 1–7).

Long-term evolutionary patterns in the tobamoviruses were determined using the CP gene, as this was the largest available data set. Only those virus species represented by more than 15 CP sequences were retained for analysis, resulting in a data set that included representatives of seven classified taxa. Despite the relatively small number of tobamoviruses analysed, the three subgroups in the genus were represented: Odontoglossum ringspot virus (ORSV), Pepper mild mottle virus (PMMoV), TMGMV, TMV, and Tomato mosaic virus (ToMV) belonging to subgroup 1; Cucumber green mottle mosaic virus (CGMMV) belonging to subgroup 2; and Ribgrass mosaic virus (RMV) belonging to subgroup 3. Sufficient sequence data were not available for any unclassified species. The final data set was composed of 279 partial (missing up to 70 nt) and full-length CP sequences isolated from 1950 to 2008.

Phylogenetic analyses were also carried out on each of the seven Tobamovirus species individually. For each species, protein-coding alignments were derived for the following genes: 126k (ORF1), 183k (ORF2), MP (ORF3) and CP (ORF4). Analysis of full-length genome sequences could not be performed due to an insufficient number of sequences (i.e., <15 sequences) for most of the species. For each data set, sequence alignments were obtained using MUSCLE 3.7 (Edgar 2004), and adjusted manually according to the inferred amino acid translation using Se–Al (Rambaut 1996).

Estimation of Substitution Rates and Age of Genetic Diversity

For each data set, rates of nucleotide substitution per site per year and the time to the most recent common ancestor (TMRCA) were estimated using the Bayesian Markov Chain Monte Carlo (MCMC) method available in the BEAST package v1.5.3 (Drummond and Rambaut 2007). The best-fit model of nucleotide substitution in each case was determined using Modeltest 3.7 (Posada and Crandall 1998), and all data sets were subsequently analyzed using the general time-reversible substitution model with invariant sites and a gamma distribution of among-site rate variation (GTR + I + Γ4), and partitioned by codon positions. In addition, we utilized a relaxed (uncorrelated lognormal) molecular clock (values for the coefficient of variation were always >0, indicative of non-clock-like evolution; Drummond et al. 2006), and the conservative Bayesian skyline model as a coalescent prior. To test the robustness of these estimates we repeated the analysis using a strict molecular clock and the simpler Hasegawa, Kishino and Yano (HKY85) model of nucleotide substitution. Estimation of rates of evolutionary change at amino acid sites was performed using the Whelan and Goldman (WAG) substitution model (Whelan and Goldman 2001), utilizing the molecular clock and demographic parameters described above. In all cases, the BEAST analyses were run until all relevant parameters converged, with 10% of the MCMC chains discarded as burn-in. Statistical confidence in the parameter estimates was represented by values for the 95% highest probability density (HPD) intervals around the marginal posterior parameter means. Substitution rates were considered to be significantly different if the mean value of the estimate from one data set fell outside of the 95% HPD values of another (indicating that these rates were drawn from different distributions).
Maximum clade credibility (MCC) trees, with Bayesian posterior probability values providing a measure of statistical support at each node, were also inferred using BEAST. Finally, the extent of site saturation at the third codon position of each data set was determined using the index of substitution saturation (Iss; Xia et al. 2003), as implemented in the DAMBE package.
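
The comparison rule stated above (a mean falling outside the 95% HPD interval of another estimate) can be illustrated with a minimal sketch. The names RateEstimate, mean_outside_hpd, and significantly_different are hypothetical and not part of BEAST, and applying the rule in either direction is an assumption, since the text does not specify a direction.

```python
from typing import NamedTuple


class RateEstimate(NamedTuple):
    """A posterior summary of a substitution rate (subs/site/year)."""
    mean: float   # posterior mean
    lower: float  # lower bound of the 95% HPD interval
    upper: float  # upper bound of the 95% HPD interval


def mean_outside_hpd(a: RateEstimate, b: RateEstimate) -> bool:
    """Return True if the mean of `a` lies outside the 95% HPD interval of `b`."""
    return not (b.lower <= a.mean <= b.upper)


def significantly_different(a: RateEstimate, b: RateEstimate) -> bool:
    """Apply the rule in either direction (an assumption of this sketch)."""
    return mean_outside_hpd(a, b) or mean_outside_hpd(b, a)


# Two 126k estimates and two CP estimates, with values as listed in Table 1.
low_126k = RateEstimate(8.0e-5, 1.3e-5, 1.4e-4)
high_126k = RateEstimate(8.8e-4, 4.8e-4, 1.2e-3)
cp_a = RateEstimate(1.8e-4, 1.6e-5, 3.9e-4)
cp_b = RateEstimate(1.7e-4, 1.7e-5, 3.3e-4)

print(significantly_different(low_126k, high_126k))
print(significantly_different(cp_a, cp_b))
```

Under this rule the two 126k estimates are flagged as different, whereas the two CP estimates are not, because each CP mean sits inside the other interval.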

Robustness of Temporal Signal

We used two methods to assess the strength of temporal signal in these sequence data, essential to the accurate estimation of substitution rates. First, maximum likelihood (ML) phylogenies were estimated using PAUP* 4.0 (Swofford 2003) for each alignment using tree-bisection-reconnection branch-swapping and the best-fit nucleotide substitution model as determined by Modeltest. Clock-like behavior of the data was then assessed by regressing root-to-tip distance in the ML tree against the date of sampling of each sequence using the Path-O-Gen program (Rambaut 2009). This analysis effectively provides a measure of the amount of variation in genetic distances explained by sampling time. In addition, the point at which the regression line crosses the x-axis serves as an estimator of the TMRCA. Second, the BEAST analyses described above were repeated on data sets in which sampling times were randomized among sequences to remove any temporal structure. For computational tractability, this randomization was undertaken using the gene with the smallest number of sequences for each viral species. BEAST analyses using the randomized data were repeated 10 times. The mean and 95% HPDs of the estimates of substitution rate for the randomized data were then compared with those obtained from the real data; substantial differences in these estimates indicate the presence of temporal structure (Firth et al. 2010).
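
The root-to-tip regression described above can be sketched as an ordinary least-squares fit. This is an illustration only and not the Path-O-Gen implementation; the function name root_to_tip_regression and the toy data are assumptions. The slope approximates the substitution rate, the x-intercept approximates the calendar date of the common ancestor, and r measures how much of the variation in genetic distance is explained by sampling time.

```python
import math


def root_to_tip_regression(years, distances):
    """Least-squares regression of root-to-tip distance on sampling year.

    Returns (slope, x_intercept, r): the slope in subs/site/year, the year at
    which the fitted line crosses zero distance, and the correlation coefficient.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    x_intercept = -intercept / slope  # year at which distance is zero
    return slope, x_intercept, r


# Hypothetical toy data: four tips sampled between 1950 and 2008.
toy_years = [1950, 1970, 1990, 2008]
toy_distances = [0.010, 0.014, 0.018, 0.0216]
slope, root_year, r = root_to_tip_regression(toy_years, toy_distances)
tmrca_years_ago = max(toy_years) - root_year  # age relative to the latest sample
print(slope, root_year, r, tmrca_years_ago)
```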

Analysis of Host–Tobamovirus Codivergence

To analyze the topological congruence between the tobamoviruses and their host plant phylogenies, we considered the seven Tobamovirus species used for estimating evolutionary rates plus six other virus species, each of which infects a different host plant family. This allowed us to compare 13 virus species infecting 10 host families. ML phylogenetic trees for the plant hosts were inferred as above using the gene encoding the large subunit of the ribulose-1,5-bisphosphate carboxylase oxygenase (rbcL; Savolainen et al. 2000), and for the viruses using the CP. Phylogenetic reconciliation analysis was performed using TreeMap (version 2.0β). This method assumes a fixed host phylogeny and creates multiple potentially optimal (POpt) solutions by mapping the Tobamovirus phylogeny into that of the host (Jackson and Charleston 2004). We considered all POpt solutions generated by TreeMap as hypotheses of host–virus codivergence. The significance of the congruence between the phylogenies of the tobamoviruses and their hosts was established for each POpt solution. For this, the Tobamovirus phylogeny was randomized 1,000 times, mapped onto the host phylogeny, and the deviation from randomness of the observed congruence level was determined.
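
The final step above, comparing an observed congruence level with 1,000 randomizations, amounts to an empirical P value. The sketch below shows only that last calculation and is not TreeMap's algorithm; the congruence score (for example a count of codivergence events) and all numbers are hypothetical, and the simple proportion used here is one common convention among several.

```python
def randomization_p_value(observed, null_scores):
    """Proportion of randomized scores that are at least as large as the observed one."""
    if not null_scores:
        raise ValueError("at least one randomized score is required")
    at_least_as_large = sum(1 for score in null_scores if score >= observed)
    return at_least_as_large / len(null_scores)


# Hypothetical congruence scores from 1,000 randomized virus trees.
null_scores = [3] * 400 + [4] * 300 + [5] * 200 + [6] * 100
observed_score = 5  # hypothetical score for one POpt solution
print(randomization_p_value(observed_score, null_scores))
```

A large P value, as in this toy case, means the observed congruence is no better than expected from random virus trees.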


Results

Nucleotide Substitution Rates in the Genus Tobamovirus

Before estimating rates of nucleotide substitution for each gene within the seven Tobamovirus species considered here, we first screened for recombination. Intra-genic recombination break-points were only found in the 183k gene. One sequence each of CGMMV, TMGMV and TMV, and seven of 11 sequences of ORSV, were identified as recombinants. These sequences were not considered further with the exception of ORSV, as removal of these sequences would have prevented the estimation of evolutionary rates in this virus. Reliable rate estimates could not be obtained for the MP of TMGMV, and for all genes apart from the CP of RMV and ToMV due to an insufficient number of sequences. In addition, rate estimates for the 183k gene of CGMMV, the MP of TMV, and the CP of RMV and ToMV exhibited very wide 95% HPD intervals, with lower values differing by more than two orders of magnitude from the corresponding mean estimate, strongly suggesting that they are unreliable.

Mean evolutionary rates across genes in all species analyzed ranged from 8 × 10−5 to 1.3 × 10−3 subs/site/year (range of 95% HPD values = 7.7 × 10−6–3.2 × 10−3 subs/site/year), while substitution rates for the CP gene varied from 1.6 × 10−4 to 8 × 10−4 subs/site/year (HPD = 1.6 × 10−5–3.2 × 10−3 subs/site/year) (Table 1). Examining substitution rates across all genes revealed no significant difference between species for the MP and the CP genes. However, significant differences in substitution rate were found in the 126k protein and 183k protein genes; CGMMV exhibited significantly lower, and ORSV significantly higher, substitution rates than the other species (Table 1). The high rates estimated for ORSV must be treated with caution because of the frequent recombination in this data set. No significant differences in evolutionary rates were observed when the data were reanalyzed using the HKY85 substitution model and a strict molecular clock (results available from the authors on request). Finally, mean evolutionary rates at the amino acid level were in most cases significantly lower than those estimated at the nucleotide level (between 1.0 × 10−5 and 1.0 × 10−4 subs/site/year, depending on the gene-virus species combination), indicating that the majority of substitutions are synonymous (Supplementary Table 8). In addition, reliable estimates of substitution rate could now be obtained for the 183k gene of CGMMV and the MP of TMV.
Table 1 Estimates of nucleotide substitution rate and the age of diversity (TMRCA) for Tobamovirus species

Virus a   Parameter    RdRp (126 kDa)             RdRp (183 kDa)             MP                         CP
CGMMV     Sub rate c   8.0 × 10−5                 5.4 × 10−5 f               6.7 × 10−4                 8.0 × 10−4
          95% HPD d    (1.3 × 10−5–1.4 × 10−4)    (5.8 × 10−7–1.1 × 10−4)    (7.7 × 10−6–1.1 × 10−3)    (2.9 × 10−5–1.4 × 10−3)
ORSV      Sub rate     8.8 × 10−4                 7.7 × 10−4                 1.3 × 10−3                 7.8 × 10−4
          95% HPD      (4.8 × 10−4–1.2 × 10−3)    (4.1 × 10−4–1.1 × 10−3)    (4.2 × 10−4–2.2 × 10−3)    (1.5 × 10−4–9.2 × 10−4)
PMMoV     Sub rate     2.6 × 10−4                 2.6 × 10−4                 8.9 × 10−4                 1.8 × 10−4
          95% HPD      (5.8 × 10−5–4.8 × 10−4)    (6.8 × 10−5–4.9 × 10−4)    (1.1 × 10−5–1.5 × 10−3)    (1.6 × 10−5–3.9 × 10−4)
RMV       Sub rate     ND                         ND                         ND                         1.4 × 10−4 f
          95% HPD                                                                                       (1.3 × 10−8–4.0 × 10−4)
TMGMV     Sub rate     1.4 × 10−4                 1.3 × 10−4                 ND                         1.7 × 10−4
          95% HPD      (7.6 × 10−5–2.0 × 10−4)    (7.2 × 10−5–2.0 × 10−4)                               (1.7 × 10−5–3.3 × 10−4)
TMV       Sub rate     7.9 × 10−4                 2.9 × 10−4                 2.9 × 10−4 f               1.6 × 10−4
          95% HPD      (3.9 × 10−4–1.3 × 10−3)    (1.7 × 10−4–4.1 × 10−4)    (7.8 × 10−7–9.7 × 10−4)    (5.2 × 10−5–3.2 × 10−3)
ToMV      Sub rate     ND                         ND                         ND                         2.1 × 10−4 f
          95% HPD                                                                                       (9.1 × 10−8–4.9 × 10−4)

RdRp, RNA-dependent RNA polymerase; MP, movement protein; CP, coat protein; ND, estimates not determined due to low number of sequences or narrow date range

a Virus species

b Number of sequences for each gene [RdRp(126k)-RdRp(183k)-MP-CP]

c Mean nucleotide substitution rate (subs/site/year)

d 95% highest probability density (HPD) values

e Time to the most recent common ancestor (TMRCA; years ago)

f Unreliable estimates

Mean nucleotide substitution rates for the tobamoviruses at the inter-specific level were approximately 4.3 × 10−4 subs/site/year (95% HPD value = 2.2–6.4 × 10−4 subs/site/year), and hence not statistically different from those seen in the majority of individual virus species (Table 2). Similar analyses run using a strict molecular clock resulted in comparable estimates. Mean evolutionary rates in amino acid alignments were significantly lower (~9.5 × 10−5 subs/site/year, 95% HPD value = 3.8 × 10−6–1.7 × 10−4), again suggesting that most changes occur at the third codon position. Indeed, the Iss for the third codon position in our CP data set was significantly higher (P < 3.8 × 10−3) than the estimated critical Iss value, which determines the threshold for site saturation. Iss values for the sampled data that are similar to or higher than the critical Iss value, as here, are indicative of severe site saturation. To reduce the effect of site saturation, we repeated our analyses of the tobamoviruses as a whole using only the first and second codon positions of the CP. As predicted, these estimates were not significantly different from those obtained using amino acids (mean = 6.1 × 10−5 subs/site/year; 95% HPD value = 1.1 × 10−6–1.2 × 10−4; Table 2).
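
Restricting an analysis to first and second codon positions, as described above, is a simple column filter on an in-frame alignment. The sketch below assumes each sequence starts at codon position 1 and that gaps preserve the reading frame; the function names and the toy alignment are hypothetical.

```python
def first_second_codon_positions(seq: str) -> str:
    """Keep codon positions 1 and 2, dropping every third site."""
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


def third_codon_positions(seq: str) -> str:
    """Keep only codon position 3, the sites most prone to saturation."""
    return seq[2::3]


def strip_third_positions(alignment: dict) -> dict:
    """Apply the first/second-position filter to every sequence in an alignment."""
    return {name: first_second_codon_positions(seq) for name, seq in alignment.items()}


# Hypothetical two-sequence alignment that differs only at third positions.
toy_alignment = {"isolate_A": "ATGGCTTAA", "isolate_B": "ATGGCCTAG"}
print(strip_third_positions(toy_alignment))
print({name: third_codon_positions(seq) for name, seq in toy_alignment.items()})
```

In this toy case the two filtered sequences become identical, which illustrates why removing third positions lowers both the apparent divergence and the impact of saturation.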
Table 2 Estimates of nucleotide and amino acid substitution rates, and age of genetic diversity (TMRCA), for the genus Tobamovirus using the coat protein (CP) data set

Molecular clock                          Strict                                                Relaxed (lognormal)
Sites                       Parameter    Amino acid                 Nucleotide                 Amino acid                 Nucleotide
All positions               Sub rate a   7.2 × 10−5                 4.3 × 10−4                 9.5 × 10−5                 4.2 × 10−4
                            95% HPD b    (8.5 × 10−6–1.3 × 10−4)    (2.6 × 10−4–6.1 × 10−4)    (3.8 × 10−6–1.7 × 10−4)    (2.2 × 10−4–6.3 × 10−4)
1st and 2nd codon positions Sub rate a   -                          4.2 × 10−5                 -                          6.1 × 10−5
                            95% HPD b                               (1.3 × 10−6–7.9 × 10−5)                               (1.1 × 10−6–1.2 × 10−4)

a Mean nucleotide or amino acid substitution rate (subs/site/year)

b 95% highest probability density (HPD) values

c Time to the most recent common ancestor (TMRCA; years ago)

Strength of Temporal Signal

Correlation coefficients (r) of root-to-tip distance against sampling date ranged from 0.62 to 0.92, and clearly show that individual species of Tobamovirus are characterized by rapid evolutionary dynamics over the time-scale of sampling (Fig. 1). Exceptions were the CP genes of RMV and ToMV, which showed very weak root-to-tip correlations (r < 0.18), supporting the absence of temporal structure suggested by our Bayesian substitution rate estimates. Moreover, for the data sets with significant root-to-tip correlations, the inferred TMRCAs were always within the 95% HPD intervals estimated under the Bayesian analysis.
Fig. 1 Regression of root-to-tip distance (inferred from ML trees) against year of isolation for the gene with the smallest number of sequences in each Tobamovirus species (genes with the smallest number of sequences also gave the lowest correlation coefficients). The name of the species, the gene used, and the correlation coefficient are shown in each panel. The correlation coefficient for RMV was lower than that of ToMV and is not shown. Note the different scale in each panel

With the exception of the CPs of RMV and ToMV, which clearly lack temporal signal, all estimates of the mean substitution rate differed by at least one order of magnitude between the real and randomized data sets. In addition, the 95% HPD values for all the randomized controls excluded the mean substitution rates estimated for the real data, indicating that they are significantly different (Supplementary Fig. 1). Finally, the lower 95% HPD values in the randomized data sets differed by at least three orders of magnitude from the corresponding mean estimates, and all were equal to or lower than 1 × 10−7, strongly indicative of a lack of temporal structure. Taken together, these results indicate that the sequence data analyzed here contain sufficient temporal structure for reliable rate estimation, at least in the short-term, with the already noted exceptions of the CPs of RMV and ToMV.
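
The date-randomization control and the two comparisons reported above can be sketched as follows. This is an illustration and not the BEAST workflow itself: randomize_dates only permutes sampling years among sequences, and has_temporal_signal combines the two reported criteria (a difference of at least one order of magnitude in the mean, and a randomized 95% HPD that excludes the real mean) with a logical AND, which is an assumption. All numbers in the example are hypothetical.

```python
import math
import random


def randomize_dates(dates, seed=None):
    """Return the sampling dates randomly permuted among sequences."""
    rng = random.Random(seed)
    shuffled = list(dates)
    rng.shuffle(shuffled)
    return shuffled


def has_temporal_signal(real_mean, randomized_mean, randomized_hpd):
    """Check both criteria for one randomized replicate.

    real_mean and randomized_mean are rates in subs/site/year;
    randomized_hpd is a (lower, upper) tuple for the randomized data.
    """
    lower, upper = randomized_hpd
    orders_apart = abs(math.log10(real_mean / randomized_mean))
    hpd_excludes_real_mean = not (lower <= real_mean <= upper)
    return orders_apart >= 1.0 and hpd_excludes_real_mean


# Hypothetical sampling years and rate estimates.
years = [1950, 1975, 2000, 2008]
print(randomize_dates(years, seed=1))
print(has_temporal_signal(4.3e-4, 2.0e-5, (1.0e-8, 9.0e-5)))
print(has_temporal_signal(1.4e-4, 1.0e-4, (1.0e-8, 4.0e-4)))
```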

Phylogenetic Relationships and Age of the Genus Tobamovirus

The MCC tree for the CP revealed that the Tobamovirus species clustered according to the subgroups defined previously (Lartey et al. 1996) (Fig. 2a). CGMMV and RMV, which belong to subgroups 2 and 3, respectively, were clearly the most divergent species in the genus. The other five species considered in this study clustered together within subgroup 1.
Fig. 2 MCC phylogeny of the genus Tobamovirus based on the CP data set using (a) nucleotide and (b) amino acid sequences, and a relaxed molecular clock. Branch tip times reflect the dates of viral sampling. The tree is rooted through the use of a relaxed molecular clock, and the total depth of the tree is the TMRCA of the genus. For each node with Bayesian posterior probability values of >0.90, the corresponding 95% HPD intervals for its age (years ago) are indicated. Note the different scale for each panel

The TMRCAs estimated for each gene–virus species combination indicated that the sampled genetic diversity likely arose within the last millennium (Table 1). Major differences in TMRCA estimates among genes were only observed for CGMMV and ORSV, which may again reflect the occurrence of recombination in the latter. Specifically, the RdRp of CGMMV diverged significantly earlier than the CP of this virus (95% HPD values of 909-99 and 242-37 years ago for the RdRp and the CP, respectively), and also earlier than the RdRp of any other tobamovirus studied here (range of 95% HPD values of 322-21 years ago). In contrast, the RdRp of ORSV diverged significantly more recently than those from other Tobamovirus species (95% HPDs of 67-40 and 909-21 years ago, respectively). Although significantly older TMRCAs were found in equivalent analyses using amino acid sequences (mean TMRCAs between 60 and 1,500 years older, depending on the gene-virus combination), the species-specific differences observed at the nucleotide level remained (Supplementary Table 8).

Our mean estimates for the TMRCA of the sampled sequences from the genus Tobamovirus as a whole using the CP gene ranged between 3,000 and 5,000 years ago, depending on the clock model assumed, with lower 95% HPD values (representing the oldest credible ages) of 7,114 to 9,666 years ago and upper 95% HPD values of 1,519 to 2,614 years ago (Table 2; Fig. 2a). Similar analyses using the amino acid sequences yielded significantly older TMRCAs (means between 12,782 and 15,618 years ago, depending on the clock model) and revealed that the origin of the tobamoviruses sampled here could be as old as 77,753 years ago (lower 95% HPD; Table 2; Fig. 2b). These differences in the TMRCAs at the nucleotide and the amino acid levels clearly reflect the site saturation at third codon positions. Indeed, estimates of divergence time using only the first and second codon positions ranged from 12,162 to 25,347 years ago (lower 95% HPDs of 108,742 and 70,796, respectively), and hence were comparable to those seen at the amino acid level.

Codivergence Between Tobamoviruses and their Hosts

To test the extent of congruence between the host and the virus phylogenies, we inferred trees considering (i) at least one virus species per host family, and (ii) only the plant families infected by the tobamoviruses studied here. In the former case, ten POpt solutions were found, but the level of congruence in all cases did not differ significantly from that obtained by randomizing the virus phylogeny (P > 0.24) (Fig. 3). These results indicate a lack of congruence between the virus and the host phylogenies, and hence the absence of detectable codivergence. Equivalent analyses using only the seven virus species for which evolutionary rates were estimated found only two POpt solutions, neither of which showed significant congruence (P > 0.12).
Fig. 3 Composite phylogenies of the tobamoviruses and their host plant families. Grey lines denote host–virus associations. The Orchidaceae, the only non-Eudicot plant family infected by tobamoviruses, was used to root the host tree. SHMV, Sunn-hemp mosaic virus; CGMMV, Cucumber green mottle mosaic virus; HLSV, Hibiscus latent Singapore virus; FrMV, Frangipani mosaic virus; TMGMV, Tobacco mild green mosaic virus; PMMoV, Pepper mild mottle virus; ORSV, Odontoglossum ringspot virus; ReMV, Rehmannia mosaic virus; ToMV, Tomato mosaic virus; TMV, Tobacco mosaic virus; SFBV, Streptocarpus flower-break virus; RMV, Ribgrass mosaic virus; TVCV, Turnip vein-clearing virus

Discussion


Our comparative analysis revealed that members of the genus Tobamovirus generally experience nucleotide substitution rates of between 0.1 and 10 × 10−4 subs/site/year, and hence within the same range as most animal (Duffy et al. 2008) and plant (Gibbs et al. 2010) RNA viruses. As expected, evolutionary rates at the amino acid level were in most cases lower than those at nucleotide sites (~5 × 10−5 subs/site/year). Hence, most mutational changes are synonymous, with considerable site saturation at deep phylogenetic distances. Even accounting for undetected multiple substitutions at single nucleotide or amino acid sites, it is difficult to reconcile our rate estimates with the rates at least three orders of magnitude lower that are required under the assumption of virus–host codivergence over a period of ~100 million years (~10−8 subs/site/year). One possibility is that the high rates inferred here in some part reflect the evolution of host resistance and its associated selection pressure on the virus. Most of the tobamoviruses infecting Solanaceae species included in our data set infect pepper crops, and in the last 25 years, the use of genetic resistance to control tobamovirus infection in peppers has become a common agricultural practice. However, a more likely explanation for the higher substitution rates estimated here is that tobamoviruses mutate and reproduce as rapidly as most other viruses that replicate using an RNA polymerase. Although a relatively low mutation rate has been observed in TMV (Malpica et al. 2002), this rate is still within the range reported for animal RNA viruses (Sanjuán et al. 2010).
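
The scale of the mismatch discussed above can be made concrete with a back-of-the-envelope calculation. The sketch assumes a simple linear expectation (divergence = rate × time along a lineage) and deliberately ignores saturation and multiple hits, so it illustrates magnitudes only; the function names are hypothetical.

```python
import math


def expected_divergence(rate, years):
    """Expected substitutions per site along one lineage, assuming divergence = rate * time."""
    return rate * years


def years_to_reach(divergence, rate):
    """Years needed to accumulate a given divergence at a constant rate."""
    return divergence / rate


def orders_of_magnitude(rate_a, rate_b):
    """Base-10 log ratio of two rates."""
    return math.log10(rate_a / rate_b)


codivergence_rate = 1e-8   # subs/site/year, implied by codivergence
codivergence_time = 100e6  # years, age assumed under codivergence
estimated_rate = 4.3e-4    # subs/site/year, inter-specific CP estimate

implied = expected_divergence(codivergence_rate, codivergence_time)
print(implied)                                  # divergence implied by codivergence
print(years_to_reach(implied, estimated_rate))  # years needed at the estimated rate
print(orders_of_magnitude(1e-5, codivergence_rate))    # lower end of the estimated range
print(orders_of_magnitude(1.3e-3, codivergence_rate))  # upper end of the estimated range
```

Under these assumptions the divergence that codivergence spreads over ~100 million years would accumulate within a few thousand years at the rates estimated here, and the two sets of rates differ by roughly three to five orders of magnitude.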

It has also been suggested that severe population bottlenecks during both within-plant colonization and between-plant transmission may in part explain the low evolutionary rates reported for some plant viruses (Li and Roossinck 2004). Indeed, severe population bottlenecks have been inferred during the long-distance within-plant movement of TMV (Sacristan et al. 2003). However, strong population bottlenecks have been estimated for other plant viruses which exhibit substitution rates comparable to those of animal viruses (French and Stenger 2003), and changes in population size will not affect the rate of nucleotide substitution at neutral sites. Hence, the relationship between population bottlenecks and overall rates of nucleotide substitution is complex.

Finally, our study reveals that the time-scale of tobamovirus evolution is more recent than previously thought. Although our mean TMRCA estimates based on nucleotide sequences—up to approximately 10,000 years ago—are comparable to those reported for other plant virus families (Fargette et al. 2008; Gibbs et al. 2008b; Pagán and Holmes 2010), it is likely that these have been adversely affected by site saturation and are therefore underestimates. In this case, TMRCA estimates based on amino acids and/or first and second codon positions should better depict the long-term evolutionary history of these viruses due to a lower incidence of multiple substitutions. Using these data, the tobamoviruses sampled here were found to share a common ancestor up to approximately 100,000 years ago, and we suggest that this may be a more realistic time-scale for the evolutionary history of this important group of plant viruses. Indeed, it is striking that our cophylogeny analysis, which has been used to support host–virus codivergence in other plant viruses (Wu et al. 2008), provided no evidence for such a process in the tobamoviruses. We therefore suggest that the evolutionary history of the tobamoviruses is one characterized by frequent cross-species transmissions over a relatively recent time-frame.


This work was supported by Marie Curie Fellowship PIOF-GA-2009-236470 to IP, NSERC Canada to CF, and NIH grant R01-GM080533 to ECH. We thank Dr. Andrew Kitchen for valuable comments, and all the authors who kindly provided collection information for their sequence data.

Supplementary material

Supplementary material 1 (PDF 360 kb)

Copyright information

© Springer Science+Business Media, LLC 2010