Analysis of synonymous codon usage in foot-and-mouth disease virus
- Cite this article as:
- Zhou, J., Zhang, J., Chen, H. et al. Vet Res Commun (2010) 34: 393. doi:10.1007/s11259-010-9359-4
In this study, we calculated relative synonymous codon usage (RSCU) values and codon usage bias (CUB) values. We used them to compare the codon usage pattern of the open reading frames (ORFs) of 85 samples belonging to all seven serotypes of foot-and-mouth disease virus (FMDV). Although the degree of CUB in the ORFs is relatively slight, CUB varies significantly among serotypes, and this variation is mainly determined by the codon usage pattern as measured by RSCU. A comparison of RSCU values across all samples fails to reveal lineage-specific relationships among serotypes. It does, however, show two main genetic populations in FMDV: (i) serotypes Asia 1, A, C & O; and (ii) serotypes SAT 1, 2 & 3. This characteristic may have arisen through RNA virus recombination. Quantitative and qualitative evaluations based on CUB suggest that FMDV genome diversity may be concentrated in specific serotype lineages rather than distributed randomly. Furthermore, the relationship between amino acids and codon usage pattern indicates that mutation pressure, rather than translational selection, is the main determinant of the observed codon usage bias. Our work may give some insight into the characteristics of the FMDV ORF and into the evolution of this virus.
Keywords: Foot-and-mouth disease virus; Relative synonymous codon usage; Codon usage bias; Codon usage pattern; Genetic population
It is well known that the genetic code assigns 64 codons to the 20 standard amino acids and the stop signals. The 64 codons can be divided into 21 disjoint groups: one group for each of the 20 standard amino acids, and a 21st group for the stop signals. Each group in the standard genetic code has between one and six codons, and codons within the same group are called synonymous. Synonymous codons, which often differ at the third codon position, can be interchanged without affecting the primary sequence of the protein product. Even so, they are not used equally either within or between genomes (Grantham et al. 1980; Gu et al. 2004; Lloyd and Sharp 1992; Martin et al. 1989; Xie et al. 1998; Zhou et al. 2006). In general, translational selection and compositional constraints are thought to be the two major factors accounting for variation in codon usage patterns among the genomes of various organisms (Karlin and Mrázek 1996; Lesnik et al. 2000; Sharp et al. 1986). Compared with the pattern of synonymous codon usage in mammals, compositional constraints play only a weak role in some unicellular organisms (such as Escherichia coli and Saccharomyces cerevisiae). In these organisms, highly expressed genes show a clear selective preference for codons that match the most abundant acceptor tRNA molecules, whereas lowly expressed genes show broadly similar codon usage patterns (Gouy and Gautier 1982; Grantham et al. 1981; Ghosh et al. 2000; Lesnik et al. 2000; Majumdar et al. 1999; Ohno et al. 2001).
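The grouping of the 64 codons into 21 synonymous families can be checked with a short Python sketch. It encodes the standard genetic code in the conventional TCAG order and then counts how many families have each size. DNA letters are used because GenBank records are DNA, so T stands for U. The names `CODON_TABLE` and `synonymous_groups` are illustrative and are not taken from the study.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with first/second/third positions each in TCAG order.
# '*' marks the three stop codons (TAA, TAG, TGA).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def synonymous_groups(table):
    """Group codons by the amino acid (or stop signal) they encode."""
    groups = {}
    for codon, aa in table.items():
        groups.setdefault(aa, []).append(codon)
    return groups

groups = synonymous_groups(CODON_TABLE)
print(len(groups))  # 21: 20 amino acids plus the stop group
# (family size, number of families with that size)
print(sorted(Counter(len(codons) for codons in groups.values()).items()))
# [(1, 2), (2, 9), (3, 2), (4, 5), (6, 3)]
```

Removing the two single-codon families (ATG for Met, TGG for Trp) and the stop group leaves 59 sense codons.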
In some RNA viruses, mutation pressure rather than translational selection plays the key role in shaping the synonymous codon usage pattern (Gu et al. 2004; Jenkins and Holmes 2003). Foot-and-mouth disease (FMD) is considered to be the most contagious of all the diseases infecting domestic animals such as cattle, pigs, sheep and goats. Foot-and-mouth disease virus (FMDV) is currently classified as a member of the genus Aphthovirus in the family Picornaviridae. The FMDV genome has a single long open reading frame (ORF) and encodes all of its proteins in the form of a polyprotein. The virus exists as seven serotypes, namely A, O, C, Asia 1, South African Territories 1 (SAT 1), SAT 2 and SAT 3. Various subtypes exist within each serotype (Knowles and Samuel 2003). Many molecular studies of FMDV have been reported, covering genomic analysis, genome structure, virion structure and polyprotein processing by proteases (Carrillo et al. 2005, 2007; Klein 2009; Lewis-Rogers et al. 2008; Mason et al. 2003; Newman and Brown 1997). However, little is known about the codon usage pattern of the FMDV genome, including its relative synonymous codon usage (RSCU) and codon usage bias (CUB). Little is also known about what this pattern reveals about the evolution or mutation of the virus. The codon usage pattern of FMDV might provide clues to its evolution. In this work, we therefore calculated codon usage data for FMDV and analyzed the evolutionary determinants of its codon usage pattern.
Materials and methods
Asia 1: AY304994, DQ989322, DQ533483, AY687334, AY687333, FJ906802, NC_004915, EF614458, DQ989323, DQ989322, DQ989321, AY593800, AY593799, AY593798, AY593797, AY593796, AY593795
A: EF494488, EF494487, EF494486, AY593751, AY593769, AY593789, AY593767, AY593770, AY593782, AY593783, AY593784, AY593785, AY593786, AY593790, AY593802, AY593787, AY593788, AY593803, AY593753, AY593756, AY593757, AY593768, AY593794, AY593771, AY593758
C: NC_002554, FJ824812, AM409325, DQ409191, DQ409190, DQ409188, AY593809, AY593810, AY593805, AY593806, AY593807, AY593808
O: EF552697, EF552699, EF552695, EF552694, EF552693, EF552692, EU400597, EU140994, AF026168, NC_004004, AJ539139, AY593819, AY593835, AF189157, AY593836, AF511039, EF175732, DQ248888, AJ320488, AJ633821, AH012984, AB079061, AF189157, AF511039, FJ542372, AH012985
SAT 1: AY593838, AY593841, AY593842, AY593843
SAT 2: AF540910, AY593847, AY593848
SAT 3: AY593850, AY593851, AY593852, AY593853
Relative synonymous codon usage calculations
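The RSCU value of the jth codon for the ith amino acid was calculated as

$$\mathrm{RSCU}_{ij} = \frac{g_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} g_{ij}}$$

where g_ij is the observed number of the jth codon for the ith amino acid, which has n_i synonymous codons. Codons with RSCU values close to 1.0 therefore show relatively little codon usage bias. An RSCU value of 1.0 means that the codon is used exactly as often as expected if all synonymous codons were chosen equally and randomly. Values above 1.0 indicate codons used more often than expected, and values below 1.0 indicate codons used less often.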
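The RSCU formula can be illustrated with a minimal Python sketch that counts in-frame codons in an ORF and converts the counts to RSCU values. The sketch assumes the ORF is already in frame and that the caller supplies a mapping from each amino acid to its synonymous codons. The function names are hypothetical, and this is not the study's own code.

```python
from collections import Counter

def count_codons(orf):
    """Count in-frame codons; 'U' is converted to 'T' and a trailing partial codon is ignored."""
    seq = orf.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))

def rscu(codon_counts, families):
    """RSCU_ij = g_ij / ((1 / n_i) * sum_j g_ij) for every codon j of amino acid i.

    families maps each amino acid to its n_i synonymous codons. Amino acids
    that never occur are skipped, because their RSCU values are undefined.
    """
    values = {}
    for codons in families.values():
        total = sum(codon_counts.get(c, 0) for c in codons)
        if total == 0:
            continue
        expected = total / len(codons)  # count expected under equal, random usage
        for c in codons:
            values[c] = codon_counts.get(c, 0) / expected
    return values

# Toy example restricted to Lys (AAA, AAG) and Phe (TTT, TTC).
families = {"K": ["AAA", "AAG"], "F": ["TTT", "TTC"]}
print(rscu(count_codons("AAAAAAAAGTTT"), families))
```

In the toy sequence, Lys is encoded twice by AAA and once by AAG, giving RSCU values of 4/3 and 2/3. Phe occurs once, via TTT only, so TTT gets 2.0 and TTC gets 0.0.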
Codon usage bias (CUB) calculations
To calculate CUB numerically, we took the statistically equal and random choice of all available synonymous codons as the “neutral point” (RSCU0 = 1.00) for the development of serotype-specific codon usage. CUB was defined as the sum of the deviations from such random and equal usage:

$$\mathrm{CUB} = \sum_{i}\sum_{j=1}^{n_i}\left|\mathrm{RSCU}_{ij} - \mathrm{RSCU}_0\right|$$

Put simply, CUB sums the absolute differences between observed codon usage (expressed as RSCU) and the value expected if synonymous codons were used uniformly. CUBmin = 0 when all RSCU values equal RSCU0. The calculated maximal possible value, CUBmax = 84, corresponds to only one codon being used for each amino acid, with the remaining synonymous codons for that amino acid not used at all.
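Here is a minimal Python sketch of the CUB sum. It assumes CUB is the summed absolute deviation of RSCU values from RSCU0 = 1.0 over whichever codons are supplied. The exact set of codon families summed in the study is not restated in this section, so the sketch is not guaranteed to reproduce the CUBmax value quoted above. The function name is hypothetical.

```python
def cub(rscu_values, rscu0=1.0):
    """Sum of |RSCU - RSCU0| over every codon in rscu_values (a codon -> RSCU dict)."""
    return sum(abs(value - rscu0) for value in rscu_values.values())

# Equal usage of both Lys and both Phe codons: no bias.
uniform = {"AAA": 1.0, "AAG": 1.0, "TTT": 1.0, "TTC": 1.0}
# Only one codon used per amino acid: RSCU = 2 for it and 0 for its synonym.
extreme = {"AAA": 2.0, "AAG": 0.0, "TTT": 2.0, "TTC": 0.0}
print(cub(uniform))  # 0.0
print(cub(extreme))  # 4.0
```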
Principal component analysis
Principal component analysis (PCA), a commonly used multivariate statistical method (Jolliffe 2002; Mardia et al. 1979), was carried out to identify the major trends in codon usage pattern among samples. Each sample was represented as a 59-dimensional vector, with each dimension holding the RSCU value of one sense codon. Only codons of amino acids with more than one synonymous codon were included, so AUG, UGG and the three stop codons were excluded.
We set up a two-dimensional coordinate system with the first principal component as the abscissa and the second principal component as the ordinate. The positions of the samples in this coordinate system reflect the genetic relationships among all analyzed samples.
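The PCA step can be sketched in Python with NumPy, assuming the RSCU values are already arranged as a matrix with one row per sample and 59 columns. This version centres each codon column and takes a singular value decomposition, which is equivalent to PCA on the covariance matrix. The study used SPSS, whose procedure may standardize the variables first, so scores from this sketch need not match the published plot. The function name and the demo data are hypothetical.

```python
import numpy as np

def pca_scores(rscu_matrix, n_components=2):
    """Project samples onto their leading principal components.

    rscu_matrix: array-like of shape (n_samples, 59), one row of RSCU values
    per ORF. Returns (scores, explained): scores has shape
    (n_samples, n_components) and holds the PC1, PC2, ... coordinates;
    explained is the fraction of total variance carried by each component.
    Assumes the samples are not all identical (total variance > 0).
    """
    X = np.asarray(rscu_matrix, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre every codon column
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[:n_components].T            # coordinates on the leading PCs
    explained = s**2 / np.sum(s**2)              # singular values come sorted, largest first
    return scores, explained[:n_components]

# Hypothetical demo data: 10 samples with 59 random "RSCU" values each.
rng = np.random.default_rng(0)
demo = rng.uniform(0.0, 2.0, size=(10, 59))
scores, explained = pca_scores(demo)
print(scores.shape)  # (10, 2)
```

The two columns of `scores` give the abscissa (first principal component) and ordinate (second principal component) for each sample.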
Correlation analysis was used to identify the relationship between codon usage bias and synonymous codon usage pattern (Ewens and Grant 2001). This analysis used Spearman's rank correlation.
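For reference, Spearman's rank correlation is the Pearson correlation computed on ranks, with tied values given the mean of the ranks they span. The sketch below implements only the coefficient, using NumPy. It does not compute P values; SPSS (as used in the study) or `scipy.stats.spearmanr` report those. The function names are illustrative.

```python
import numpy as np

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    i = 0
    while i < len(values):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    return float(np.corrcoef(rx, ry)[0, 1])

# A monotone but non-linear relation: rho is 1 up to floating-point rounding.
print(spearman_rho([1, 2, 3, 4], [1, 4, 9, 16]))
```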
All statistical analyses were carried out with the statistical software SPSS 11.5 for Windows.
Synonymous codon usage and codon usage bias in FMDV
The CUB values of the samples ranged from 21.413 to 26.154, with a mean of 23.375 and a standard deviation (S.D.) of 0.909. All CUB values of the ORFs are far below the theoretical maximum (mean 23.375 versus CUBmax = 84), so codon usage bias in FMDV is only slight. However, the CUB values reveal marked variation in codon usage pattern among serotypes (P < 0.01).
Genetic relationship based on synonymous codon usage
In Fig. 1, the samples of the seven serotypes are plotted in seven colors. Interestingly, the SAT 1, 2 and 3 samples lie at the top of the plot, clearly separated from the rest. The remaining four serotypes (A, C, O and Asia 1) show no obvious features that separate them from one another, and some of their points overlap. The separation of the serotypes on the first axis (r = 0.378, P < 0.01) and on the second axis (r = 0.638, P < 0.01) was statistically significant. Taken together, these results indicate that most of the CUB among samples is directly related to nucleotide composition. The pattern of synonymous codon usage may reflect the evolutionary relationship among serotypes A, C, O and Asia 1, namely inter-serotypic recombination. Pariente et al. (2003) reported that the genome of this virus is not static and that the virus can recombine or evolve in order to adapt to a given environment. Furthermore, mutational pressure was the main factor responsible for the variation in the pattern of synonymous codon usage in FMDV.
Quantitative evaluation of CUB in FMDV
Qualitative evaluation of CUB in FMDV
Relationship between amino acids and codon usage pattern
RSCU is a way to analyze codon usage bias over the whole coding sequence. The RSCU value of a codon is the ratio of its observed frequency to the frequency expected if all synonymous codons for the same amino acid were used randomly and equally. Each codon count is normalized by the total usage of its amino acid. RSCU values are therefore independent of amino acid usage, that is, of how abundant each amino acid's codon family is in the sequence.
Codon usage patterns have been widely used in various viruses to distinguish the roles of mutation pressure and translational selection. However, as viral genome projects have developed, many reports have shown that multiple factors may play an important role in codon usage bias. Knowledge of the synonymous codon usage pattern of FMDV might improve our understanding of its evolution. In general, natural selection (for example, translational selection), gene length and gene function are thought to account for codon usage patterns in different organisms (Das et al. 2006; Gu et al. 2004; Jenkins and Holmes 2003; Levin and Whittome 2000; Shackelton et al. 2006; Zhou et al. 2005). Another report shows that codon usage patterns are clearly correlated with overall genomic G+C content. This suggests that genome-wide mutational pressure, rather than natural selection for specific coding triplets, is the main determinant of codon usage (Shackelton et al. 2006). In addition, with the development of nucleotide sequencing techniques, the capsid protein gene sequences of FMDV isolates from different geographical sites have been compared. These sequences correspond to the different serotypes and group into serotype-specific lineages on phylogenetic analysis. Knowles and Samuel (2003) reported that the seven serotypes of FMDV cluster into distinct genetic lineages differing by approximately 30% to 50% in the VP1 gene. Figure 1 shows that, although seven serotypes have been identified serologically, there are two main genetic groups based on the pattern of synonymous codon usage: (i) serotypes Asia 1, A, C & O and (ii) SAT 1, 2 & 3. This pattern may be associated with geographical factors; for example, SAT 1, 2 & 3 are often restricted to Africa (Grubman and Baxt 2004).
For the genetic population composed of Asia 1, A, C & O, the result suggests that distinct serotypes which co-exist in the same place may undergo intertypic recombination. The analysis of synonymous codon usage reflects this characteristic. It also suggests that, because of geographic factors, SAT 1, 2 & 3 may have had no chance to undergo intertypic recombination. These observations raise interesting questions about FMDV genome evolution in nature and about the relative contribution of recombination to FMDV genetic and population diversity. Some evidence supports this mechanism (King et al. 1985; McCahon et al. 1985; Wilson et al. 1988; Tosh et al. 2002), but a more comprehensive analysis is needed to reveal the significance of codon usage pattern for viral fitness.
Our method of calculating CUB derives from the original suggestion of Sharp et al. (1986). It regards random and equal codon usage as the “null hypothesis”, namely RSCU0 = 1.0, and treats deviation from this hypothesis as the bias. This approach avoids subjective and serotype-specific restrictions in choosing a reference set of codons. We tried to base the concept of CUB on statistical rules and on the large collection of synonymous codon usage data and codon usage bias values collected in Table 1. Our analysis reveals that codon usage in FMDV is only slightly biased and that this bias is mainly determined by the synonymous codon usage pattern. This result is consistent with published data on codon usage bias in some other viruses (Das et al. 2006; Levin and Whittome 2000). This codon usage pattern may help viral genes replicate efficiently in vertebrate cells, whose codon usage patterns may differ. Published results have shown that the overall extent of codon usage bias in RNA viruses is low and that bias varies little among genes (Levin and Whittome 2000; Jenkins et al. 2001; Jenkins and Holmes 2003). The codon usage bias of FMDV is consistent with these findings. A similar codon usage pattern has been noticed in other RNA viruses (Jenkins and Holmes 2003). This may imply that low codon usage bias is one of the main factors regulating the evolutionary pattern of RNA viruses. In addition, mutation rates in RNA viruses are much higher than those in DNA viruses (Drake and Holland 1999). It is therefore understandable that mutation pressure is the major determinant of synonymous codon usage in the 85 samples in this study.
RNA viruses, including FMDV, have much higher mutation rates per genome replication because RNA replication lacks error-correction mechanisms. Their ability to evolve, or to recombine with each other, at the edge of extinction is probably why these viruses are such efficient pathogens of susceptible animals (Domingo 2000; Drake and Holland 1999). In short, translational selection might have no effect on the codon usage pattern of FMDV, whereas mutational pressure is the major factor shaping it.
The relationship between amino acids and synonymous codon usage pattern proposed here is useful for understanding the processes governing the codon usage pattern of FMDV, especially the roles played by mutation pressure and natural selection. Furthermore, such information not only offers insight into the codon usage patterns and gene classification of FMDV, but may also help to increase the efficiency of gene delivery and expression systems.
Conventional vaccines against FMDV cannot block outbreaks of foot-and-mouth disease (FMD), and epizootics of FMD cause serious economic losses in many developing countries. The codon usage patterns analyzed in this work may help in understanding the evolution of FMDV, especially the roles played by codon usage bias and mutational pressure.
This research is part of Project 0801NKDA034, supported by the Science and Technology Key Project of Gansu Province, China.