Veterinary Research Communications, Volume 34, Issue 4, pp 393–404

Analysis of synonymous codon usage in foot-and-mouth disease virus

Authors

  • Jian-Hua Zhou
    • Key Laboratory of Animal Virology of Ministry of Agriculture, State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences
  • Jie Zhang
    • Key Laboratory of Animal Virology of Ministry of Agriculture, State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences
  • Hao-Tai Chen
    • Key Laboratory of Animal Virology of Ministry of Agriculture, State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences
  • Li-Na Ma
    • Key Laboratory of Animal Virology of Ministry of Agriculture, State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences
    • Key Laboratory of Animal Virology of Ministry of Agriculture, State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences
Short Communication

DOI: 10.1007/s11259-010-9359-4

Cite this article as:
Zhou, J., Zhang, J., Chen, H. et al. Vet Res Commun (2010) 34: 393. doi:10.1007/s11259-010-9359-4

Abstract

In this study, we calculated the relative synonymous codon usage (RSCU) values and codon usage bias (CUB) values to carry out a comparative analysis of the codon usage pattern of open reading frames (ORFs) among 85 samples belonging to all seven serotypes of foot-and-mouth disease virus (FMDV). Although the degree of CUB for the ORFs is relatively slight, CUB varies significantly among serotypes, and this variation is mainly determined by the codon usage pattern as measured by RSCU. Comparison of the RSCU values of all samples does not resolve serotype-specific lineages, but it does reveal two main genetic populations in FMDV, namely (i) serotypes Asia 1, A, C & O and (ii) serotypes SAT 1, 2 & 3. This characteristic may arise through the mechanism of RNA virus recombination. Quantitative & qualitative evaluation based on CUB suggests that FMDV genome diversity tends to exist within serotype-specific lineages rather than randomly. Furthermore, the relationship between amino acids and codon usage pattern indicates that mutation pressure, rather than translational selection in nature, is the important determinant of the codon usage bias observed. Our work might give some insight into the characteristics of the FMDV ORF and into the evolution of this virus.

Keywords

Foot-and-mouth disease virus; Relative synonymous codon usage; Codon usage bias; Codon usage pattern; Genetic population

Introduction

It is well known that the genetic code uses 64 codons to specify the 20 standard amino acids & the stop signals. The 64 codons can be divided into 21 disjoint groups: one for each of the 20 standard amino acids and a 21st for the stop signals. Each group in the general genetic code has between 1 and 6 codons, and the alternative codons within a group are termed synonymous. Alterations among synonymous codons usually occur at the third codon position and can be interchanged without affecting the primary sequence of the protein product, yet synonymous codons are not chosen equally either within or between genomes (Grantham et al. 1980; Gu et al. 2004; Lloyd and Sharp 1992; Martin et al. 1989; Xie et al. 1998; Zhou et al. 2006). In general, translational selection in nature and compositional constraints are thought to be the two major factors accounting for the variation of codon usage pattern among the genomes of various organisms (Karlin and Mrázek 1996; Lesnik et al. 2000; Sharp et al. 1986). In contrast to the pattern of synonymous codon usage in mammals, compositional constraints of the genome play a weak role in some unicellular organisms (such as Escherichia coli & Saccharomyces cerevisiae): highly expressed genes show a clear selective preference for codons whose cognate tRNA molecules are abundant, whereas lowly expressed genes show broadly similar codon usage patterns (Gouy and Gautier 1982; Grantham et al. 1981; Ghosh et al. 2000; Lesnik et al. 2000; Majumdar et al. 1999; Ohno et al. 2001).

As for some RNA viruses, mutation pressure, rather than translational selection in nature, plays the key role in shaping the synonymous codon usage pattern (Gu et al. 2004; Jenkins and Holmes 2003). Foot-and-mouth disease (FMD) is considered to be the most contagious of all the diseases infecting domestic animals such as cattle, pigs, sheep and goats. Foot-and-mouth disease virus (FMDV) is currently classified as a member of the Aphthovirus genus in the family Picornaviridae. The FMDV genome has a single long open reading frame (ORF) and encodes all of its proteins in the form of a polyprotein. The virus exists as seven different serotypes, namely A, O, C, Asia 1, and South African Territories 1 (SAT 1), SAT 2 & SAT 3, and various subtypes exist within each serotype (Knowles and Samuel 2003). FMDV has been studied extensively at the molecular level, including genomic analysis, genome structure, virion structure and polyprotein processing by proteases (Carrillo et al. 2005, 2007; Klein 2009; Lewis-Rogers et al. 2008; Mason et al. 2003; Newman and Brown 1997). However, little information is available on how the codon usage pattern of the FMDV genome, including the relative synonymous codon usage (RSCU) and codon usage bias (CUB), relates to the evolution or mutation of this virus. The codon usage pattern of FMDV might give some clues to the evolution of this virus. In this work, we calculated codon usage pattern data for FMDV and analyzed the evolutionary determinants of that pattern.

Materials and methods

Materials

A total of 85 complete coding sequences were selected in this study, including 17 of serotype Asia 1, 25 of serotype A, 12 of serotype C, 26 of serotype O, four of serotype SAT 1, three of serotype SAT 2 & four of serotype SAT 3. These complete coding sequences were obtained from NCBI (http://www.ncbi.nlm.nih.gov/). They are listed in detail below:
  • Asia 1: AY304994, DQ989322, DQ533483, AY687334, AY687333, FJ906802, NC_004915, EF614458, DQ989323, DQ989322, DQ989321, AY593800, AY593799, AY593798, AY593797, AY593796, AY593795

  • A: EF494488, EF494487, EF494486, AY593751, AY593769, AY593789, AY593767, AY593770, AY593782, AY593783, AY593784, AY593785, AY593786, AY593790, AY593802, AY593787, AY593788, AY593803, AY593753, AY593756, AY593757, AY593768, AY593794, AY593771, AY593758

  • C: NC_002554, FJ824812, AM409325, DQ409191, DQ409190, DQ409188, AY593809, AY593810, AY593805, AY593806, AY593807, AY593808

  • O: EF552697, EF552699, EF552695, EF552694, EF552693, EF552692, EU400597, EU140994, AF026168, NC_004004, AJ539139, AY593819, AY593835, AF189157, AY593836, AF511039, EF175732, DQ248888, AJ320488, AJ633821, AH012984, AB079061, AF189157, AF511039, FJ542372, AH012985

  • SAT 1: AY593838, AY593841, AY593842, AY593843

  • SAT 2: AF540910, AY593847, AY593848

  • SAT 3: AY593850, AY593851, AY593852, AY593853

Relative synonymous codon usage calculations

To investigate the pattern of synonymous codon usage without the confounding influence of amino acid composition among the 85 samples, the RSCU values of the codons in each sample were calculated. The RSCU value of the jth codon for the ith amino acid was calculated following the formula of Sharp and Li (1986):
$$ RSCU_{ij} = \frac{{g_{ij}}}{{\sum\limits_{j = 1}^{n_i} {g_{ij}} }} \cdot n_i $$

where gij is the observed number of the jth codon for the ith amino acid (which has ni synonymous codons). Thus, codons with RSCU values close to 1.0 have relatively little codon usage bias. An RSCU value of 1.0 means that the codon is chosen equally and randomly among its synonyms, i.e. it shows no usage bias.
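
The RSCU calculation can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `rscu`, the toy sequence and the choice to report 0.0 for amino acids absent from the sequence are assumptions made for illustration. The codon families are those of the standard genetic code, excluding AUG, UGG and the stop codons.

```python
from collections import Counter

# Synonymous codon families (standard genetic code, RNA alphabet) for the
# 18 amino acids with more than one codon; AUG, UGG and stops are omitted.
SYNONYMOUS_CODONS = {
    "Ala": ["GCA", "GCC", "GCG", "GCU"],
    "Arg": ["AGA", "AGG", "CGA", "CGC", "CGG", "CGU"],
    "Asn": ["AAC", "AAU"],
    "Asp": ["GAC", "GAU"],
    "Cys": ["UGC", "UGU"],
    "Gln": ["CAA", "CAG"],
    "Glu": ["GAA", "GAG"],
    "Gly": ["GGA", "GGC", "GGG", "GGU"],
    "His": ["CAC", "CAU"],
    "Ile": ["AUA", "AUC", "AUU"],
    "Leu": ["CUA", "CUC", "CUG", "CUU", "UUA", "UUG"],
    "Lys": ["AAA", "AAG"],
    "Phe": ["UUC", "UUU"],
    "Pro": ["CCA", "CCC", "CCG", "CCU"],
    "Ser": ["AGC", "AGU", "UCA", "UCC", "UCG", "UCU"],
    "Thr": ["ACA", "ACC", "ACG", "ACU"],
    "Tyr": ["UAC", "UAU"],
    "Val": ["GUA", "GUC", "GUG", "GUU"],
}


def rscu(orf):
    """Return {codon: RSCU} for one in-frame coding sequence."""
    seq = orf.upper().replace("T", "U")
    counts = Counter(seq[k:k + 3] for k in range(0, len(seq) - 2, 3))
    values = {}
    for codons in SYNONYMOUS_CODONS.values():
        total = sum(counts[c] for c in codons)  # denominator: sum over the family
        for c in codons:
            # Assumption: a family absent from the sequence is reported as 0.0.
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values


# Hypothetical six-codon fragment (not an FMDV sequence): 3x GCC, 1x GCA, AAC, AAU.
toy = "GCCGCCGCCGCAAACAAU"
values = rscu(toy)
print(values["GCC"], values["GCA"], values["AAC"])  # 3.0 1.0 1.0
```

In the toy fragment GCC accounts for three of the four alanine codons, so its RSCU is 3 × 4 / 4 = 3.0, while the two asparagine codons are used equally and both have an RSCU of 1.0.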

Codon usage bias (CUB) calculations

To calculate CUB numerically, we assumed that statistically equal & random choice among all available synonymous codons was the “neutral point” (RSCU0 = 1.00) for the development of serotype-specific codon usage, and that the CUB was the sum of the deviations from such random and equal usage.

CUBij is defined as the absolute deviation between the RSCUij value and RSCU0, and CUB is the sum of CUBij over all 59 synonymous codons:
$$ CUB = \sum\limits_{i} {\sum\limits_{j = 1}^{n_i} {CU{B_{ij}}} } = \sum\limits_{i} {\sum\limits_{j = 1}^{n_i} {\left| {RSC{U_{ij}} - RSC{U_0}} \right|} } $$

More simply, CUB is the sum of the absolute differences between the observed RSCU values and the value expected if the usage of synonymous codons is uniform. CUBmin = 0 when all RSCU values equal RSCU0, and the calculated maximal possible value, CUBmax, is 84, which corresponds to only one codon being chosen for each amino acid while the remaining codons for that amino acid are not used at all.
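
The CUB definition can be checked on its two limiting cases with a short Python sketch. The names `cub` and `FAMILY_SIZES` are illustrative assumptions; the family sizes are those of the 18 amino acids listed in Table 1. Under this literal reading the single-codon extreme evaluates to 2 × (59 − 18) = 82 rather than the 84 quoted in the text; the counting convention behind 84 is not stated, so the difference is only flagged here.

```python
RSCU_0 = 1.0  # neutral point: equal use of all synonymous codons


def cub(rscu_values, rscu0=RSCU_0):
    """Sum of absolute deviations of RSCU values from the neutral point."""
    return sum(abs(v - rscu0) for v in rscu_values)


# Number of synonymous codons in each of the 18 families of Table 1
# (Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, Lys, Phe, Pro,
# Ser, Thr, Tyr, Val); they add up to 59 codons.
FAMILY_SIZES = [4, 6, 2, 2, 2, 2, 2, 4, 2, 3, 6, 2, 2, 4, 6, 4, 2, 4]

# Uniform usage: every RSCU value equals 1.
uniform = [1.0] * sum(FAMILY_SIZES)

# Extreme usage: one codon per family has RSCU = n_i, all others have RSCU = 0.
extreme = []
for n in FAMILY_SIZES:
    extreme.extend([float(n)] + [0.0] * (n - 1))

print(cub(uniform), cub(extreme))  # 0.0 82.0
```

Each family with n synonymous codons contributes (n − 1) from the single used codon and (n − 1) from the unused ones, which is why the extreme case equals twice the number of unused codons.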

Principal component analysis

Principal component analysis, a commonly used multivariate statistical method (Jolliffe 2002; Mardia et al. 1979), was carried out to analyze the major trends in codon usage pattern among samples. Each sample was represented as a 59-dimensional vector in which each dimension corresponded to the RSCU value of one sense codon; only codons with synonymous alternatives for a particular amino acid were included, so AUG, UGG & the three stop codons were excluded.

The first principal component (abscissa) and the second principal component (ordinate) define a two-dimensional coordinate system, in which the positions of the samples reflect the genetic relationship among all analyzed samples.
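
The projection onto the first two principal components can be sketched in plain Python as follows. This is not the SPSS procedure used in the study: the sketch assumes a covariance-based PCA, finds the two leading axes by power iteration with deflation, and runs on hypothetical three-variable data rather than the 85 × 59 RSCU matrix. All function names are illustrative.

```python
import math
import random


def center(rows):
    """Subtract the column mean from every entry."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of column-centered data."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]


def leading_eigenpair(matrix, iterations=1000, seed=0):
    """Power iteration for the dominant eigenvalue and eigenvector."""
    rng = random.Random(seed)
    # A random start is used on purpose: for RSCU data the all-ones vector
    # lies in the null space of the covariance matrix and would stall.
    vec = [rng.gauss(0.0, 1.0) for _ in matrix]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    product = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
    value = sum(v * p for v, p in zip(vec, product))  # Rayleigh quotient
    return value, vec


def pca_two_axes(rows):
    """Return (scores on f1 and f2, fraction of variance explained by each)."""
    data = center(rows)
    cov = covariance(data)
    total = sum(cov[k][k] for k in range(len(cov)))  # total variance (trace)
    axes, ratios = [], []
    for _ in range(2):
        value, vec = leading_eigenpair(cov)
        axes.append(vec)
        ratios.append(value / total)
        # Deflation: remove the axis just found before searching for the next.
        cov = [[cov[a][b] - value * vec[a] * vec[b] for b in range(len(vec))]
               for a in range(len(vec))]
    scores = [[sum(x * w for x, w in zip(row, axis)) for axis in axes]
              for row in data]
    return scores, ratios


# Hypothetical data: 30 samples, 3 variables with clearly different spreads.
rng = random.Random(1)
samples = [[rng.gauss(0, 3), rng.gauss(0, 1), rng.gauss(0, 0.3)] for _ in range(30)]
scores, ratios = pca_two_axes(samples)
print(ratios)
```

With the real data, each row of `rows` would hold the 59 RSCU values of one ORF, and `ratios` would correspond to the percentages of total variation reported for the two axes. The sign of each axis is arbitrary, so a plot may appear mirrored relative to one produced by other software.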

Correlation analysis

Correlation analysis was used to identify the relationship between codon usage bias and synonymous codon usage pattern (Ewens and Grant 2001). This analysis was based on Spearman’s rank correlation.

All statistical analyses were carried out with the statistical software SPSS 11.5 for Windows.
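
Spearman's rank correlation, as named in the correlation analysis above, is the Pearson correlation of rank-transformed data. The plain-Python sketch below illustrates the coefficient only (no significance test) and is not the SPSS routine used in the study; the function names and the five data points are hypothetical.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            result[k] = mean_rank
        start = end + 1
    return result


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical values for five samples (not data from this study).
cub_values = [21.4, 22.9, 23.4, 24.8, 26.2]
axis_scores = [-0.8, -0.1, -0.3, 0.5, 0.9]
rho = spearman(cub_values, axis_scores)
print(round(rho, 3))  # 0.9
```

Only one pair of neighbouring ranks is swapped in the toy data, so the coefficient stays close to 1.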

Results

Synonymous codon usage and codon usage bias in FMDV

The overall RSCU values of the 59 sense codons in FMDV are listed in Table 1. All preferentially used codons among the samples were G-ended or C-ended, and the majority of them were C-ended (Table 1). This suggested that compositional constraints on C & G content might play a key role in shaping the pattern of synonymous codon usage. Indeed, nucleotide composition analysis showed that FMDV has a relatively G+C-rich genome, with a G+C content of more than 50% and a C content higher than the G content in the complete coding sequence. These results suggested that compositional constraint plays an integral role in the codon usage pattern of FMDV.
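
The nucleotide composition referred to above can be computed with a few lines of Python. The sketch is illustrative only: the function names are assumptions, the G+C fraction at third codon positions (`gc3`) is an added measure that the text does not report, and the toy fragment is hypothetical rather than an FMDV sequence.

```python
from collections import Counter


def base_composition(orf):
    """Fraction of each nucleotide in a coding sequence (T is read as U)."""
    seq = orf.upper().replace("T", "U")
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in "ACGU"}


def gc3(orf):
    """G+C fraction at third codon positions (an added, illustrative measure)."""
    seq = orf.upper().replace("T", "U")
    third = seq[2::3]
    return sum(base in "GC" for base in third) / len(third)


toy = "GCCAACCUCGUGAAU"  # hypothetical five-codon fragment
comp = base_composition(toy)
print(round(comp["G"] + comp["C"], 3), comp["C"] > comp["G"], gc3(toy))  # 0.533 True 0.8
```

Applied to a complete coding sequence, `comp["G"] + comp["C"]` gives the G+C content and the comparison of `comp["C"]` with `comp["G"]` shows which of the two bases is more frequent.
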
Table 1

Synonymous codon usage and codon usage bias in FMDV

AA(a)  Codon  RSCU(b)  CUBij(c)
Ala    GCA    0.991    −0.009
       GCC    1.472    0.472
       GCG    0.607    −0.393
       GCU    0.926    −0.074
Arg    AGA    1.559    0.559
       AGG    0.801    −0.199
       CGA    0.301    −0.699
       CGC    1.656    0.656
       CGG    0.763    −0.237
       CGU    0.912    −0.088
Asn    AAC    1.706    0.706
       AAU    0.294    −0.706
Asp    GAC    1.539    0.539
       GAU    0.46     −0.54
Cys    UGC    1.130    0.13
       UGU    0.874    −0.126
Gln    CAA    0.811    −0.189
       CAG    1.188    0.188
Glu    GAA    0.672    −0.328
       GAG    1.326    0.326
Gly    GGA    0.998    −0.002
       GGC    1.199    0.199
       GGG    0.836    −0.164
       GGU    0.957    −0.043
His    CAC    1.79     0.79
       CAU    0.21     −0.79
Ile    AUA    0.193    −0.807
       AUC    1.710    0.710
       AUU    1.105    0.105
Leu    CUA    0.181    −0.819
       CUC    1.919    0.919
       CUG    1.577    0.577
       CUU    1.143    0.143
       UUA    0.056    −0.944
       UUG    1.121    0.121
Lys    AAA    0.856    −0.144
       AAG    1.149    0.149
Phe    UUC    1.182    0.182
       UUU    0.818    −0.182
Pro    CCA    0.805    −0.195
       CCC    1.263    0.263
       CCG    0.860    −0.14
       CCU    1.072    0.072
Ser    AGC    1.095    0.095
       AGU    0.663    −0.337
       UCA    1.023    0.023
       UCC    1.569    0.569
       UCG    0.876    −0.124
       UCU    0.763    −0.237
Thr    ACA    0.872    −0.128
       ACC    1.604    0.604
       ACG    0.606    −0.394
       ACU    0.917    −0.083
Tyr    UAC    1.726    0.726
       UAU    0.276    −0.724
Val    GUA    0.279    −0.721
       GUC    1.088    0.088
       GUG    1.685    0.685
       GUU    0.943    −0.057

(a) AA is the abbreviation of amino acid

(b) The RSCU value is a mean value of each codon for a particular amino acid

(c) The CUBij value is a mean value of each codon for a particular amino acid

(d) The preferentially used codons for each amino acid are described in bold

The CUB values among the samples ranged from 21.413 to 26.154, with a mean of 23.375 and a standard deviation of 0.909. Because all CUB values of the ORFs were much lower than the maximum (23.375 < CUBmax = 84), the CUB in FMDV was slight. However, based on the CUB values of the samples, there was marked variation in codon usage pattern among serotypes (P < 0.01).

Genetic relationship based on synonymous codon usage

Principal component analysis was carried out on the ORF from each of the 85 samples. It detected one major trend along the first axis (f1), which accounted for 18.291% of the total variation, and another along the second axis (f2), which accounted for 11.068% of the total variation. A plot of f1 against f2 for each ORF is shown in Fig. 1. The plot is somewhat complex, with many overlaps among samples, but it is clear that the ORFs of serotypes A, Asia 1, C & O and those of SAT 1, 2 & 3 scatter in different areas.
Fig. 1

A plot of the values of the first axis and the second axis of each ORF in principal component analysis

In Fig. 1, the samples of the seven serotypes are plotted in seven colors. The SAT 1, 2 & 3 samples are located at the top of the plot and are clearly separated from the rest; the samples of the other four serotypes show no obvious features that separate them from each other, and some of them overlap to a degree. The separation of the serotypes on the first axis (r = 0.378, P < 0.01) & the second axis (r = 0.638, P < 0.01) was statistically significant. Taken together, these results indicated that most of the CUB among samples was directly related to nucleotide composition. The pattern of synonymous codon usage may reflect the evolutionary relationship among serotypes A, C, O & Asia 1, namely inter-serotypic recombination. Pariente et al. (2003) reported that the genome of this virus is not static and is able to recombine or evolve in order to adapt to a given environment. Furthermore, mutational pressure was the main factor responsible for the variation in the pattern of synonymous codon usage in FMDV.

Quantitative evaluation of CUB in FMDV

When all available synonymous codons are chosen randomly & equally (RSCU0 = 1.0), CUB = 0. The maximal calculated bias (CUBmax = 84) corresponds to only one codon being used for each amino acid (excluding AUG, UGG & the three stop codons), while the remaining 42 codons are not used at all. We calculated CUB for the 85 samples mentioned above. Within each of the seven serotypes, the genome sequences were not all identical, and the codon usage pattern formed at the population level per serotype rather than at the individual level. In Fig. 2, codon usage bias decreased significantly (P < 0.01) across the serotypes during evolution: serotype SAT 3 had the highest bias while serotype C had the lowest. These results might imply that, although all seven serotypes belong to FMDV, they may differ in evolutionary bias, and that this difference is mainly determined by the codon usage bias of each serotype. Haydon et al. (2001) pointed out that the high mutation rate causes replicated FMDV genomes to differ from the parental genome at 0.1 to 10 base positions, and the quasispecies concept, which should be termed genetic population, was developed to describe the effects of mutation during replication on the evolution of replicating RNA molecules (Eigen 1971). Within any genetic population of this virus, the genome sequences are not all identical, and selection occurs at the population level rather than at the individual level (Domingo et al. 2003).
Fig. 2

Codon Usage Bias (CUB) in seven serotypes. Mean ± S.D.

Qualitative evaluation of CUB in FMDV

Detailed analysis of the seven serotypes revealed wide variations in CUB (Fig. 3). The variation in CUB between amino acids and among these serotypes appeared random. However, closely related serotypes, compared over a large codon collection, showed a similar pattern of codon usage. In detail, SAT 1, 2 & 3 had a similar CUB pattern to some degree (Fig. 3-b), while serotypes A, C, O & Asia 1 differed from each other (Fig. 3-a). This result indicated that the pattern of codon usage bias can account for the evolutionary distance among serotypes; because the variation in codon usage bias depended on the pattern of synonymous codon usage, the pattern of synonymous codon usage was a major factor in evolutionary distance. Figs. 1 and 3-b together also confirmed the similar evolutionary pattern of serotypes SAT 1, 2 & 3.
Fig. 3

CUB comparisons. Codon Usage Biases (CUB) were calculated in 102 strains and sorted into subgroups. The average CUB values of the 59 codons in the indicated subgroups are shown. A: kingdoms, B: SAT 1, SAT 2 & SAT 3

Relationship between amino acids and codon usage pattern

To analyze whether the evolution of CUB was controlled by mutation pressure or by translational selection in nature, we accumulated the CUB data from the 85 samples in Table 1. This table was intended to give a numerical representation of the translational machinery. The distribution of the CUB values in Table 1 is illustrated in Fig. 4. The transition from maximum-positive to maximum-negative values was smooth, and there was no obvious or unambiguous border between the so-called “dominant” and “prohibited” codons; that is, all possible codons were used. This result indicated that translational selection in nature had no effect on the pattern of synonymous codon usage or on the evolutionary pattern of FMDV.
Fig. 4

Distribution of the CUB of a codon for each amino acid. CUB was taken from Table 1 and sorted in descending order
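
The ordering shown in Fig. 4 can be reproduced from the signed CUBij column of Table 1. The Python sketch below is an illustration, not the authors' procedure: the values are transcribed from Table 1, and the variable names and the idea of inspecting the step between neighbouring values are assumptions added here.

```python
# Signed CUBij values transcribed from Table 1 (mean RSCU minus RSCU0).
TABLE1_CUB = {
    "GCA": -0.009, "GCC": 0.472, "GCG": -0.393, "GCU": -0.074,
    "AGA": 0.559, "AGG": -0.199, "CGA": -0.699, "CGC": 0.656,
    "CGG": -0.237, "CGU": -0.088, "AAC": 0.706, "AAU": -0.706,
    "GAC": 0.539, "GAU": -0.54, "UGC": 0.13, "UGU": -0.126,
    "CAA": -0.189, "CAG": 0.188, "GAA": -0.328, "GAG": 0.326,
    "GGA": -0.002, "GGC": 0.199, "GGG": -0.164, "GGU": -0.043,
    "CAC": 0.79, "CAU": -0.79, "AUA": -0.807, "AUC": 0.710,
    "AUU": 0.105, "CUA": -0.819, "CUC": 0.919, "CUG": 0.577,
    "CUU": 0.143, "UUA": -0.944, "UUG": 0.121, "AAA": -0.144,
    "AAG": 0.149, "UUC": 0.182, "UUU": -0.182, "CCA": -0.195,
    "CCC": 0.263, "CCG": -0.14, "CCU": 0.072, "AGC": 0.095,
    "AGU": -0.337, "UCA": 0.023, "UCC": 0.569, "UCG": -0.124,
    "UCU": -0.237, "ACA": -0.128, "ACC": 0.604, "ACG": -0.394,
    "ACU": -0.083, "UAC": 0.726, "UAU": -0.724, "GUA": -0.721,
    "GUC": 0.088, "GUG": 0.685, "GUU": -0.057,
}

# Sort codons from the most over-used to the most under-used, as in Fig. 4.
ordered = sorted(TABLE1_CUB.items(), key=lambda item: item[1], reverse=True)

# Step between neighbouring values; a large step would mark a visible border
# between groups of codons.
steps = [high[1] - low[1] for high, low in zip(ordered, ordered[1:])]

print(ordered[0], ordered[-1])  # ('CUC', 0.919) ('UUA', -0.944)
print(round(max(steps), 3))
```

A largest step that is small relative to the full range from 0.919 to −0.944 is what the description of a smooth transition corresponds to.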

Discussion

RSCU is basically a way to analyze codon usage bias over the whole coding sequence. The RSCU value is defined as the ratio of the observed frequency of a codon to the frequency expected if all the synonymous codons for the same amino acid were used equally; in other words, it is the observed number of codon occurrences divided by the number expected if synonymous codons are used randomly and equally. RSCU values have no relation to amino acid usage or to the abundance ratio of synonymous codons.

It has been widely reported that codon usage patterns in various viruses reflect the balance between mutation pressure and translational selection in nature. However, with the progress of viral genome projects, many reports have shown that multiple factors may play an important role in codon usage bias. Knowledge of the synonymous codon usage pattern in FMDV might improve understanding of the evolution of FMDV. Generally, natural selection (for example translational selection), gene length and gene function are thought to be the factors accounting for the codon usage pattern among different organisms (Das et al. 2006; Gu et al. 2004; Jenkins and Holmes 2003; Levin and Whittome 2000; Shackelton et al. 2006; Zhou et al. 2005). Another report shows that codon usage patterns are clearly correlated with overall genomic G+C content, suggesting that genome-wide mutational pressure, rather than natural selection for specific coding triplets, is the main determinant of codon usage (Shackelton et al. 2006). In addition, with the development of techniques for nucleotide sequence analysis, comparison of the nucleotide sequences of the capsid protein genes of FMDVs isolated at different geographical sites shows a correspondence with serotype: the sequences group into serotype-specific lineages on phylogenetic analysis. Knowles and Samuel (2003) reported that the seven serotypes of FMDV cluster into distinct genetic lineages with approximately 30% to 50% differences in the VP1 gene. Figure 1 shows that although seven serotypes have been identified serologically, there are two main genetic groups according to the pattern of synonymous codon usage, namely (i) serotypes Asia 1, A, C & O and (ii) SAT 1, 2 & 3. This grouping may be associated with geographical factors; for example, SAT 1, 2 & 3 are largely limited to Africa (Grubman and Baxt 2004). 
As for the genetic population composed of Asia 1, A, C & O, the result suggests that if distinct serotypes have the chance to exist in a common place, they may undergo intertypic recombination. Analysis of synonymous codon usage reflects this characteristic and suggests that, owing to geographic factors, SAT 1, 2 & 3 may have had no chance to undergo intertypic recombination. These observations raise interesting questions about FMDV genome evolution in nature and the relative contribution of recombination to the generation of FMDV genetic and population diversity. There is some evidence for this mechanism (King et al. 1985; McCahon et al. 1985; Wilson et al. 1988; Tosh et al. 2002), but a more comprehensive analysis is needed to reveal the significance of codon usage pattern for viral fitness.

Our method of calculating CUB derives from the original suggestion (Sharp et al. 1986) and regards random and equal codon usage as the “null hypothesis”, namely RSCU0 = 1.0; deviation from this hypothesis is the bias. This approach avoids subjectivity and serotype restrictions in choosing a reference set of codons, and we try to base the concept of CUB on statistical rules and on the large collection of synonymous codon usage data and codon usage bias values in Table 1. Our analysis reveals that codon usage in FMDV is only slightly biased and that the bias is mainly determined by the synonymous codon usage pattern. This result is consistent with published data on codon usage bias in some other viruses (Das et al. 2006; Levin and Whittome 2000). This codon usage pattern may help viral genes to replicate efficiently in vertebrate cells, which potentially have a distinct codon usage pattern. Published results have shown that the overall extent of codon usage bias in RNA viruses is low and that there is little variation in bias among genes (Levin and Whittome 2000; Jenkins et al. 2001; Jenkins and Holmes 2003); the codon usage bias of FMDV agrees with these results. A similar codon usage pattern has been noticed in other RNA viruses (Jenkins and Holmes 2003), which may imply that a codon usage pattern with low codon usage bias is one of the main factors regulating the evolutionary pattern of RNA viruses. In addition, because mutation rates in RNA viruses are much higher than those in DNA viruses (Drake and Holland 1999), it is understandable that the evolutionary pattern is the major determinant of synonymous codon usage in the 85 samples in this study. 
RNA viruses, including FMDV, have much higher mutation rates per genome replication because they lack error-correction mechanisms during RNA replication; the ability of FMDV to evolve or undergo recombination at the edge of extinction is probably the reason why this virus is such an efficient pathogen of susceptible animals (Domingo 2000; Drake and Holland 1999). In detail, translational selection in nature might have no effect on the codon usage pattern in FMDV, while mutational pressure is the major factor shaping the codon usage pattern.

The relationship between amino acids and synonymous codon usage pattern proposed here is useful for understanding the processes governing the codon usage pattern of FMDV, especially the roles played by mutation pressure and natural selection. Further, such information not only offers insight into the codon usage patterns and gene classification of FMDV, but may also help to increase the efficiency of gene delivery and expression systems.

Because conventional vaccines against FMDV cannot block outbreaks of foot-and-mouth disease (FMD), many developing countries suffer FMD epizootics and face serious economic losses. The codon usage patterns analyzed in our work are helpful for understanding the evolution of FMDV, especially the roles played by codon usage bias and mutational pressure.

Acknowledgements

This research is part of Project 0801NKDA034, supported by the Science and Technology Key Project of Gansu Province, China.

Copyright information

© Springer Science+Business Media B.V. 2010