Abstract
Bovine viral diarrhea virus (BVDV) is widespread in beef and dairy herds and has been grouped into two genotypes, genotype 1 and genotype 2. In this study, we investigated relative synonymous codon usage (RSCU) values, effective number of codons (ENC) values and nucleotide composition, and carried out a comparative analysis of codon usage patterns in the open reading frames (ORFs) of 22 BVDV genomes (14 of genotype 1 and 8 of genotype 2). BVDV genomes have a high A+U content and low codon usage bias. Based on the RSCU data, codon usage bias varied significantly between the two genotypes, and a geographic effect was detected only in genotype 1. The RSCU data were negatively correlated with general average hydrophobicity (GRAVY), aromaticity and nucleotide content. Furthermore, the overall abundance of C and U had no effect on synonymous codon usage patterns, whereas the A and G content correlated significantly with nucleotide content at the third codon position. In addition, the codon usage patterns of BVDV are similar to those of 22 conserved genes of Bos taurus. Taken together, the genetic characteristics of BVDV possibly result from interactions between natural selection and mutation pressure.
Introduction
For 18 of the 20 amino acids (all except Met and Trp), the degeneracy of the genetic code allows more than one codon to encode the same amino acid, and unequal use of these synonymous codons results in codon usage bias [7, 24]. Codon usage has been analyzed in prokaryotes and eukaryotes, including Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Caenorhabditis elegans and humans [4, 16, 25, 27]. Codon usage bias has been reported to correlate strongly with tRNA abundance, GC content, mRNA secondary structure, exon splicing constraints, translation rate and gene expression level [12, 18, 26]. The study of codon usage can provide evidence about the molecular evolution of viruses, and analyzing viral codon usage patterns can also deepen our understanding of the relationship between viruses and their hosts.
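The degeneracy described above can be checked with a short Python sketch. It assumes the standard genetic code (some organisms and organelles use variant codes) written as the usual 64-letter string in UCAG order; the names `STANDARD_CODE` and `synonymous_families` are illustrative and do not come from any particular library.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in UCAG order (first base varies slowest).
# '*' marks the three stop codons.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product("UCAG", repeat=3), AA_STRING)
}


def synonymous_families(code):
    """Group sense codons by the amino acid they encode."""
    families = defaultdict(list)
    for codon, aa in code.items():
        if aa != "*":  # skip stop codons
            families[aa].append(codon)
    return dict(families)


families = synonymous_families(STANDARD_CODE)
degenerate = sorted(aa for aa, codons in families.items() if len(codons) > 1)
single = sorted(aa for aa, codons in families.items() if len(codons) == 1)

print(len(families), "amino acids;", len(degenerate), "with synonymous codons")
# → 20 amino acids; 18 with synonymous codons
print("single-codon amino acids:", single)  # → ['M', 'W']
```

The 20 families contain 61 sense codons; dropping the single codons AUG and UGG leaves the 59 codons used for the RSCU-based comparisons in the Methods.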
BVDV is a member of the genus Pestivirus within the family Flaviviridae. The genus also includes classical swine fever virus (CSFV) and Border disease virus (BDV) of sheep [3, 20]. Based on comparisons of the 5’ untranslated region (UTR) and the Npro- and E2-encoding sequences [23, 30], BVDV can be divided into two genotypes, BVDV-1 and BVDV-2 [21, 22]. The genome of both genotypes is a single positive-stranded RNA of approximately 12.3 kb, consisting of a single large open reading frame (ORF) flanked by 5’ and 3’ UTRs [6, 8]. In epithelial cell cultures, BVDV strains show either a cytopathic (CP) or a noncytopathic (NCP) phenotype [17].
BVDV is highly genetically variable [13, 29], yet little is known about the synonymous codon usage patterns of its genome. To our knowledge, this is the first report of codon usage analysis of BVDV. In this study, we analyzed the codon usage and base composition of 22 available complete ORFs of BVDV to gain insight into the genetic evolution of this virus.
Materials and methods
Sequence data
A total of 22 BVDV genomes, 14 of genotype 1 and 8 of genotype 2, were used in this study to analyze synonymous codon usage patterns, nucleotide composition and the factors that influence them. The genotype, phenotype, country of isolation and GenBank accession number of each strain are listed in Table 1. In addition, 22 well-conserved genes of Bos taurus were selected to examine the relationship between codon preferences in the host and in the virus (Table 2). All coding sequences were downloaded from NCBI GenBank (http://www.ncbi.nlm.nih.gov/Genbank/).
Calculation of relative synonymous codon usage
To investigate patterns of synonymous codon usage without the confounding influence of amino acid composition, the relative synonymous codon usage (RSCU) values of codons in the ORFs of all BVDV samples were calculated according to a formula described in previous reports [25, 32]:
\[ \mathrm{RSCU}_{ij} = \frac{g_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} g_{ij}} \]
where \( g_{ij} \) is the observed number of the jth codon for the ith amino acid, which has \( n_i \) synonymous codons. A codon with an RSCU value greater than 1.0 is used more often than expected (positive codon usage bias), and a codon with an RSCU value less than 1.0 is used less often than expected (negative codon usage bias). An RSCU value of 1.0 means that the codon is used at the frequency expected if all synonymous codons were chosen equally and randomly.
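A minimal Python sketch of the RSCU calculation, assuming the standard genetic code and an in-frame ORF given as an RNA or DNA string (T is read as U). Each codon count is divided by the mean count of its synonymous family, so codons of an amino acid that is absent from the ORF receive no value. The helper names `count_codons` and `rscu` are hypothetical and are not the software used in the study.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in UCAG order (first base varies slowest); '*' = stop.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AA_STRING)}


def count_codons(orf):
    """Count in-frame codons; incomplete or ambiguous codons are ignored."""
    seq = orf.upper().replace("T", "U")
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return Counter(c for c in codons if c in STANDARD_CODE)


def rscu(orf):
    """RSCU = observed count / mean count of the synonymous family (stops skipped)."""
    counts = count_codons(orf)
    families = defaultdict(list)
    for codon, aa in STANDARD_CODE.items():
        if aa != "*":
            families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this ORF: RSCU undefined
        n_i = len(codons)  # number of synonymous codons for this amino acid
        for c in codons:
            values[c] = counts[c] * n_i / total
    return values


print(rscu("GCUGCUGCAGCC"))  # Ala codons only
# → {'GCU': 2.0, 'GCC': 1.0, 'GCA': 1.0, 'GCG': 0.0}
```

In the example, GCU is used twice as often as expected under equal usage (RSCU 2.0), while GCG is never used (RSCU 0.0).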
The effective number of codons
The effective number of codons (ENC) measures how far the codon usage of a gene deviates from random (equal) usage of synonymous codons and is independent of hypotheses involving natural selection [5]. ENC values range from 20 to 61: the stronger the codon preference in a gene, the smaller its ENC value. In an extremely biased gene in which only one codon is used for each amino acid, ENC would be 20, and if all synonymous codons were used equally, it would be 61 [28, 31]. ENC is calculated as follows:
\[ F = \frac{n\sum_{i=1}^{k} P_i^{2} - 1}{n - 1} \]
\[ \mathrm{ENC} = 2 + \frac{9}{\bar{F}_2} + \frac{1}{\bar{F}_3} + \frac{5}{\bar{F}_4} + \frac{3}{\bar{F}_6} \]
where, for a given amino acid, n is the observed number of its codons, k is the number of its synonymous codons, and \( P_i = n_i/n \) is the usage frequency of the ith codon; \( \bar{F}_k \) is the mean F of the amino acids with k synonymous codons. ENC is influenced by the amino acid composition of the gene and by its length.
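The ENC formulas can be sketched in Python as follows, assuming the standard genetic code. Two details are assumptions rather than statements from the text: amino acids with fewer than two observations (or with F = 0) are skipped, and a missing 3-fold class (Ile) is approximated by the mean of the 2-fold and 4-fold classes, a convention commonly used in ENC implementations. Results above 61 are capped at 61. The function names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import product

AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AA_STRING)}
# Number of amino acids in each degeneracy class of the standard code.
CLASS_SIZE = {2: 9, 3: 1, 4: 5, 6: 3}


def homozygosity(codon_counts):
    """F = (n * sum(P_i^2) - 1) / (n - 1) for one amino acid; None if n < 2."""
    n = sum(codon_counts)
    if n < 2:
        return None
    return (n * sum((x / n) ** 2 for x in codon_counts) - 1) / (n - 1)


def enc(orf):
    """Effective number of codons for one in-frame ORF."""
    seq = orf.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    families = defaultdict(list)
    for codon, aa in STANDARD_CODE.items():
        if aa != "*":
            families[aa].append(codon)
    by_class = defaultdict(list)  # k -> F values of the k-fold amino acids
    for codons in families.values():
        k = len(codons)
        if k == 1:
            continue  # Met and Trp contribute the constant term 2
        f = homozygosity([counts[c] for c in codons])
        if f:  # skip None and F == 0 (too few observations)
            by_class[k].append(f)
    mean_f = {k: sum(v) / len(v) for k, v in by_class.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # assumed fallback for missing Ile
    if set(mean_f) != set(CLASS_SIZE):
        raise ValueError("too few codons: a degeneracy class has no usable data")
    value = 2 + sum(CLASS_SIZE[k] / mean_f[k] for k in CLASS_SIZE)
    return min(value, 61.0)
```

Using exactly one codon per amino acid gives F = 1 in every class and therefore ENC = 20, matching the lower limit described above; using all synonymous codons equally drives ENC to the cap of 61.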
The fraction of each codon within its synonymous family
The codon fraction normalizes codon counts to the proportion of each codon within its synonymous family [1]. To examine the degree of similarity in codon usage between BVDV and its host (Bos taurus), the fraction of each codon within its synonymous family was compared between the 22 ORFs of BVDV and the 22 genes of Bos taurus. A total of 59 sense codons were included, excluding the single codons for Met (AUG) and Trp (UGG) and the three stop codons.
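A minimal Python sketch of the codon-fraction calculation, assuming the standard genetic code. Fractions are computed per ORF and then averaged across ORFs, because the Results describe average proportions; averaging per-ORF fractions rather than pooling raw counts is an assumption of this sketch. `FAMILIES`, `codon_fractions` and `mean_fractions` are hypothetical names.

```python
from collections import Counter, defaultdict
from itertools import product

AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AA_STRING)}

_groups = defaultdict(list)
for _codon, _aa in STANDARD_CODE.items():
    if _aa != "*":
        _groups[_aa].append(_codon)
# Multi-codon families only: 18 amino acids, 59 codons (AUG and UGG dropped).
FAMILIES = {aa: codons for aa, codons in _groups.items() if len(codons) > 1}


def codon_fractions(orf):
    """Fraction of each codon within its synonymous family for one ORF."""
    seq = orf.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    fractions = {}
    for codons in FAMILIES.values():
        total = sum(counts[c] for c in codons)
        if total:  # families absent from the ORF are left out
            for c in codons:
                fractions[c] = counts[c] / total
    return fractions


def mean_fractions(orfs):
    """Average the per-ORF fractions over several ORFs or genes."""
    sums, seen = defaultdict(float), Counter()
    for orf in orfs:
        for codon, frac in codon_fractions(orf).items():
            sums[codon] += frac
            seen[codon] += 1
    return {codon: sums[codon] / seen[codon] for codon in sums}


print(mean_fractions(["GCUGCU", "GCCGCC"])["GCU"])  # → 0.5
```

The same function can be applied to the viral ORFs and to the host genes, giving two 59-codon profiles that can be compared directly.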
Statistical analysis
Principal component analysis (PCA) was conducted to identify the major trends in codon usage among BVDV samples. PCA is a linear mapping that extracts the features of an input distribution that are optimal in the mean-squared-error sense; it can also be used by self-organizing neural networks to form unsupervised preprocessing modules for classification problems [15]. To minimize the effect of amino acid composition on codon usage, each ORF was represented as a 59-dimensional vector in which each dimension corresponds to the RSCU value of one sense codon, excluding AUG (Met), UGG (Trp) and the three stop codons.
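A minimal PCA sketch with NumPy, taking an (ORFs × 59) matrix of RSCU values as input. It performs unscaled, covariance-based PCA on column-centred data via the singular value decomposition; the text does not state whether the codon columns were standardized, so that choice is an assumption. The random matrix below is placeholder data, not BVDV values.

```python
import numpy as np


def pca(rscu_matrix, n_components=2):
    """PCA of an (n_orfs x 59) RSCU matrix via SVD of the column-centred data.

    Returns the ORF scores on the first n_components axes (f1', f2', ...)
    and the percentage of total variance explained by each of those axes.
    """
    X = np.asarray(rscu_matrix, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each codon (column)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = 100.0 * s**2 / np.sum(s**2)
    scores = Xc @ vt[:n_components].T
    return scores, explained[:n_components]


# Placeholder data standing in for 22 ORFs x 59 RSCU values.
rng = np.random.default_rng(0)
scores, pct = pca(rng.uniform(0, 3, size=(22, 59)))
print(scores.shape)  # → (22, 2)
```

The two columns of `scores` correspond to the coordinates plotted in a first-axis versus second-axis scatter plot, and `pct` gives the share of total variation carried by each axis.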
Spearman’s rank correlation analysis was used to identify relationships among nucleotide composition, RSCU values and principal component scores of BVDV. Linear least-squares regression was used to evaluate the correlation between the fraction of synonymous codons in BVDV and that in the genes of Bos taurus. General average hydrophobicity (GRAVY) and aromaticity scores were used to investigate the hydrophobic and aromatic properties of the encoded proteins; both scores were obtained for each protein using CodonW 1.2.4.
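Spearman’s correlation is the Pearson correlation of ranks (tied values receive the average of their ranks), and the least-squares line and its r² follow from the same sums of squares. The pure-Python sketch below computes the coefficients only; p-values such as those reported in the Results would normally come from a statistics package, for example `scipy.stats.spearmanr` or `scipy.stats.linregress`. The function names are illustrative.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def least_squares(x, y):
    """Slope, intercept and r^2 of the ordinary least-squares line y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx, pearson(x, y) ** 2


print(spearman([1, 2, 3, 4], [2, 4, 3, 8]))  # → 0.8
```

Because Spearman’s rho depends only on ranks, it is robust to the non-linear relationships that are common between composition indices and principal component scores.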
Results
The characteristics of synonymous codon usage in BVDV
To investigate the extent of codon usage bias in BVDV, the RSCU values of all codons in the 22 BVDV strains were calculated. Only one preferred codon, AGU, has U at the third position; all of the other preferred codons end with A, C or G (Table 3). The BVDV genome is A+U-rich, with A+U content ranging from 53.63% to 55.11% (mean 54.46%, S.D. 0.35), yet most preferentially used codons are G/C-ended (G/C-ended : A/U-ended = 10:8), suggesting that the G+C content at the third codon position may influence the pattern of synonymous codon usage (Table 4). ENC values are similar across the BVDV ORFs, ranging from 50.69 to 52.6 (mean 51.43, S.D. 0.46), indicating that the degree of codon preference in BVDV genes is fairly stable.
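The composition indices used here (A+U%, A3%, (G+C)3% and so on) can be sketched in Python as below. This simple version uses every third codon position of the ORF; tools such as CodonW also report GC3s, which counts only synonymous third positions, so which variant the study used is an assumption. The function name is illustrative.

```python
def composition(orf):
    """Overall and third-position base composition (%) of an in-frame ORF."""
    seq = orf.upper().replace("T", "U")
    seq = seq[: len(seq) - len(seq) % 3]  # keep complete codons only
    if not seq:
        raise ValueError("ORF shorter than one codon")
    third = seq[2::3]  # third codon positions
    result = {}
    for base in "ACGU":
        result[base] = 100 * seq.count(base) / len(seq)
        result[base + "3"] = 100 * third.count(base) / len(third)
    result["A+U"] = result["A"] + result["U"]
    result["G+C"] = result["G"] + result["C"]
    result["(G+C)3"] = result["G3"] + result["C3"]
    return result


print(composition("AUGGCUUAA")["(G+C)3"])
```

Comparing `A+U` with `(G+C)3` across ORFs is exactly the contrast noted above: an A+U-rich genome can still favor G/C-ended codons.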
Genetic relationship based on synonymous codon usage
Principal component analysis was carried out to identify the major trends in codon usage variation among the ORFs. The first axis (\( f_{1}^{'} \)) accounted for 26.51% of the total variation, and the second axis (\( f_{2}^{'} \)) accounted for 13.02%. A plot of the \( f_{1}^{'} \) and \( f_{2}^{'} \) values of each ORF is shown in Supplementary Fig. 1. The BVDV-2 strains clustered somewhat more tightly than the widely scattered BVDV-1 strains. Interestingly, the BVDV-1 strains appear to be clearly separated by geographic origin.
Compositional properties of all BVDV genomes
Natural selection and mutation pressure are thought to be the main factors that account for codon usage variation in different organisms. To assess their roles, the A%, U%, C%, G% and (G+C)% values were compared with the A3%, C3%, G3%, U3% and (G+C)3% values, and the resulting correlations were complex. The (G+C)3% values were highly significantly correlated with the A%, U%, C%, G% and (G+C)% values, indicating that (G+C)3% may reflect an interaction between mutation pressure and natural selection. In contrast, the U% and C% values did not correlate with the A3%, U3%, G3% or C3% values (Table 5). Both observations suggest that nucleotide constraints may influence synonymous codon usage in BVDV.
Correlation analysis was also used to examine the relationships among ENC, (G+C)3% and (G+C)% values. ENC was highly significantly correlated with (G+C)% (Spearman r = 0.765, p < 0.01) and significantly correlated with (G+C)3% (Spearman r = 0.534, 0.01 < p < 0.05), indicating that codon usage bias is influenced by nucleotide constraints. In addition, the correlations between the \( f_{1}^{'} \) value of each strain and its A%, C%, G%, U%, A3%, C3%, G3%, U3%, (G+C)% and (G+C)3% values were analyzed. Nucleotide composition was, to some extent, significantly correlated with synonymous codon usage (Table 6), revealing that most of the codon usage bias among ORFs of BVDV strains is directly related to base composition.
The \( f_{1}^{'} \) values also had a significant negative correlation with the general average hydrophobicity (GRAVY) of each protein (Spearman r = -0.737, p < 0.01) and a negative correlation with the aromaticity of each protein (Spearman r = -0.455, p = 0.033), suggesting that the expressed proteins tend to be hydrophilic, consistent with their functioning in the aqueous environment of the cell.
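The two protein indices can be sketched in Python under the usual definitions, which we assume here: GRAVY as the mean Kyte-Doolittle hydropathy over all residues of the translated protein, and aromaticity as the fraction of Phe, Tyr and Trp residues. Input is a one-letter protein sequence; stop symbols and unknown letters are ignored. The function names are illustrative, not CodonW’s own code.

```python
# Kyte-Doolittle hydropathy values, keyed by one-letter amino acid code.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _residues(protein):
    """Standard residues only; '*' and unknown letters are dropped."""
    return [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]


def gravy(protein):
    """Mean Kyte-Doolittle hydropathy over the residues of a protein."""
    residues = _residues(protein)
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def aromaticity(protein):
    """Fraction of aromatic residues (Phe, Tyr, Trp)."""
    residues = _residues(protein)
    return sum(aa in "FYW" for aa in residues) / len(residues)


print(gravy("MKV"), aromaticity("MKV"))
```

Negative GRAVY values indicate a hydrophilic protein overall, so the negative correlation with \( f_{1}^{'} \) links codon usage along the first axis to protein hydropathy.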
Effect of other factors on codon usage
As shown in Figure 1, plotting the observed ENC values against (G+C)3%, together with the expected ENC curve, displays the main features of codon usage patterns. The curve indicates the expected ENC if codon usage were determined only by the (G+C)3% value of the genome:
\[ \mathrm{ENC}_{\mathrm{expected}} = 2 + s + \frac{29}{s^{2} + (1 - s)^{2}} \]
where s represents the given (G+C)3% value, expressed as a fraction [31]. All points with low ENC values lie below the expected curve, suggesting that although codon usage bias is influenced by mutation pressure, other factors must also influence codon usage variation in these genes. Therefore, we further analyzed the correlation of the \( f_{1}^{'} \) values from the principal component analysis with the GRAVY and aromaticity scores of each protein (Table 6).
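Wright’s expected curve, ENC = 2 + s + 29/(s² + (1 - s)²) with s the (G+C)3 content as a fraction [31], is easy to evaluate; the sketch below also flags whether an ORF lies below it. The 50% GC3 value in the usage line is hypothetical, since per-ORF GC3 values are not listed in the text.

```python
def expected_enc(s):
    """Expected ENC if codon usage depends only on GC3 (s between 0 and 1)."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def below_curve(observed_enc, gc3_percent):
    """True if an ORF's observed ENC lies below the GC3-only expectation."""
    return observed_enc < expected_enc(gc3_percent / 100)


print(expected_enc(0.5))         # → 60.5
print(below_curve(51.43, 50.0))  # → True
```

The curve peaks near s = 0.5, so ORFs with intermediate GC3 but ENC values around 51 fall well below it, which is the pattern that points to factors other than mutation pressure.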
Comparison of codon usage in BVDV and its host
To explore the relationship between BVDV and its host in codon usage, the average fraction of each codon within its synonymous family in BVDV (excluding strain no. 14, which was isolated from swine) was plotted against that in Bos taurus. A codon was defined as having a low usage frequency when both fractions were less than or equal to 0.15, and as showing a great difference in frequency when one fraction was greater than or equal to twice the other. The plot showed a clear linear relationship between BVDV and Bos taurus (r² = 0.697), indicating that the virus and host have very similar codon usage patterns. The least frequently used codons in the host, such as UCG (Ser), CCG (Pro), ACG (Thr), CGU, CGC, CGA, CGG (Arg) and GCG (Ala), were also non-preferred codons in the virus, whereas a few codons, including CUA (Leu), AGG (Arg), AUA and AUU (Ile), were highly scattered. Linear regression analysis was also performed to compare the codon usage pattern of strain 14 with those of the other BVDV strains; there was no significant difference between the two patterns (P<0.05).
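The two classification rules stated above (low usage when both fractions are at most 0.15; a great difference when one fraction is at least twice the other) and the r² of the least-squares fit can be sketched together in Python. The guard that stops two zero fractions from counting as a two-fold difference is our assumption, and the codon values in the usage note are hypothetical.

```python
def compare_usage(virus, host, low=0.15):
    """Flag codons and compute r^2 between virus and host codon fractions.

    virus, host: dicts mapping codon -> fraction within its synonymous family.
    """
    codons = sorted(set(virus) & set(host))
    flags = {}
    for c in codons:
        v, h = virus[c], host[c]
        flags[c] = {
            "low": v <= low and h <= low,
            # Guard against 0 vs 0 counting as a two-fold difference (an assumption).
            "great_difference": max(v, h) > 0 and (v >= 2 * h or h >= 2 * v),
        }
    n = len(codons)
    x = [virus[c] for c in codons]
    y = [host[c] for c in codons]
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r2 = sxy ** 2 / (sxx * syy)  # r^2 of the least-squares line
    return flags, r2
```

For example, with hypothetical fractions virus = {"UCG": 0.1, "AGC": 0.5} and host = {"UCG": 0.1, "AGC": 0.2}, UCG is flagged as low-usage in both and AGC as a great difference. Note that the two rules are checked independently, so a codon can carry both flags.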
Discussion
Natural selection is a phenomenon that alters the behavior and fitness of living organisms within a given environment, and it is a driving force of evolution. Mutation pressure is the change in allele frequencies caused by the repeated occurrence of the same mutations. There are few biologically realistic situations in which mutation pressure is the most important evolutionary process. However, the mutation rate of RNA viruses is sometimes high enough that mutation pressure needs to be considered.
It is well established that synonymous codon usage provides information about the genetic features of viral genomes [10, 14]. In this study, the synonymous codon usage bias of BVDV genes was found to be low (mean ENC = 51.43, greater than 40). This is comparable to the low codon usage bias reported for other RNA viruses, such as influenza A H5N1 virus (mean ENC 50.91) [33] and SARS coronavirus (mean ENC 48.99) [10]. Bahir et al. also reported a strong resemblance in codon usage between viruses and their host cells [2]. These findings suggest that low codon bias may help BVDV replicate efficiently in host cells.
The general association between codon usage indices and compositional constraints shows that mutation pressure plays an important role in determining codon usage variation in BVDV. This is supported by the highly significant correlation between the codon usage index \( f_{1}^{'} \) and the A%, U%, G%, C%, A3%, U3%, G3% and C3% values (Table 6). The relationship between the observed ENC values and (G+C)3% is weaker than that of the expected values (Fig. 2). We therefore suggest that mutation pressure is one of the main factors responsible for the variation of synonymous codon usage in BVDV genomes. Further analysis showed that the C3% values of BVDV isolates were low (mean 17.47%, S.D. 3.05), yet, interestingly, six of the preferred codons end with C (Table 3). Meanwhile, the mean U3% is higher than the mean C3% (19.97% vs. 17.47%), but only one U-ended codon, AGU, is a preferentially used codon. This indicates that natural selection is possibly involved in shaping synonymous codon usage. No correlation was found between C% or U% and A3%, U3%, G3% or C3% (Table 5), suggesting that, given the low U% and C% values, nucleotide constraints are involved in codon usage patterns. Aromaticity is one of the factors that shape variation in amino acid usage [19]. The \( f_{1}^{'} \) values had a negative correlation with the aromaticity of each protein (Table 5), suggesting that natural selection may be involved in BVDV evolution.
BVDV was first reported in 1946 [11], and the scattered distribution of the 14 BVDV-1 strains may imply that diversity among BVDV-1 strains has increased over the course of evolution (Supplementary Fig. 1). Three BVDV-1 strains isolated in Asia differed from the other BVDV-1 strains, implying that the Asian strains are distantly related to the American and European strains. The American strains were more closely related to the European strains than to the Asian ones. The low diversity in BVDV-2 might result from the limited number of samples. Codon usage bias in BVDV is therefore likely to be related to genotype and geographic factors.
The remarkable similarity in codon usage patterns between the virus and Bos taurus suggests that natural selective pressure has given BVDV higher adaptability to its host. This adaptability may allow the virus to survive in the host cell and to use cellular components to replicate. However, for some codons (AUU, CUA, AGG and AUA), there is no evidence that the virus is adapted to the codon usage pattern of its host, which is consistent with mutational bias theory [1]. Although isolate 14 was reportedly first found in swine, its nucleotide composition is similar to that of strains originating from cattle, suggesting that strain 14 may also be of cattle origin.
In this study, our analysis reveals that codon usage bias in BVDV is low and that mutation pressure is the main factor affecting codon usage variation in BVDV. Other factors, including base composition, genotype, geography, GRAVY and aromaticity, may also significantly influence codon usage bias.
Although our study provides a basic understanding of the codon usage patterns of BVDV and of the roles played by mutation pressure and natural selection, a more comprehensive analysis is needed to reveal more about variation in codon usage bias within BVDV and the other factors responsible.
References
Adams MJ, Antoniw JF (2004) Codon usage bias amongst plant viruses. Arch Virol 149:113–135
Bahir I, Fromer M, Prat Y, Linial M (2009) Viral adaptation to host: a proteome-based analysis of codon usage and amino acid preferences. Mol Syst Biol 5:1–14
Becher P, Orlich M, Kosmidou A, Konig M, Baroth M, Thiel HJ (1999) Genetic diversity of pestiviruses: identification of novel groups and implications for classification. Virology 262:64–71
Bulmer M (1988) Codon usage and intragenic position. J Theor Biol 133:67–71
Castillo-Davis CI, Hartl DL (2002) Genome evolution and developmental constraint in Caenorhabditis elegans. Mol Biol Evol 19:728–735
Collett MS, Larson R, Gold C, Strick D, Anderson DK, Purchio AF (1988) Molecular cloning and nucleotide sequence of the pestivirus bovine viral diarrhea virus. Virology 165:191–199
Cutter AD, Charlesworth B (2006) Selection intensity on preferred codons correlates with overall codon usage bias in Caenorhabditis remanei. Curr Biol 16:2053–2057
Deng R, Brock KV (1992) Molecular cloning and nucleotide sequence of a pestivirus genome, noncytopathic bovine viral diarrhea virus strain SD-1. Virology 191:867–869
Duret L (2002) Evolution of synonymous codon usage in metazoans. Curr Opin Genet Dev 12:640–649
Gu W, Zhou T, Ma J, Sun X, Lu Z (2004) Analysis of synonymous codon usage in SARS Coronavirus and other viruses in the Nidovirales. Virus Res 101:155–161
Hillerton JE (1998) Bovine spongiform encephalopathy: current status and possible impacts. J Dairy Sci 81:3042–3048
Ikemura T (1985) Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol 2:13–34
Jackova A, Novackova M, Pelletier C, Audeval C, Gueneau E, Haffar A, Petit E, Rehby L, Vilcek S (2008) The extended genetic diversity of BVDV-1: typing of BVDV isolates from France. Vet Res Commun 32:7–11
Jenkins GM, Holmes EC (2003) The extent of codon usage bias in human RNA viruses and its evolutionary origin. Virus Res 92:1–7
Kanaya S, Kinouchi M, Abe T, Kudo Y, Yamada Y, Nishi T, Mori H, Ikemura T (2001) Analysis of codon usage diversity of bacterial genes with a self-organizing map (SOM): characterization of horizontally transferred genes with emphasis on the E. coli O157 genome. Gene 276:89–99
Karlin S, Mrazek J (1996) What drives codon choices in human genes? J Mol Biol 262:459–472
Liebler-Tenorio EM, Ridpath JF, Neill JD (2003) Distribution of viral antigen and development of lesions after experimental infection of calves with a BVDV 2 strain of low virulence. J Vet Diagn Invest 15:221–232
Liu YS, Zhou JH, Chen HT, Ma LN, Ding YZ, Wang M, Zhang J (2010) Analysis of synonymous codon usage in porcine reproductive and respiratory syndrome virus. Infect Genet Evol 10:797–803
Nayak KC (2009) Mutational bias and gene expression level shape codon usage in Thermobifida fusca YX. In Silico Biol 9:337–353
Paton DJ, Sands JJ, Lowings JP, Smith JE, Ibata G, Edwards S (1995) A proposed division of the pestivirus genus using monoclonal antibodies, supported by cross-neutralisation assays and genetic sequencing. Vet Res 26:92–109
Qi F, Gustad T, Lewis TL, Berry ES (1993) The nucleotide sequence of the 5’-untranslated region of bovine viral diarrhoea virus: its use as a probe in rapid detection of bovine viral diarrhoea viruses and border disease viruses. Mol Cell Probes 7:349–356
Ridpath JF, Bolin SR, Dubovi EJ (1994) Segregation of bovine viral diarrhea virus into genotypes. Virology 205:66–74
Ridpath JF, Neill JD, Vilcek S, Dubovi EJ, Carman S (2006) Multiple outbreaks of severe acute BVDV in North America occurring between 1993 and 1995 linked to the same BVDV2 strain. Vet Microbiol 114:196–204
Sharp PM, Averof M, Lloyd AT, Matassi G, Peden JF (1995) DNA sequence evolution: the sounds of silence. Philos Trans R Soc Lond B Biol Sci 349:241–247
Sharp PM, Li WH (1986) Codon usage in regulatory genes in Escherichia coli does not reflect selection for ‘rare’ codons. Nucleic Acids Res 14:7737–7749
Sorensen MA, Kurland CG, Pedersen S (1989) Codon usage determines translation rate in Escherichia coli. J Mol Biol 207:365–377
Stenico M, Lloyd AT, Sharp PM (1994) Codon usage in Caenorhabditis elegans: delineation of translational selection and mutational biases. Nucleic Acids Res 22:2437–2446
Tao P, Dai L, Luo M, Tang F, Tien P, Pan Z (2009) Analysis of synonymous codon usage in classical swine fever virus. Virus Genes 38:104–112
Vilcek S, Mojzisova J, Bajova V, Paulik S, Strojny L, Durkovic B, Hipikova V (2003) A survey for BVDV antibodies in cattle farms in Slovakia and genetic typing of BVDV isolates from imported animals. Acta Vet Hung 51:229–236
Vilcek S, Strojny L, Durkovic B, Rossmanith W, Paton D (2001) Storage of bovine viral diarrhoea virus samples on filter paper and detection of viral RNA by a RT-PCR method. J Virol Methods 92:19–22
Wright F (1990) The ‘effective number of codons’ used in a gene. Gene 87:23–29
Zhou JH, Zhang J, Chen HT, Ma LN, Liu YS (2010) Analysis of synonymous codon usage in foot-and-mouth disease virus. Vet Res Commun 34:393–404
Zhou T, Gu W, Ma J, Sun X, Lu Z (2005) Analysis of synonymous codon usage in H5N1 virus and other influenza A viruses. Biosystems 81:77–86
Acknowledgements
This work was supported in part by grants from the National Natural Science Foundation of China (No. 30700597 and No. 31072143). This study was also supported by the International Science & Technology Cooperation Program of China (No. 2010DFA32640) and the Science and Technology Key Project of Gansu Province (No. 0801NKDA034).
Additional information
M. Wang and J. Zhang contributed equally to this work.
Electronic supplementary material
Supplementary Fig. 1 A plot of the values of the first axis (26.51%) and the second axis (13.02%) of each ORF in principal component analysis. Strains isolated from the same continent are shown in the same color.
Cite this article
Wang, M., Zhang, J., Zhou, Jh. et al. Analysis of codon usage in bovine viral diarrhea virus. Arch Virol 156, 153–160 (2011). https://doi.org/10.1007/s00705-010-0848-0