Abstract
In any species, highly expressed genes differ in how frequently they use synonymous codons. The relative frequency of the favored codon pair (and of the corresponding amino acid pair) varies between genes and genomes because of differences in gene expression and base composition. Here we propose a new measure for predicting gene expression level, the codon plus amino bias index (CABI). Our approach is based on the relative bias toward the favored codon pair among genes, and we illustrate it by analyzing the CABI scores of Medicago truncatula genes. CABI correlated strongly with the other widely used measures of gene expression (CAI, RCBS, SCUO). Notably, CABI outperformed these measures by correlating better with the wet-lab data. This emphasizes the importance of the codons neighboring the favored codon of a synonymous group when estimating the expression level of a gene.
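As a rough illustration of the quantities the abstract refers to, the sketch below counts adjacent codon pairs in a coding sequence and computes the codon adaptation index (CAI) as the geometric mean of relative adaptiveness weights. It is not the CABI formula, which the abstract does not state. The function names, the toy sequences, and the choice to skip codons absent from the reference set (rather than give them a pseudo-count) are assumptions made for this example.

```python
from collections import Counter
from math import exp, log

# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def codons(cds):
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def codon_pairs(cds):
    """Count adjacent (codon, next codon) pairs along the sequence."""
    c = codons(cds)
    return Counter(zip(c, c[1:]))


def relative_adaptiveness(reference_cds_list):
    """Weight of each codon = its count / count of the most used synonym,
    taken over a reference set of (assumed) highly expressed genes.
    Met, Trp and stop codons are excluded; unseen codons are skipped (simplification)."""
    counts = Counter(c for cds in reference_cds_list for c in codons(cds))
    weights = {}
    for codon, aa in CODON_TABLE.items():
        if aa in "*MW":
            continue
        family_max = max(counts[c] for c, a in CODON_TABLE.items() if a == aa)
        if family_max and counts[codon]:
            weights[codon] = counts[codon] / family_max
    return weights


def cai(cds, weights):
    """Geometric mean of the weights of the codons in cds that have a weight."""
    logs = [log(weights[c]) for c in codons(cds) if c in weights]
    return exp(sum(logs) / len(logs)) if logs else float("nan")


if __name__ == "__main__":
    reference = ["GCTGCTGCTGCC"]          # toy reference set (hypothetical)
    w = relative_adaptiveness(reference)
    print(codon_pairs("ATGGCTGCTGCC"))    # adjacent codon pair counts
    print(cai("GCTGCC", w))               # CAI of a toy gene
```

In practice the reference set would be a curated list of highly expressed genes from the genome under study, and a pair-based index would build on counts like those returned by `codon_pairs`.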
Acknowledgements
The authors are grateful to the University Grants Commission, New Delhi, India, for providing a UGC-BSR Fellowship to carry out this research work. We are also grateful to Assam University, Silchar, Assam, India, for providing the research facility.
Author information
Contributions
P.P. conceived the idea, carried out all analyses, analyzed the data and wrote the paper. A.K.M. prepared the figures and tables. S.C. prepared the software for analysis and edited the manuscript. All authors reviewed the manuscript.
Ethics declarations
Funding
The authors did not receive financial assistance from any source in undertaking the present study.
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
About this article
Cite this article
Paul, P., Malakar, A.K. & Chakraborty, S. Codon usage and amino acid usage influence genes expression level. Genetica 146, 53–63 (2018). https://doi.org/10.1007/s10709-017-9996-4
DOI: https://doi.org/10.1007/s10709-017-9996-4