Codon Usage and Selection on Proteins

  • Joshua B. Plotkin
  • Jonathan Dushoff
  • Michael M. Desai
  • Hunter B. Fraser


Selection pressures on proteins are usually measured by comparing homologous nucleotide sequences (Zuckerkandl and Pauling 1965). Recently we introduced a novel method, termed volatility, to estimate selection pressures on proteins on the basis of their synonymous codon usage (Plotkin and Dushoff 2003; Plotkin et al. 2004). Here we provide a theoretical foundation for this approach. Under the Fisher-Wright model, we derive the expected frequencies of synonymous codons as a function of the strength of selection on amino acids, the mutation rate, and the effective population size. We analyze the conditions under which we can expect to draw inferences from biased codon usage, and we estimate the time scales required to establish and maintain such a signal. We find that synonymous codon usage can reliably distinguish between negative selection and neutrality only for organisms, such as some microbes, that experience large effective population sizes or periods of elevated mutation rates. The power of volatility to detect positive selection is also modest—requiring approximately 100 selected sites—but it depends less strongly on population size. We show that phenomena such as transient hyper-mutators can improve the power of volatility to detect selection, even when the neutral site heterozygosity is low. We also discuss several confounding factors, neglected by the Fisher-Wright model, that may limit the applicability of volatility in practice.
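
The abstract describes volatility only in words. As an illustration, here is a minimal Python sketch, assuming the codon-level definition used in Plotkin et al. (2004). Under that definition, the volatility of a sense codon is the fraction of its single-nucleotide neighbours that encode a different amino acid under the standard genetic code, with stop-codon neighbours excluded from both numerator and denominator. The names `CODE`, `codon_volatility`, and `mean_volatility` are hypothetical helpers introduced here. Every neighbour is weighted equally (no transition/transversion weighting), and the sketch does not compare results against a null distribution of synonymous codon choices, so it is not the authors' test statistic.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases in TCAG order, third position varying fastest: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_volatility(codon: str) -> float:
    """Fraction of a sense codon's sense point-mutation neighbours that are nonsynonymous."""
    codon = codon.upper()
    aa = CODE[codon]
    if aa == "*":
        raise ValueError(f"volatility is undefined for stop codon {codon}")
    sense_neighbours = 0
    nonsynonymous = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            neighbour_aa = CODE[codon[:pos] + base + codon[pos + 1:]]
            if neighbour_aa == "*":
                continue  # assumption: stop-codon neighbours are dropped entirely
            sense_neighbours += 1
            if neighbour_aa != aa:
                nonsynonymous += 1
    return nonsynonymous / sense_neighbours


def mean_volatility(cds: str) -> float:
    """Average codon volatility over the sense codons of an in-frame ACGT sequence."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    values = [codon_volatility(c) for c in codons if CODE[c] != "*"]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Two synonymous arginine codons with different volatilities.
    print(codon_volatility("AGA"), codon_volatility("CGA"))  # prints: 0.75 0.5
```

The arginine codons AGA and CGA are synonymous yet differ in volatility. This is why a gene's average volatility can shift through synonymous codon choice alone, with no change to the encoded protein.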


Keywords: Codon usage; Selection; Fisher-Wright; Diffusion; Volatility; Protein evolution



We thank Daniel Fisher, Andrew Murray, and Michael Turelli for their input during the preparation of the manuscript. We also thank an anonymous referee for substantial conceptual input. J.B.P. acknowledges support from the Harvard Society of Fellows, the Milton Fund, and the Burroughs Wellcome Fund. M.M.D. acknowledges support from a Merck Award for Genome-Related Research.

Supplementary material

supp.pdf (59 kb)


  1. Akashi H (2001) Gene expression and molecular evolution. Curr Opin Genet Dev 11:660–666
  2. Akashi H, Schaeffer SW (1997) Natural selection and the frequency distribution of “silent” DNA polymorphism in Drosophila. Genetics 146:295–307
  3. Anisimova M, Bielawski JP, Yang Z (2001) The accuracy and power of likelihood ratio tests to detect positive selection at amino acid sites. Mol Biol Evol 18:1585–1592
  4. Berg O (1996) Selection intensity for codon bias and the effective population size of Escherichia coli. Genetics 142:1379–1382
  5. Bjedov I, Tenaillon O, Gerard B, et al. (2003) Stress-induced mutagenesis in bacteria. Science 300:1404–1409
  6. Bulmer M (1991) The selection-mutation-drift theory of synonymous codon usage. Genetics 129:897–907
  7. Bustamante CD, Wakeley J, Sawyer S, Hartl DL (2001) Directional selection and the site-frequency spectrum. Genetics 159:1779–1788
  8. Chamary JV, Parmley JL, Hurst LD (2006) Hearing silence: non-neutral evolution at synonymous sites in mammals. Nature Rev Genet 7:98–108
  9. Charlesworth B, Morgan MT, Charlesworth D (1993) The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289–1303
  10. Clark A, Glanowski S, Nielsen R, Thomas P, Kejariwal A, MA MT, Tanenbaum D, Civello D, Lu F, B BM, Ferriera S, Wang G, Zheng X, White T, Sninsky J, Adams M, Cargill M (2003) Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 302:1960–1963
  11. Coghlan A, Wolfe KH (2000) Relationship of codon bias to mRNA concentration and protein length in Saccharomyces cerevisiae. Yeast 16:1131–1145
  12. Crow JF, Kimura M (1970) An introduction to population genetics theory. Burgess, Minneapolis
  13. Dagan T, Graur D (2004) The comparative method rules! Codon volatility cannot detect positive Darwinian selection using a single genome sequence. Mol Biol Evol 22:496–500
  14. Debry R, Marzluff WF (1994) Selection on silent sites in the rodent H3 histone gene family. Genetics 138:191–202
  15. Denamur E, Lecointre G, Darlu P, Tenaillon O, Acquaviva C, Sayada C, Sunjevaric I (2000) Evolutionary implications of the frequent horizontal transfer of mismatch repair genes. Cell 103:711–721
  16. Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH (2005) Why highly expressed proteins evolve slowly. Proc Natl Acad Sci USA 102:14338–14343
  17. Drummond DA, Raval A, Wilke CO (2006) A single determinant dominates the rate of yeast protein evolution. Mol Biol Evol 23:327–337
  18. Ewens W (2004) Mathematical population genetics I. Springer-Verlag, New York
  19. Fleischmann RD, Alland D, Eisen JA, Carpenter L, White O, Peterson J, DeBoy R, Dodson R, Gwinn M, Haft D, Hickey E, Kolonay JF, Nelson WC, Umayam LA, Ermolaeva M, Salzberg SL, Delcher A, Utterback T, Weidman J, Khouri H, Gill J, Mikula A, Bishai W, Jacobs WR, Venter JC, Fraser CM (2002) Whole-genome comparison of Mycobacterium tuberculosis clinical and laboratory strains. J Bacteriol 184:5479–5490
  20. Friedman R, Hughes AL (2005) Codon volatility as an indicator of positive selection: data from eukaryotic genome comparisons. Mol Biol Evol 22:542–543
  21. Gibbons RJ, Kapsimalis B (1967) Estimates of the overall rate of growth of the intestinal microflora for hamsters, guinea pigs, and mice. J Bacteriol 93:510–512
  22. Gillespie J (2001) Is the population size of a species relevant to its evolution? Evolution 55:2161–2169
  23. Giraud A, Radman M, Matic I, Taddei F (2001) The rise and fall of mutator bacteria. Curr Opin Microbiol 4:582–585
  24. Golding GB, Strobeck C (1982) Expected frequencies of codon use as a function of mutation rates and codon fitnesses. J Mol Evol 18:379–386
  25. Goldman N, Yang Z (1994) Codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol 11:725–736
  26. Hahn MW, Mezey JG, Begun DJ, Gillespie JH, Kern AD, Langley CH, Moyle LC (2005) Codon bias and selection on single genomes. Nature 433:E1
  27. Hartl DL, Sawyer SA (1994) Selection intensity for codon bias. Genetics 138:227–234
  28. Higgs P (1994) Error thresholds and stationary mutant distributions in multi-locus diploid genetics models. Genet Res Cambr 63:63–78
  29. Hirsh AE, Fraser HB, Wall DP (2005) Adjusting for selection on synonymous sites in estimates of evolutionary distance. Mol Biol Evol 22:174–177
  30. Ikemura T (1981) Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein. J Mol Biol 146:1–21
  31. Kellis M, Patterson N, Endrizzi M, Birren B, Lander E (2003) Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423:241–254
  32. Kimura M, Crow JF (1964) The number of alleles that can be maintained in a finite population. Genetics 49:725–738
  33. King JL, Jukes TH (1969) Non-Darwinian evolution. Science 164:788
  34. Konopka AJ (1985) Theory of degenerate coding and informational parameters of protein coding genes. Biochimie 67:455–468
  35. Kreitman M (2000) Methods to detect selection in populations with applications to the human. Annu Rev Genomics Hum Genet 1:539–559
  36. LeClerc JE, Li B, Payne WL, Cebula TA (1996) High mutation frequencies among Escherichia coli and Salmonella pathogens. Science 274:1208–1211
  37. Li WH (1993) Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J Mol Evol 36:96–99
  38. Lynch M, Conery JS (2003) The origins of genome complexity. Science 302:1401–1404
  39. Maynard Smith J, Haigh J (1974) The hitch-hiking effect of a favorable gene. Genet Res Cambr 23:23–25
  40. McDonald JH, Kreitman M (1991) Adaptive protein evolution at the ADH locus in Drosophila. Nature 351:652–654
  41. Miyata T, Miyazawa S, Yasunaga T (1979) Two types of amino acid substitutions in protein evolution. J Mol Evol 12:219–236
  42. Meyers LA, Ancel FD, Lachmann M (2005) Evolution of genetic potential. PLoS Comput Biol 1:236–243
  43. Nagylaki T (1992) Introduction to theoretical population genetics. Springer, Berlin
  44. Nielsen R, Hubisz M (2005) Detecting selection needs comparative data. Nature 433:E6
  45. Notley-McRobb L, Seeto S, Ferenci T (2001) Enrichment and elimination of mutY mutators in Escherichia coli populations. Genetics 162:1955–1062
  46. Ochman H, Elwyn S, Moran NA (1999) Calibrating bacterial evolution. Proc Natl Acad Sci USA 96:12638–12643
  47. Oliver A, Canton R, Campo P, Baquero F, Blazquez J (2000) High frequency of hypermutable Pseudomonas aeruginosa in cystic fibrosis lung infection. Science 288:1251–1254
  48. Pal C, Papp B, Hurst LD (2001) Highly expressed genes in yeast evolve slowly. Genetics 158:927–931
  49. Plotkin JB, Dushoff J (2003) Codon bias and frequency-dependent selection on the hemagglutinin epitopes of Influenza A virus. Proc Natl Acad Sci USA 100:7152–7157
  50. Plotkin JB, Dushoff J, Fraser HB (2004) Detecting selection using a single genome sequence of M. tuberculosis and P. falciparum. Nature 248:942–946
  51. Plotkin JB, Dushoff J, Fraser HB (2005) Codon bias and selection on single genomes: reply. Nature 433:E7–E8
  52. Plotkin JB, Fraser HB, Dushoff J (2006) Natural selection on the genome of Saccharomyces cerevisiae (in preparation)
  53. Sanjuan R, Moya A, Elena S (2004) The distribution of fitness effects caused by single-nucleotide substitutions in an RNA virus. Proc Natl Acad Sci USA 101:8396–8401
  54. Sawyer SA, Hartl DL (1992) Population genetics of polymorphism and divergence. Genetics 132:1161–1176
  55. Sharp PM (2005) Gene “volatility” is most unlikely to reveal adaptation. Mol Biol Evol 22:807–809
  56. Sharp PM, Li WH (1987) The codon adaptation index: a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15:1281–1295
  57. Simonsen KL, Churchill GA, Aquadro CF (1995) Properties of statistical tests of neutrality for DNA polymorphism data. Genetics 141:413–429
  58. Sorensen M, Kurland C, Pedersen S (1989) Codon usage determines translation rate in Escherichia coli. J Mol Biol 207:365–377
  59. Stoletzki N, Welch J, Hermisson J, Eyre-Walker A (2005) A dissection of volatility in yeast. Mol Biol Evol 22:2022–2026
  60. Tajima F (1996) The amount of DNA polymorphism maintained in a finite population when the neutral mutation rate varies among sites. Genetics 143:1457–1465
  61. Tang H, Wyckoff GJ, Lu J, Wu C (2004) Universal evolutionary index for amino acid changes. Mol Biol Evol 21:1548–1556
  62. Tenaillon O, Denamur E, Matic I (2004) Evolutionary significance of stress-induced mutagenesis in bacteria. Trends Microbiol 12:264–270
  63. Thompson CJ, McBride JL (1973) On Eigen’s theory of the self-organization of matter and the evolution of biological macromolecules. Math Biosci 21:127–142
  64. van Nimwegen E, Crutchfield J, Huynen M (1999) Neutral evolution of mutational robustness. Proc Natl Acad Sci USA 96:9716–9720
  65. Wertman K, Drubin D, Botstein D (1992) Systematic mutational analysis of the yeast ACT1 gene. Genetics 132:337–350
  66. Wilke C (2001) Adaptive evolution on neutral networks. Bull Math Biol 63:715–730
  67. Winter G, Kawai S, Haeggstrom M, Kaneko O, von Euler A, Kawazu S, Palm D, Fernandez V, Wahlgren M (2005) SURFIN is a polymorphic antigen expressed on Plasmodium falciparum merozoites and infected erythrocytes. J Exp Med 201:1853–1863
  68. Wloch DM, Szafraniec K, Borts RH, Korona R (2001) Direct estimate of the mutation rate and the distribution of fitness effects in the yeast Saccharomyces cerevisiae. Genetics 159:441–452
  69. Wright S (1931) Evolution in Mendelian populations. Genetics 16:97–159
  70. Yampolsky LY, Stoltzfus A (2004) The exchangeability of amino acids in proteins. Genetics 170:1459–1472
  71. Yang Z (2000) Maximum likelihood estimation on large phylogenies and analysis of adaptive evolution in human influenza virus A. J Mol Evol 51:423–432
  72. Yang Z, Nielsen R, Goldman N, Pedersen AMK (2000) Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431–449
  73. Zhang J (2004) On the evolution of codon volatility. Genetics 168:495–501
  74. Zuckerkandl E, Pauling L (1965) Molecules as documents of evolutionary history. J Theor Biol 8:357–366
  75. Zeyl C, DeVisser J (2001) Estimates of the rate and distribution of fitness effects of spontaneous mutation in Saccharomyces cerevisiae. Genetics 157:53–61

Copyright information

© Springer Science+Business Media, Inc. 2006

Authors and Affiliations

  • Joshua B. Plotkin (1)
  • Jonathan Dushoff (2)
  • Michael M. Desai (3)
  • Hunter B. Fraser (4)

  1. Department of Biology, University of Pennsylvania, Philadelphia, USA
  2. Department of Ecology and Evolutionary Biology, Princeton University, Princeton, USA
  3. Department of Molecular and Cellular Biology and Department of Physics, Harvard University, Cambridge, USA
  4. Department of Molecular and Cellular Biology, University of California, Berkeley, Berkeley, USA

Personalised recommendations