Codon Usage and Selection on Proteins
Abstract
Selection pressures on proteins are usually measured by comparing homologous nucleotide sequences (Zuckerkandl and Pauling 1965). Recently we introduced a novel method, termed volatility, to estimate selection pressures on proteins on the basis of their synonymous codon usage (Plotkin and Dushoff 2003; Plotkin et al. 2004). Here we provide a theoretical foundation for this approach. Under the Fisher-Wright model, we derive the expected frequencies of synonymous codons as a function of the strength of selection on amino acids, the mutation rate, and the effective population size. We analyze the conditions under which we can expect to draw inferences from biased codon usage, and we estimate the time scales required to establish and maintain such a signal. We find that synonymous codon usage can reliably distinguish between negative selection and neutrality only for organisms, such as some microbes, that experience large effective population sizes or periods of elevated mutation rates. The power of volatility to detect positive selection is also modest—requiring approximately 100 selected sites—but it depends less strongly on population size. We show that phenomena such as transient hyper-mutators can improve the power of volatility to detect selection, even when the neutral site heterozygosity is low. We also discuss several confounding factors, neglected by the Fisher-Wright model, that may limit the applicability of volatility in practice.
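The abstract does not restate how volatility is computed. As a rough illustration only, the sketch below assumes the definition commonly given for codon volatility: the fraction of a codon's single-nucleotide neighbors (excluding stop codons) that encode a different amino acid. The function names `volatility` and `mean_volatility` are hypothetical, the standard genetic code is assumed, and this is not the authors' implementation or statistical test.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (third base varies fastest); '*' marks stops.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def volatility(codon):
    """Fraction of non-stop point-mutation neighbors that change the amino acid.

    Assumed definition for illustration; see the lead-in above.
    """
    aa = CODE[codon]
    neighbors = [
        codon[:i] + b + codon[i + 1:]
        for i in range(3)
        for b in BASES
        if b != codon[i]
    ]
    sense = [n for n in neighbors if CODE[n] != "*"]
    return sum(CODE[n] != aa for n in sense) / len(sense)


def mean_volatility(seq):
    """Average volatility over the sense codons of an in-frame coding sequence."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    values = [volatility(c) for c in codons if CODE[c] != "*"]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Two synonymous arginine codons with different volatility.
    print(volatility("CGA"))  # 0.5
    print(volatility("AGA"))  # 0.75
```

Under this assumed definition, synonymous codons such as CGA and AGA (both arginine) differ in volatility, which is the kind of codon-usage signal the abstract refers to.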
Keywords
Codon usage, Selection, Fisher-Wright, Diffusion, Volatility, Protein evolution
Notes
Acknowledgments
We thank Daniel Fisher, Andrew Murray, and Michael Turelli for their input during the preparation of the manuscript. We also thank an anonymous referee for substantial conceptual input. J.B.P. acknowledges support from the Harvard Society of Fellows, the Milton Fund, and the Burroughs Wellcome Fund. M.M.D. acknowledges support from a Merck Award for Genome-Related Research.
References
- Akashi H (2001) Gene expression and molecular evolution. Curr Opin Genet Dev 11:660–666
- Akashi H, Schaeffer SW (1997) Natural selection and the frequency distribution of “silent” DNA polymorphism in Drosophila. Genetics 146:295–307
- Anisimova M, Bielawski JP, Yang Z (2001) The accuracy and power of likelihood ratio tests to detect positive selection at amino acid sites. Mol Biol Evol 18:1585–1592
- Berg O (1996) Selection intensity for codon bias and the effective population size of Escherichia coli. Genetics 142:1379–1382
- Bjedov I, Tenaillon O, Gerard B, et al. (2003) Stress-induced mutagenesis in bacteria. Science 300:1404–1409
- Bulmer M (1991) The selection-mutation-drift theory of synonymous codon usage. Genetics 129:897–907
- Bustamante CD, Wakeley J, Sawyer S, Hartl DL (2001) Directional selection and the site-frequency spectrum. Genetics 159:1779–1788
- Chamary JV, Parmley JL, Hurst LD (2006) Hearing silence: non-neutral evolution at synonymous sites in mammals. Nature Rev Genet 7:98–108
- Charlesworth B, Morgan MT, Charlesworth D (1993) The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289–1303
- Clark A, Glanowski S, Nielsen R, Thomas P, Kejariwal A, Todd MA, Tanenbaum D, Civello D, Lu F, Murphy B, Ferriera S, Wang G, Zheng X, White T, Sninsky J, Adams M, Cargill M (2003) Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 302:1960–1963
- Coghlan A, Wolfe KH (2000) Relationship of codon bias to mRNA concentration and protein length in Saccharomyces cerevisiae. Yeast 16:1131–1145
- Crow JF, Kimura M (1970) An introduction to population genetics theory. Burgess, Minneapolis
- Dagan T, Graur D (2004) The comparative method rules! Codon volatility cannot detect positive Darwinian selection using a single genome sequence. Mol Biol Evol 22:496–500
- Debry R, Marzluff WF (1994) Selection on silent sites in the rodent H3 histone gene family. Genetics 138:191–202
- Denamur E, Lecointre G, Darlu P, Tenaillon O, Acquaviva C, Sayada C, Sunjevaric I (2000) Evolutionary implications of the frequent horizontal transfer of mismatch repair genes. Cell 103:711–721
- Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH (2005) Why highly expressed proteins evolve slowly. Proc Natl Acad Sci USA 102:14338–14343
- Drummond DA, Raval A, Wilke CO (2006) A single determinant dominates the rate of yeast protein evolution. Mol Biol Evol 23:327–337
- Ewens W (2004) Mathematical population genetics I. Springer-Verlag, New York
- Fleischmann RD, Alland D, Eisen JA, Carpenter L, White O, Peterson J, DeBoy R, Dodson R, Gwinn M, Haft D, Hickey E, Kolonay JF, Nelson WC, Umayam LA, Ermolaeva M, Salzberg SL, Delcher A, Utterback T, Weidman J, Khouri H, Gill J, Mikula A, Bishai W, Jacobs WR, Venter JC, Fraser CM (2002) Whole-genome comparison of Mycobacterium tuberculosis clinical and laboratory strains. J Bacteriol 184:5479–5490
- Friedman R, Hughes AL (2005) Codon volatility as an indicator of positive selection: Data from eukaryotic genome comparisons. Mol Biol Evol 22:542–543
- Gibbons RJ, Kapsimalis B (1967) Estimates of the overall rate of growth of the intestinal microflora of hamsters, guinea pigs, and mice. J Bacteriol 93:510–512
- Gillespie J (2001) Is the population size of a species relevant to its evolution? Evolution 55:2161–2169
- Giraud A, Radman M, Matic I, Taddei F (2001) The rise and fall of mutator bacteria. Curr Opin Microbiol 4:582–585
- Golding GB, Strobeck C (1982) Expected frequencies of codon use as a function of mutation rates and codon fitnesses. J Mol Evol 18:379–386
- Goldman N, Yang Z (1994) Codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol 11:725–736
- Hahn MW, Mezey JG, Begun DJ, Gillespie JH, Kern AD, Langley CH, Moyle LC (2005) Codon bias and selection on single genomes. Nature 433:E1
- Hartl DL, Sawyer SA (1994) Selection intensity for codon bias. Genetics 138:227–234
- Higgs P (1994) Error thresholds and stationary mutant distributions in multi-locus diploid genetics models. Genet Res Cambr 63:63–78
- Hirsh AE, Fraser HB, Wall DP (2005) Adjusting for selection on synonymous sites in estimates of evolutionary distance. Mol Biol Evol 22:174–177
- Ikemura T (1981) Correlation between the abundance of Escherichia coli transfer-RNAs and the occurrence of the respective codons in its protein. J Mol Biol 146:1–21
- Kellis M, Patterson N, Endrizzi M, Birren B, Lander E (2003) Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423:241–254
- Kimura M, Crow JF (1964) The number of alleles that can be maintained in a finite population. Genetics 49:725–738
- King JL, Jukes TH (1969) Non-Darwinian evolution. Science 164:788
- Konopka AJ (1985) Theory of degenerate coding and informational parameters of protein coding genes. Biochimie 67:455–468
- Kreitman M (2000) Methods to detect selection in populations with applications to the human. Annu Rev Genomics Hum Genet 1:539–559
- LeClerc JE, Li B, Payne WL, Cebula TA (1996) High mutation frequencies among Escherichia coli and Salmonella pathogens. Science 274:1208–1211
- Li WH (1993) Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J Mol Evol 36:96–99
- Lynch M, Conery JS (2003) The origins of genome complexity. Science 302:1401–1404
- Maynard Smith J, Haigh J (1974) The hitch-hiking effect of a favorable gene. Genet Res Cambr 23:23–35
- McDonald JH, Kreitman M (1991) Adaptive protein evolution at the ADH locus in Drosophila. Nature 351:652–654
- Miyata T, Miyazawa S, Yasunaga T (1979) Two types of amino acid substitutions in protein evolution. J Mol Evol 12:219–236
- Meyers LA, Ancel FD, Lachmann M (2005) Evolution of genetic potential. PLoS Comput Biol 1:236–243
- Nagylaki T (1992) Introduction to theoretical population genetics. Springer, Berlin
- Nielsen R, Hubisz M (2005) Detecting selection needs comparative data. Nature 433:E6
- Notley-McRobb L, Seeto S, Ferenci T (2001) Enrichment and elimination of mutY mutators in Escherichia coli populations. Genetics 162:1055–1062
- Ochman H, Elwyn S, Moran NA (1999) Calibrating bacterial evolution. Proc Natl Acad Sci USA 96:12638–12643
- Oliver A, Canton R, Campo P, Baquero F, Blazquez J (2000) High frequency of hypermutable Pseudomonas aeruginosa in cystic fibrosis lung infection. Science 288:1251–1254
- Pal C, Papp B, Hurst LD (2001) Highly expressed genes in yeast evolve slowly. Genetics 158:927–931
- Plotkin JB, Dushoff J (2003) Codon bias and frequency-dependent selection on the hemagglutinin epitopes of Influenza A virus. Proc Natl Acad Sci USA 100:7152–7157
- Plotkin JB, Dushoff J, Fraser HB (2004) Detecting selection using a single genome sequence of M. tuberculosis and P. falciparum. Nature 428:942–946
- Plotkin JB, Dushoff J, Fraser HB (2005) Codon bias and selection on single genomes: reply. Nature 433:E7–E8
- Plotkin JB, Fraser HB, Dushoff J (2006) Natural selection on the genome of Saccharomyces cerevisiae (in preparation)
- Sanjuan R, Moya A, Elena S (2004) The distribution of fitness effects caused by single-nucleotide substitutions in an RNA virus. Proc Natl Acad Sci USA 101:8396–8401
- Sawyer SA, Hartl DL (1992) Population genetics of polymorphism and divergence. Genetics 132:1161–1176
- Sharp PM (2005) Gene “volatility” is most unlikely to reveal adaptation. Mol Biol Evol 22:807–809
- Sharp PM, Li WH (1987) The codon adaptation index: A measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15:1281–1295
- Simonsen KL, Churchill GA, Aquadro CF (1995) Properties of statistical tests of neutrality for DNA polymorphism data. Genetics 141:413–429
- Sorensen M, Kurland C, Pedersen S (1989) Codon usage determines translation rate in Escherichia coli. J Mol Biol 207:365–377
- Stoletzki N, Welch J, Hermisson J, Eyre-Walker A (2005) A dissection of volatility in yeast. Mol Biol Evol 22:2022–2026
- Tajima F (1996) The amount of DNA polymorphism maintained in a finite population when the neutral mutation rate varies among sites. Genetics 143:1457–1465
- Tang H, Wyckoff GJ, Lu J, Wu C (2004) Universal evolutionary index for amino acid changes. Mol Biol Evol 21:1548–1556
- Tenaillon O, Denamur E, Matic I (2004) Evolutionary significance of stress-induced mutagenesis in bacteria. Trends Microbiol 12:264–270
- Thompson CJ, McBride JL (1973) On Eigen’s theory of the self-organization of matter and the evolution of biological macromolecules. Math Biosci 21:127–142
- van Nimwegen E, Crutchfield J, Huynen M (1999) Neutral evolution of mutational robustness. Proc Natl Acad Sci USA 96:9716–9720
- Wertman K, Drubin D, Botstein D (1992) Systematic mutational analysis of the yeast ACT1 gene. Genetics 132:337–350
- Wilke C (2001) Adaptive evolution on neutral networks. Bull Math Biol 63:715–730
- Winter G, Kawai S, Haeggstrom M, Kaneko O, von Euler A, Kawazu S, Palm D, Fernandez V, Wahlgren M (2005) SURFIN is a polymorphic antigen expressed on Plasmodium falciparum merozoites and infected erythrocytes. J Exp Med 201:1853–1863
- Wloch DM, Szafraniec K, Borts RH, Korona R (2001) Direct estimate of the mutation rate and the distribution of fitness effects in the yeast Saccharomyces cerevisiae. Genetics 159:441–452
- Wright S (1931) Evolution in Mendelian populations. Genetics 16:97–159
- Yampolsky LY, Stoltzfus A (2004) The exchangeability of amino acids in proteins. Genetics 170:1459–1472
- Yang Z (2000) Maximum likelihood estimation on large phylogenies and analysis of adaptive evolution in human influenza virus A. J Mol Evol 51:423–432
- Yang Z, Nielsen R, Goldman N, Pedersen AMK (2000) Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155:431–449
- Zhang J (2004) On the evolution of codon volatility. Genetics 169:495–501
- Zuckerkandl E, Pauling L (1965) Molecules as documents of evolutionary history. J Theor Biol 8:357–366
- Zeyl C, DeVisser J (2001) Estimates of the rate and distribution of fitness effects of spontaneous mutation in Saccharomyces cerevisiae. Genetics 157:53–61