Journal of Molecular Evolution, Volume 66, Issue 4, pp 395–404

Contact Density Affects Protein Evolutionary Rate from Bacteria to Animals

  • Tong Zhou
  • D. Allan Drummond
  • Claus O. Wilke


The density of contacts or the fraction of buried sites in a protein structure is thought to be related to a protein’s designability, and genes encoding more designable proteins should evolve faster than other genes. Several recent studies have tested this hypothesis but have found conflicting results. Here, we investigate how a gene’s evolutionary rate is affected by its protein’s contact density, considering the four species Escherichia coli, Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens. We find for all four species that contact density correlates positively with evolutionary rate, and that these correlations do not seem to be confounded by gene expression level. The strength of this signal, however, varies widely among species. We also study the effect of contact density on domain evolution in multidomain proteins and find that a domain’s contact density influences the domain’s evolutionary rate. Within the same protein, a domain with higher contact density tends to evolve faster than a domain with lower contact density. Our study provides evidence that contact density can increase evolutionary rates, and that it acts similarly on the level of entire proteins and of individual protein domains.
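As a rough illustration of the central quantity, contact density can be sketched as the average number of residue-residue contacts per residue in a structure. The snippet below is a minimal sketch under stated assumptions, not the procedure used in the study: the function name `contact_density`, the 8 Å distance cutoff, the use of one representative atom per residue, and the exclusion of chain neighbors via `min_separation` are all hypothetical choices made for illustration.

```python
import math


def contact_density(coords, cutoff=8.0, min_separation=2):
    """Return the average number of contacts per residue.

    coords: list of (x, y, z) tuples, one representative atom per residue
            (for example the C-alpha atom; this choice is an assumption).
    cutoff: distance in angstroms below which two residues count as being
            in contact (illustrative value, not taken from the paper).
    min_separation: minimum sequence separation for a pair to be counted,
            so that trivially adjacent residues are ignored (assumption).
    """
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one residue")

    contacts = 0
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts += 1

    # Each contact involves two residues, so it contributes to both of them.
    return 2 * contacts / n


if __name__ == "__main__":
    # Toy example: four residues on a straight line, spaced 3.8 angstroms apart.
    chain = [(3.8 * i, 0.0, 0.0) for i in range(4)]
    print(contact_density(chain))
```

Under these assumptions, a compact fold yields a higher value than an extended chain of the same length, which is the property that the abstract relates to evolutionary rate.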


Designability; Protein structure; Evolutionary rate; Protein evolution; Domain; Principal component regression



This work was supported by NIH Grant AI 065960. D.A.D. received support through an NIH center grant to the FAS Center for Systems Biology. We would like to thank Jesse Bloom for helpful comments on this work.

Supplementary material

239_2008_9094_MOESM1_ESM.pdf (PDF, 32 kb)



Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Tong Zhou (1)
  • D. Allan Drummond (2)
  • Claus O. Wilke (1, 3)

  1. Center for Computational Biology and Bioinformatics, Section of Integrative Biology, University of Texas at Austin, Austin, USA
  2. FAS Center for Systems Biology, Harvard University, Cambridge, USA
  3. Institute for Cell and Molecular Biology, University of Texas at Austin, Austin, USA
