Journal of Molecular Evolution, Volume 79, Issue 3–4, pp 130–142

Predicting Evolutionary Site Variability from Structure in Viral Proteins: Buriedness, Packing, Flexibility, and Design

  • Amir Shahmoradi
  • Dariya K. Sydykova
  • Stephanie J. Spielman
  • Eleisha L. Jackson
  • Eric T. Dawson
  • Austin G. Meyer
  • Claus O. Wilke
Original Article


Several recent works have shown that protein structure can predict site-specific evolutionary sequence variation. In particular, sites that are buried and/or have many contacts with other sites in a structure have been shown to evolve more slowly, on average, than surface sites with few contacts. Here, we present a comprehensive study of the extent to which numerous structural properties can predict sequence variation. The quantities we considered include buriedness (as measured by relative solvent accessibility), packing density (as measured by contact number), structural flexibility (as measured by B factors, root-mean-square fluctuations, and variation in dihedral angles), and variability in designed structures. We obtained structural flexibility measures both from molecular dynamics simulations performed on nine non-homologous viral protein structures and from variation in homologous variants of those proteins, where they were available. We obtained measures of variability in designed structures from flexible-backbone design in the Rosetta software. We found that most of the structural properties correlate with site variation in the majority of structures, though the correlations are generally weak (correlation coefficients of 0.1–0.4). Moreover, we found that buriedness and packing density were better predictors of evolutionary variation than structural flexibility. Finally, variability in designed structures was a weaker predictor of evolutionary variability than buriedness or packing density, but it was comparable in its predictive power to the best structural flexibility measures. We conclude that simple measures of buriedness and packing density are better predictors of evolutionary variation than the more complicated predictors obtained from dynamic simulations, ensembles of homologous structures, or computational protein design.
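As a rough illustration of the kind of analysis summarized above, the following sketch computes a contact number (packing density) for each site from C-alpha coordinates and rank-correlates it with a per-site variability score. Everything specific here is an assumption made for illustration and not taken from the paper: the 13 Å cutoff, the function names, the toy coordinates, the toy variability values, and the choice of Spearman rank correlation (the abstract only reports "correlation coefficients").

```python
import math

def contact_numbers(coords, cutoff=13.0):
    """For each site, count the other sites whose C-alpha lies within `cutoff` angstroms.
    The cutoff value is an illustrative assumption."""
    counts = []
    for i, a in enumerate(coords):
        n = sum(1 for j, b in enumerate(coords)
                if j != i and math.dist(a, b) <= cutoff)
        counts.append(n)
    return counts

def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Toy data (hypothetical): five sites on a line, 6 angstroms apart,
# with made-up variability scores that are lowest at the most packed site.
coords = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (12.0, 0.0, 0.0),
          (18.0, 0.0, 0.0), (24.0, 0.0, 0.0)]
variability = [0.9, 0.5, 0.1, 0.6, 0.8]

cn = contact_numbers(coords)
rho = spearman(cn, variability)
print(cn, rho)
```

On these toy values the correlation is negative, matching the direction described in the abstract (densely packed sites vary less). Real structures would need coordinates parsed from a PDB file and variability estimated from an alignment, neither of which is shown here.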


Keywords (added by machine, not by the authors): Molecular Dynamic, Packing Density, Evolutionary Variation, Structural Flexibility, Molecular Dynamic Trajectory



This work was supported in part by NIH Grant R01 GM088344, DTRA Grant HDTRA1-12-C-0007, ARO Grant W911NF-12-1-0390, and the BEACON Center for the Study of Evolution in Action (NSF Cooperative Agreement DBI-0939454). The Texas Advanced Computing Center at UT Austin provided high-performance computing resources.

Supplementary material

  • Supplementary material 1: 239_2014_9644_MOESM1_ESM.pdf (PDF, 185 kb)
  • Supplementary material 2: 239_2014_9644_MOESM2_ESM.pdf (PDF, 488 kb)
  • Supplementary material 3: 239_2014_9644_MOESM3_ESM.pdf (PDF, 248 kb)



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Amir Shahmoradi (1, 2)
  • Dariya K. Sydykova (2)
  • Stephanie J. Spielman (2)
  • Eleisha L. Jackson (2)
  • Eric T. Dawson (2)
  • Austin G. Meyer (2)
  • Claus O. Wilke (2, corresponding author)

  1. Department of Physics, The University of Texas at Austin, Austin, USA
  2. Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, USA
