The Influence of Protein Stability on Sequence Evolution: Applications to Phylogenetic Inference

  • Ugo Bastolla
  • Miguel Arenas
Part of the Methods in Molecular Biology book series (MIMB, volume 1851)


Phylogenetic inference from protein data is traditionally based on empirical substitution models of evolution, which assume that protein sites evolve independently of each other and under the same substitution process. However, it is well known that the structural properties of a protein site in the native state affect its evolution, in particular its sequence entropy and substitution rate. Starting from the seminal proposal by Halpern and Bruno, in which structural properties are incorporated into the evolutionary model through site-specific amino acid frequencies, several models have been developed to tackle the influence of protein structure on sequence evolution. Here we describe stability-constrained substitution (SCS) models that explicitly consider the stability of the native state against both unfolded and misfolded states. One of them, the mean-field model, provides an independent sites approximation that can be readily incorporated into maximum likelihood methods of phylogenetic inference, including ancestral sequence reconstruction. Next, we describe its validation with simulated and real proteins, as well as its advantages and limitations relative to empirical models that lack site specificity. Finally, we provide guidelines and recommendations for analyzing protein data while accounting for stability constraints, including computer simulations and maximum likelihood inference of protein evolution. Some practical examples are included to illustrate these procedures.
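The Halpern and Bruno construction mentioned above turns a vector of site-specific state frequencies π into a substitution rate matrix whose stationary distribution is π. For a mutation rate matrix M, the rate from state i to state j is Q_ij = M_ij · S_ij / (1 − exp(−S_ij)), where S_ij = ln(π_j M_ji / (π_i M_ij)). The Python sketch below illustrates this formula for a single site under stated assumptions. It is not code from any program cited in this chapter, and the function names `fixation_factor` and `halpern_bruno_q` are hypothetical. The four-state alphabet, the frequencies, and the uniform mutation matrix are toy placeholders for 20 amino acids and a realistic mutation model. Under this assumption, an SCS model would supply site-specific frequencies in place of the toy π.

```python
import math


def fixation_factor(s):
    """Halpern-Bruno factor S / (1 - exp(-S)); its limit as S -> 0 is 1."""
    if abs(s) < 1e-12:
        return 1.0
    # math.exp overflows for extreme |S|; adequate for moderate frequency ratios.
    return s / (1.0 - math.exp(-s))


def halpern_bruno_q(pi, mut):
    """Hypothetical helper: site-specific rate matrix with stationary distribution pi.

    pi  -- target (site-specific) state frequencies, all > 0, summing to 1
    mut -- mutation rate matrix; off-diagonal entries must be > 0
    The result is rescaled to one expected substitution per unit time.
    """
    n = len(pi)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Selective term S_ij implied by the target frequencies
            s = math.log((pi[j] * mut[j][i]) / (pi[i] * mut[i][j]))
            q[i][j] = mut[i][j] * fixation_factor(s)
        # Diagonal entry makes each row sum to zero
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    # Mean substitution rate at stationarity: sum_i pi_i * (-Q_ii)
    rate = -sum(pi[i] * q[i][i] for i in range(n))
    return [[x / rate for x in row] for row in q]


if __name__ == "__main__":
    pi = [0.4, 0.3, 0.2, 0.1]            # toy site-specific frequencies
    mut = [[1.0] * 4 for _ in range(4)]  # toy uniform mutation rates
    q = halpern_bruno_q(pi, mut)
    for row in q:
        print(" ".join(f"{x:8.4f}" for x in row))
```

By construction, π_i Q_ij = π_j Q_ji (detailed balance), which makes π the stationary distribution. Substitutions toward more frequent states are therefore faster than the reverse. With uniform π and a symmetric M, every S_ij is zero, and Q reduces to the rescaled mutation matrix.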

Key words

Stability-constrained substitution models; Mean-field substitution model; Protein folding stability; Protein evolution; Ancestral protein reconstruction



M.A. was supported by the grant “Ramón y Cajal” RYC-2015-18241 from the Spanish Government. U.B. is supported by the grant BIO2016-79043 from the Spanish Ministry of Economy.


References

  1. Schmitt AO, Schuchhardt J, Ludwig A, Brockmann GA (2007) Protein evolution within and between species. J Theor Biol 249(2):376–383
  2. Gao F, Bhattacharya T, Gaschen B, Taylor J, Moore JP, Novitsky V, Yusim K, Lang D, Foley B, Beddows S, Alam M, Haynes B, Hahn BH, Korber B (2003) Consensus and ancestral state HIV vaccines. Science 299(5612):1515–1518
  3. Arenas M, Posada D (2010) Computational design of centralized HIV-1 genes. Curr HIV Res 8(8):613–621
  4. Wilson C, Agafonov RV, Hoemberger M, Kutter S, Zorba A, Halpin J, Buosi V, Otten R, Waterman D, Theobald DL, Kern D (2015) Kinase dynamics. Using ancient protein kinases to unravel a modern cancer drug’s mechanism. Science 347(6224):882–886
  5. Perez-Jimenez R, Ingles-Prieto A, Zhao ZM, Sanchez-Romero I, Alegre-Cebollada J, Kosuri P, Garcia-Manyes S, Kappock TJ, Tanokura M, Holmgren A, Sanchez-Ruiz JM, Gaucher EA, Fernandez JM (2011) Single-molecule paleoenzymology probes the chemistry of resurrected enzymes. Nat Struct Mol Biol 18(5):592–596
  6. Wijma HJ, Floor RJ, Janssen DB (2013) Structure- and sequence-analysis inspired engineering of proteins for enhanced thermostability. Curr Opin Struct Biol 23(4):588–594
  7. Cole MF, Gaucher EA (2011) Utilizing natural diversity to evolve protein function: applications towards thermostability. Curr Opin Chem Biol 15(3):399–406
  8. Arenas M (2015) Trends in substitution models of molecular evolution. Front Genet 6:319
  9. Liberles DA, Teichmann SA, Bahar I, Bastolla U, Bloom J, Bornberg-Bauer E, Colwell LJ, de Koning AP, Dokholyan NV, Echave J, Elofsson A, Gerloff DL, Goldstein RA, Grahnen JA, Holder MT, Lakner C, Lartillot N, Lovell SC, Naylor G, Perica T, Pollock DD, Pupko T, Regan L, Roger A, Rubinstein N, Shakhnovich E, Sjolander K, Sunyaev S, Teufel AI, Thorne JL, Thornton JW, Weinreich DM, Whelan S (2012) The interface of protein structure, protein biophysics, and molecular evolution. Protein Sci 21(6):769–785
  10. Bastolla U (2014) Detecting selection on protein stability through statistical mechanical models of folding and evolution. Biomolecules 4:291–314
  11. Wilke CO (2012) Bringing molecules back into molecular evolution. PLoS Comput Biol 8(6):e1002572
  12. Sikosek T, Chan HS (2014) Biophysics of protein evolution and evolutionary protein biophysics. J R Soc Interface 11(100):20140419
  13. Goldstein RA (2011) The evolution and evolutionary consequences of marginal thermostability in proteins. Proteins 79(5):1396–1407
  14. Serohijos AW, Shakhnovich EI (2014) Merging molecular mechanism and evolution: theory and computation at the interface of biophysics and evolutionary population genetics. Curr Opin Struct Biol 26:84–91
  15. Bastolla U, Dehouck Y, Echave J (2017) What evolution tells us about protein physics, and protein physics tells us about evolution. Curr Opin Struct Biol 42:59–66
  16. Echave J (2008) Evolutionary divergence of protein structure: the linearly forced elastic network model. Chem Phys Lett 457(4):413–416
  17. Tirion MM (1996) Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Phys Rev Lett 77(9):1905–1908
  18. Bahar I, Rader AJ (2005) Coarse-grained normal mode analysis in structural biology. Curr Opin Struct Biol 15(5):586–592
  19. Bornberg-Bauer E, Chan HS (1999) Modeling evolutionary landscapes: mutational stability, topology, and superfunnels in sequence space. Proc Natl Acad Sci U S A 96(19):10689–10694
  20. Bastolla U, Porto M, Roman HE, Vendruscolo M (2003) Connectivity of neutral networks, overdispersion, and structural conservation in protein evolution. J Mol Evol 56(3):243–254
  21. Lemmon AR, Moriarty EC (2004) The importance of proper model assumption in Bayesian phylogenetics. Syst Biol 53(2):265–277
  22. Zhang J (1999) Performance of likelihood ratio tests of evolutionary hypotheses under inadequate substitution models. Mol Biol Evol 16(6):868–875
  23. Bordner AJ, Mittelmann HD (2013) A new formulation of protein evolutionary models that account for structural constraints. Mol Biol Evol 31(3):736–749
  24. Rodrigue N, Lartillot N, Bryant D, Philippe H (2005) Site interdependence attributed to tertiary structure in amino acid sequence evolution. Gene 347(2):207–217
  25. Arenas M, Sanchez-Cobos A, Bastolla U (2015) Maximum likelihood phylogenetic inference with selection on protein folding stability. Mol Biol Evol 32(8):2195–2207
  26. Bastolla U, Porto M, Roman HE, Vendruscolo M (2006) A protein evolution model with independent sites that reproduces site-specific amino acid distributions from the Protein Data Bank. BMC Evol Biol 6:43
  27. Anishchenko I, Ovchinnikov S, Kamisetty H, Baker D (2017) Origins of coevolution between residues distant in protein 3D structures. Proc Natl Acad Sci U S A 114:9122–9127
  28. Wang ZO, Pollock DD (2005) Context dependence and coevolution among amino acid residues in proteins. Methods Enzymol 395:779–790
  29. Arenas M, Dos Santos HG, Posada D, Bastolla U (2013) Protein evolution along phylogenetic histories under structurally constrained substitution models. Bioinformatics 29(23):3020–3028
  30. Echave J, Wilke CO (2017) Biophysical models of protein evolution: understanding the patterns of evolutionary sequence divergence. Annu Rev Biophys 46:85–103
  31. Bastolla U, Farwer J, Knapp EW, Vendruscolo M (2001) How to guarantee optimal stability for most representative structures in the Protein Data Bank. Proteins 44(2):79–96
  32. Minning J, Porto M, Bastolla U (2013) Detecting selection for negative design in proteins through an improved model of the misfolded state. Proteins 81(7):1102–1112
  33. Sella G, Hirsh AE (2005) The application of statistical physics to evolutionary biology. Proc Natl Acad Sci U S A 102(27):9541–9546
  34. Mustonen V, Lassig M (2005) Evolutionary population genetics of promoters: predicting binding sites and functional phylogenies. Proc Natl Acad Sci U S A 102(44):15936–15941
  35. Arenas M (2012) Simulation of molecular data under diverse evolutionary scenarios. PLoS Comput Biol 8(5):e1002495
  36. Hoban S, Bertorelle G, Gaggiotti OE (2012) Computer simulations: tools for population and evolutionary genetics. Nat Rev Genet 13(2):110–122
  37. Kingman JFC (1982) The coalescent. Stoch Process Appl 13:235–248
  38. Posada D, Wiuf C (2003) Simulating haplotype blocks in the human genome. Bioinformatics 19(2):289–290
  39. Arenas M, Posada D (2010) Coalescent simulation of intracodon recombination. Genetics 184(2):429–437
  40. Arenas M (2013) Computer programs and methodologies for the simulation of DNA sequence data with recombination. Front Genet 4:9
  41. Arenas M, Posada D (2014) Simulation of genome-wide evolution under heterogeneous substitution models and complex multispecies coalescent histories. Mol Biol Evol 31(5):1295–1301
  42. Hudson RR (1998) Island models and the coalescent process. Mol Ecol 7(4):413–418
  43. Yang Z (2006) Computational molecular evolution. Oxford University Press, Oxford
  44. Abascal F, Zardoya R, Posada D (2005) ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21(9):2104–2105
  45. Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
  46. Marti-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, Sali A (2000) Comparative protein structure modeling of genes and genomes. Annu Rev Biophys Biomol Struct 29:291–325
  47. Halpern AL, Bruno WJ (1998) Evolutionary distances for protein-coding sequences: modeling site-specific residue frequencies. Mol Biol Evol 15(7):910–917
  48. Whelan S, Goldman N (2001) A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol 18(5):691–699
  49. Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci 8(3):275–282
  50. Arenas M, Weber CC, Liberles DA, Bastolla U (2017) ProtASR: an evolutionary framework for ancestral protein reconstruction with selection on folding stability. Syst Biol 66:1054–1064
  51. Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24(8):1586–1591
  52. Yang Z (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13(5):555–556
  53. Merkl R, Sterner R (2016) Ancestral protein reconstruction: techniques and applications. Biol Chem 397(1):1–21
  54. Liberles DA (2007) Ancestral sequence reconstruction. Oxford University Press, Oxford
  55. Kothe DL, Li Y, Decker JM, Bibollet-Ruche F, Zammit KP, Salazar MG, Chen Y, Weng Z, Weaver EA, Gao F, Haynes BF, Shaw GM, Korber BT, Hahn BH (2006) Ancestral and consensus envelope immunogens for HIV-1 subtype C. Virology 352(2):438–449
  56. Gaucher EA, Govindarajan S, Ganesh OK (2008) Palaeotemperature trend for Precambrian life inferred from resurrected proteins. Nature 451(7179):704–707
  57. Hobbs JK, Shepherd C, Saul DJ, Demetras NJ, Haaning S, Monk CR, Daniel RM, Arcus VL (2012) On the origin and evolution of thermophily: reconstruction of functional precambrian enzymes from ancestors of Bacillus. Mol Biol Evol 29(2):825–835
  58. Bastolla U, Moya A, Viguera E, van Ham RC (2004) Genomic determinants of protein folding thermodynamics in prokaryotic organisms. J Mol Biol 343(5):1451–1466
  59. Williams PD, Pollock DD, Blackburne BP, Goldstein RA (2006) Assessing the accuracy of ancestral protein reconstruction methods. PLoS Comput Biol 2(6):e69
  60. Lartillot N, Lepage T, Blanquart S (2009) PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25(17):2286–2288
  61. Lartillot N, Philippe H (2004) A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol 21(6):1095–1109
  62. Mustonen V, Lassig M (2009) From fitness landscapes to seascapes: non-equilibrium dynamics of selection and adaptation. Trends Genet 25(3):111–119
  63. Arenas M, Patricio M, Posada D, Valiente G (2010) Characterization of phylogenetic networks with NetTest. BMC Bioinformatics 11(1):268
    Arenas M, Patricio M, Posada D, Valiente G (2010) Characterization of phylogenetic networks with NetTest. BMC Bioinformatics 11(1):268PubMedPubMedCentralCrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Centre for Molecular Biology Severo Ochoa (CSIC-UAM), Madrid, Spain
  2. Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain
