Abstract
High-throughput structural genomic and evolutionary bioinformatics approaches need fast methods for evaluating substitutions structurally. Coarse-grained methods are both powerful and fast, and a coarse-grained approach to positioning substituted side chains is presented here. With this approach, single-residue replacement is at least sevenfold faster than with modern all-atom approaches. At the same time, the method maintains a small median RMSD from the leading all-atom approach (as measured in coarse-grained space), predicts the conformation of point mutants with similar accuracy, and generates biologically realistic side chain angles. Its run time is also substantially more predictable, making it useful for high-throughput studies of protein structural evolution. To demonstrate its utility, the method has been implemented in a forward simulation of sequences threaded through the SH2 domains, with selective pressures to fold and bind specifically. The relative substitution rates across the protein structure and at the binding interface reflect those observed in SH2 domain evolution. The algorithm has been implemented in C++, with the source code and binaries (currently supported for Linux systems) freely available as SARA at http://www.wyomingbioinformatics.org/LiberlesGroup/SARA.
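To make the phrase "RMSD ... as measured in coarse-grained space" concrete, the following is a minimal Python sketch of one way such a comparison could be computed. It is not taken from SARA: it assumes that each side chain is reduced to a single bead at the centroid of its atoms, that both models share the same fixed backbone (so no superposition is performed), and the function names `centroid` and `bead_rmsd` are hypothetical.

```python
import math
from statistics import median


def centroid(atoms):
    """Mean position of a non-empty list of (x, y, z) atom coordinates.

    Assumption: one bead per side chain, placed at the atom centroid.
    """
    n = len(atoms)
    return tuple(sum(atom[i] for atom in atoms) / n for i in range(3))


def bead_rmsd(beads_a, beads_b):
    """RMSD between two equally ordered lists of bead positions.

    Assumption: both models sit on the same backbone frame, so the
    coordinates are compared directly without superposition.
    """
    if len(beads_a) != len(beads_b) or not beads_a:
        raise ValueError("bead lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(beads_a, beads_b))
    return math.sqrt(squared / len(beads_a))


# Hypothetical toy data: an all-atom side chain reduced to one bead,
# compared with a coarse-grained placement of the same residue.
all_atom_side_chain = [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)]
reference_bead = centroid(all_atom_side_chain)
coarse_bead = (1.0, 1.0, 2.0)
single_rmsd = bead_rmsd([reference_bead], [coarse_bead])

# A median over several structures summarises the typical deviation.
median_rmsd = median([single_rmsd, 0.5, 2.0])
```

A median is used for the summary because it is less sensitive than a mean to a few badly placed side chains, which matches the way the abstract reports the deviation.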
Acknowledgments
This study was supported by an institutional NIH INBRE award to University of Wyoming (P20 RR016474). Jan Kubelka is supported by NSF CAREER award 0846140. David Liberles receives support from NSF award DBI-0743374.
Cite this article
Grahnen, J.A., Kubelka, J. & Liberles, D.A. Fast Side Chain Replacement in Proteins Using a Coarse-Grained Approach for Evaluating the Effects of Mutation During Evolution. J Mol Evol 73, 23–33 (2011). https://doi.org/10.1007/s00239-011-9454-3
DOI: https://doi.org/10.1007/s00239-011-9454-3