Abstract
The design of proteins and miniproteins is an important challenge. Designed variants should be stable, meaning that the free energy difference between the folded and unfolded states should be sufficiently large. The unfolded state therefore plays a central role. It is often described by an extended peptide model, in which side chains interact with solvent and nearby backbone but not with each other. The unfolded energy is then a function of sequence composition only and can be parametrized empirically. If sequence space is explored with a Monte Carlo procedure, protein variants are sampled according to a well-defined Boltzmann probability distribution. The unfolded model parameters can then be chosen to maximize the probability of sampling native-like sequences, which leads to a well-defined maximum likelihood framework. We present an iterative algorithm that follows the likelihood gradient. The method is presented in the context of our Proteus software, as a detailed downloadable tutorial. The unfolded model is combined with a folded model that uses molecular mechanics and a Generalized Born solvent. The unfolded model was optimized for three PDZ domains and then used to redesign them. The sampled sequences are native-like and similar to those from a recent PDZ design study that was experimentally validated.
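The likelihood-gradient idea in the abstract can be illustrated with a toy calculation. The sketch below is not the Proteus implementation; it rests on several simplifying assumptions: the folded energy is a hypothetical table of independent per-position terms, the Boltzmann averages are computed exactly rather than by Monte Carlo sampling, and all names (`mean_composition`, `fit_unfolded`, `KT`, the example energies) are invented for illustration. Under the composition-only unfolded model, the derivative of the log-likelihood with respect to the unfolded energy of type t is proportional to the difference between the target count of type t and its Boltzmann-averaged count, so each iteration raises the unfolded energy of under-sampled types and lowers it for over-sampled ones.

```python
import math

KT = 0.6  # thermal energy in kcal/mol near room temperature (assumed value)


def mean_composition(folded, unfolded, kT=KT):
    """Boltzmann-averaged count of each amino acid type.

    folded[i][t] is a hypothetical folded-state energy of type t at position i;
    unfolded[t] is the unfolded-state reference energy of type t.
    Positions are treated as independent, so the average is exact.
    """
    n_types = len(unfolded)
    mean = [0.0] * n_types
    for row in folded:
        # Folding free energy of each type at this position.
        dG = [row[t] - unfolded[t] for t in range(n_types)]
        lowest = min(dG)  # shift for numerical stability
        weights = [math.exp(-(g - lowest) / kT) for g in dG]
        z = sum(weights)
        for t in range(n_types):
            mean[t] += weights[t] / z
    return mean


def fit_unfolded(folded, target, step=0.05, n_iter=5000, kT=KT):
    """Follow the likelihood gradient: E_t += step * (target_t - <n_t>)."""
    unfolded = [0.0] * len(target)
    for _ in range(n_iter):
        mean = mean_composition(folded, unfolded, kT)
        unfolded = [e + step * (tgt - m)
                    for e, tgt, m in zip(unfolded, target, mean)]
    return unfolded


# Toy problem: 4 positions, 3 amino acid types, made-up folded energies.
folded = [[0.0, 0.5, 1.0],
          [0.8, 0.0, 0.3],
          [0.2, 0.9, 0.0],
          [0.5, 0.1, 0.7]]
target = [1.0, 2.0, 1.0]  # desired (native-like) counts of each type

fitted = fit_unfolded(folded, target)
print("fitted unfolded energies:", fitted)
print("sampled composition:", mean_composition(folded, fitted))
```

Because the target and averaged counts both sum to the sequence length, the updates sum to zero and only relative unfolded energies are determined. In a real application the averages would come from Monte Carlo sampling of sequences and conformations, so they would be noisy and the step size would need more care.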
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Opuu, V., Mignon, D., Simonson, T. (2022). Knowledge-Based Unfolded State Model for Protein Design. In: Simonson, T. (eds) Computational Peptide Science. Methods in Molecular Biology, vol 2405. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1855-4_19
DOI: https://doi.org/10.1007/978-1-0716-1855-4_19
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1854-7
Online ISBN: 978-1-0716-1855-4
eBook Packages: Springer Protocols