Knowledge-Based Unfolded State Model for Protein Design

Opuu, Vaitea; Mignon, David; Simonson, Thomas

doi:10.1007/978-1-0716-1855-4_19

Part of the book series: Methods in Molecular Biology (MIMB, volume 2405)

Abstract

The design of proteins and miniproteins is an important challenge. Designed variants should be stable, meaning that the folded/unfolded free energy difference should be large enough. Thus, the unfolded state plays a central role. An extended peptide model is often used, where side chains interact with the solvent and the nearby backbone, but not with each other. The unfolded energy is then a function of sequence composition only and can be parametrized empirically. If sequence space is explored with a Monte Carlo procedure, protein variants are sampled according to a well-defined Boltzmann probability distribution. We can then choose the unfolded model parameters to maximize the probability of sampling native-like sequences. This leads to a well-defined maximum likelihood framework. We present an iterative algorithm that follows the likelihood gradient. The method is presented in the context of our Proteus software, as a detailed downloadable tutorial. The unfolded model is combined with a folded model that uses molecular mechanics and a Generalized Born solvent. The unfolded model was optimized for three PDZ domains and then used to redesign them. The sequences sampled are native-like and similar to those from a recent PDZ design study that was experimentally validated.
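The maximum likelihood idea can be illustrated on a toy problem. The sketch below assumes, as is common for extended-peptide unfolded models, that the unfolded energy is a sum of per-amino-acid terms E_ref[t], so that E(S) = E_fold(S) - sum_i E_ref[s_i] and p(S) is proportional to exp(-BETA * E(S)). Under that assumption, the gradient of the mean log-likelihood of a set of native-like sequences with respect to E_ref[t] is BETA times the difference between their mean count of type t and the Boltzmann-averaged count, so gradient ascent raises E_ref for under-sampled types. The alphabet, the folded energy terms (H, J), the learning rate and the target composition are all hypothetical. The system is small enough to enumerate exactly, and a Metropolis Monte Carlo estimator of the same averages is included for comparison. This is not the Proteus implementation.

```python
"""Toy maximum likelihood fit of per-amino-acid unfolded-state energies.

Illustrative assumptions (not the Proteus implementation):
  E(S) = E_fold(S) - sum_i E_ref[s_i],  p(S) proportional to exp(-BETA * E(S)).
The folded energy below is a made-up pairwise model on a 3-position chain.
"""
import itertools
import math
import random

ALPHABET = ["A", "D", "L"]  # hypothetical 3-letter amino acid alphabet
N_POS = 3                   # number of designed positions (toy size)
BETA = 1.0                  # 1/kT in reduced units (assumption)

# Hypothetical folded-state terms: site energies H[i][a], neighbour couplings J[a][b].
H = [[0.0, 0.5, -0.3], [0.2, -0.4, 0.1], [-0.1, 0.3, 0.0]]
J = [[0.0, 0.2, -0.2], [0.2, 0.6, 0.1], [-0.2, 0.1, -0.5]]


def folded_energy(seq):
    e = sum(H[i][a] for i, a in enumerate(seq))
    return e + sum(J[a][b] for a, b in zip(seq, seq[1:]))


def total_energy(seq, e_ref):
    # Folded minus unfolded energy; the unfolded part depends on composition only.
    return folded_energy(seq) - sum(e_ref[a] for a in seq)


def counts(seq):
    c = [0] * len(ALPHABET)
    for a in seq:
        c[a] += 1
    return c


def exact_mean_counts(e_ref):
    """Boltzmann-averaged amino acid counts by exhaustive enumeration."""
    seqs = list(itertools.product(range(len(ALPHABET)), repeat=N_POS))
    energies = [total_energy(s, e_ref) for s in seqs]
    e_min = min(energies)  # shift for numerical stability
    weights = [math.exp(-BETA * (e - e_min)) for e in energies]
    z = sum(weights)
    mean = [0.0] * len(ALPHABET)
    for s, w in zip(seqs, weights):
        for t, n in enumerate(counts(s)):
            mean[t] += w * n / z
    return mean


def mc_mean_counts(e_ref, n_steps, rng):
    """Same averages estimated by Metropolis Monte Carlo with single-site mutations."""
    seq = [0] * N_POS
    e = total_energy(seq, e_ref)
    mean = [0.0] * len(ALPHABET)
    for _ in range(n_steps):
        trial = list(seq)
        trial[rng.randrange(N_POS)] = rng.randrange(len(ALPHABET))  # symmetric proposal
        e_trial = total_energy(trial, e_ref)
        if e_trial <= e or rng.random() < math.exp(-BETA * (e_trial - e)):
            seq, e = trial, e_trial
        for t, n in enumerate(counts(seq)):
            mean[t] += n / n_steps
    return mean


def fit_reference_energies(target_counts, lr=0.3, n_iter=2000, estimator=exact_mean_counts):
    """Gradient ascent on the mean log-likelihood of native-like sequences.

    dlogL/dE_ref[t] = BETA * (target_counts[t] - <n_t>_model), so E_ref rises
    for amino acid types that the model under-samples.
    """
    e_ref = [0.0] * len(ALPHABET)
    for _ in range(n_iter):
        model = estimator(e_ref)
        e_ref = [e + lr * BETA * (tgt - m) for e, tgt, m in zip(e_ref, target_counts, model)]
    return e_ref


if __name__ == "__main__":
    target = [1.2, 0.9, 0.9]  # hypothetical mean counts over native-like sequences
    e_ref = fit_reference_energies(target)
    print("E_ref:", [round(x, 3) for x in e_ref])
    print("exact:", [round(x, 3) for x in exact_mean_counts(e_ref)])
    print("MC:   ", [round(x, 3) for x in mc_mean_counts(e_ref, 100_000, random.Random(0))])
```

Because every sequence has the same length, adding a constant to all E_ref values leaves p(S) unchanged; the gradient components sum to zero, so the sketch keeps the sum of E_ref at its initial value of zero. For a realistic design problem, exhaustive enumeration is impossible and the model averages would come from Monte Carlo runs instead, for example `estimator=lambda e: mc_mean_counts(e, 20_000, rng)` with `rng = random.Random(0)` and a smaller `n_iter`, at the cost of a noisy gradient.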

Author information

Authors and Affiliations

Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
Vaitea Opuu, David Mignon & Thomas Simonson

Corresponding author

Correspondence to Thomas Simonson.

Editor information

Editors and Affiliations

Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
Thomas Simonson

About this protocol

Cite this protocol

Opuu, V., Mignon, D., Simonson, T. (2022). Knowledge-Based Unfolded State Model for Protein Design. In: Simonson, T. (ed) Computational Peptide Science. Methods in Molecular Biology, vol 2405. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1855-4_19

DOI: https://doi.org/10.1007/978-1-0716-1855-4_19
Published: 08 July 2021
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1854-7
Online ISBN: 978-1-0716-1855-4
eBook Packages: Springer Protocols