Deterministic Search Methods for Computational Protein Design

Part of the Methods in Molecular Biology book series (MIMB, volume 1529)

Abstract

A central challenge in Computational Protein Design (CPD) is exploring the amino-acid sequence space while accounting, to some extent, for side-chain flexibility. The sheer size of this search space calls for efficient exact deterministic search methods that can identify low-energy sequence-conformation models, corresponding either to the global minimum energy conformation (GMEC) or to an ensemble of guaranteed near-optimal solutions. Unlike stochastic local search methods, which offer no guarantee of finding the GMEC, exact deterministic approaches always identify the GMEC and prove its optimality in finite time, although that time is exponential in the worst case. After a brief overview of these two classes of methods, we discuss the foundations and merits of four deterministic methods that have been applied to CPD problems. These approaches are based on the Dead-End Elimination theorem combined with the A* algorithm (DEE/A*), on Cost Function Network algorithms (CFN), on Integer Linear Programming solvers (ILP), or on Markov Random Field solvers (MRF). This review details how two of these methods (DEE/A* and CFN) can be used in practice to identify low-energy sequence-conformation models, starting from a pairwise decomposed energy matrix.
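As an illustration of what searching a pairwise decomposed energy matrix for the GMEC involves, the following is a minimal Python sketch, not the chapter's protocol. It assumes the usual decomposition in which the energy of a model is the sum of one self term per position and one pair term per pair of positions. It prunes rotamers with the Goldstein form of dead-end elimination and then, in place of A*, simply enumerates the surviving combinations, which is only workable for toy instances. The data layout (`unary`, `pairwise`) and all function names are hypothetical choices made for this example.

```python
from itertools import product

def pair_term(pairwise, i, r, j, s):
    """Pair energy between rotamer r at position i and rotamer s at position j."""
    if i < j:
        return pairwise[(i, j)][r][s]
    return pairwise[(j, i)][s][r]

def total_energy(assignment, unary, pairwise):
    """Energy of a full assignment: self terms plus all pair terms."""
    n = len(assignment)
    energy = sum(unary[i][assignment[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += pairwise[(i, j)][assignment[i]][assignment[j]]
    return energy

def goldstein_dee(unary, pairwise):
    """Prune rotamers until no more can be removed (Goldstein criterion).

    Rotamer r at position i is removed if some surviving competitor t has
    E(i_r) - E(i_t) + sum_j min_s [E(i_r, j_s) - E(i_t, j_s)] > 0,
    i.e. switching r to t lowers the energy whatever the other positions do.
    """
    n = len(unary)
    alive = [set(range(len(u))) for u in unary]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i] - {r}):
                    gap = unary[i][r] - unary[i][t]
                    for j in range(n):
                        if j != i:
                            gap += min(
                                pair_term(pairwise, i, r, j, s)
                                - pair_term(pairwise, i, t, j, s)
                                for s in alive[j]
                            )
                    if gap > 0:
                        alive[i].discard(r)
                        changed = True
                        break
    return alive

def enumerate_gmec(unary, pairwise, alive=None):
    """Exhaustive search over the allowed rotamers (stand-in for A*)."""
    if alive is None:
        alive = [set(range(len(u))) for u in unary]
    choices = [sorted(a) for a in alive]
    best = min(product(*choices), key=lambda a: total_energy(a, unary, pairwise))
    return best, total_energy(best, unary, pairwise)

# Toy instance (made-up integer energies): 3 positions with 2, 3 and 2 rotamers.
unary = [[0, 2], [1, 0, 5], [0, 1]]
pairwise = {
    (0, 1): [[0, 1, 0], [2, 0, 1]],
    (0, 2): [[0, 1], [1, 0]],
    (1, 2): [[1, 0], [0, 2], [0, 0]],
}

alive = goldstein_dee(unary, pairwise)
gmec, gmec_energy = enumerate_gmec(unary, pairwise, alive)
print(gmec, gmec_energy, alive)
```

Because the pruning rule only removes rotamers that cannot belong to the optimum, searching the reduced space returns the same minimum as searching the full one. A practical pipeline would replace the enumeration with a best-first or branch-and-bound search.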

Key words

  • Exact combinatorial optimization
  • Global minimum energy conformation
  • Near-optimal solutions
  • Dead-end-elimination
  • Cost function network
  • Integer linear programming
  • Markov random field

Acknowledgments

This work has been funded by a grant from INRA and the Region Midi-Pyrénées and the “Agence Nationale de la Recherche,” references ANR 10-BLA-0214 and ANR-12-MONU-0015-03. We thank the Computing Center of Region Midi-Pyrénées (CALMIP, Toulouse, France) and the GenoToul Bioinformatics Platform of INRA-Toulouse for providing computing resources and support.

Author information

Corresponding author

Correspondence to Sophie Barbe.

Copyright information

© 2017 Springer Science+Business Media New York

About this protocol

Cite this protocol

Traoré, S., Allouche, D., André, I., Schiex, T., Barbe, S. (2017). Deterministic Search Methods for Computational Protein Design. In: Samish, I. (ed.) Computational Protein Design. Methods in Molecular Biology, vol 1529. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6637-0_4

  • DOI: https://doi.org/10.1007/978-1-4939-6637-0_4

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6635-6

  • Online ISBN: 978-1-4939-6637-0

  • eBook Packages: Springer Protocols