Deterministic Search Methods for Computational Protein Design

  • Seydou Traoré
  • David Allouche
  • Isabelle André
  • Thomas Schiex
  • Sophie Barbe
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1529)

Abstract

One main challenge in Computational Protein Design (CPD) lies in exploring the amino-acid sequence space while accounting, to some extent, for side-chain flexibility. The exorbitant size of the search space calls for efficient exact deterministic search methods that can identify low-energy sequence-conformation models, corresponding either to the global minimum energy conformation (GMEC) or to an ensemble of guaranteed near-optimal solutions. In contrast to stochastic local search methods, which are not guaranteed to find the GMEC, exact deterministic approaches always identify the GMEC and prove its optimality, in finite but exponential worst-case time. After a brief overview of these two classes of methods, we discuss the grounds and merits of four deterministic methods that have been applied to CPD problems. These approaches are based either on the Dead-End Elimination theorem combined with the A* algorithm (DEE/A*), on Cost Function Network algorithms (CFN), on Integer Linear Programming solvers (ILP), or on Markov Random Field solvers (MRF). This review details how two of these methods (DEE/A* and CFN) can be used in practice to identify low-energy sequence-conformation models, starting from a pairwise decomposed energy matrix.
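To make the DEE idea concrete, the sketch below applies Goldstein's dead-end elimination criterion to a toy pairwise decomposed energy matrix. A rotamer r at position i is removed when some other rotamer t at i satisfies E(i_r) - E(i_t) + sum over j of min_s [E(i_r, j_s) - E(i_t, j_s)] > 0, i.e. swapping r for t always lowers the energy, so r cannot belong to any GMEC. The storage layout (E1[i][r] for unary terms, E2[(i, j)][r][s] with i < j for pairwise terms) and all function names are hypothetical choices for this illustration, not the data structures of any particular CPD package.

```python
import itertools

def pair_energy(E2, i, r, j, s):
    """Pairwise term E(i_r, j_s); each pair (i, j) is stored once with i < j."""
    if (i, j) in E2:
        return E2[(i, j)][r][s]
    if (j, i) in E2:
        return E2[(j, i)][s][r]
    return 0.0  # positions that do not interact

def total_energy(E1, E2, assignment):
    """Energy of a full sequence-conformation (one rotamer index per position)."""
    n = len(E1)
    e = sum(E1[i][assignment[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e += pair_energy(E2, i, assignment[i], j, assignment[j])
    return e

def goldstein_dee(E1, E2):
    """Apply Goldstein's criterion repeatedly until no rotamer can be removed."""
    n = len(E1)
    domains = [set(range(len(E1[i]))) for i in range(n)]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(domains[i]):
                for t in sorted(domains[i]):
                    if t == r:
                        continue
                    # Lower bound on E(... r ...) - E(... t ...) over all contexts
                    delta = E1[i][r] - E1[i][t]
                    for j in range(n):
                        if j != i:
                            delta += min(pair_energy(E2, i, r, j, s)
                                         - pair_energy(E2, i, t, j, s)
                                         for s in domains[j])
                    if delta > 0:  # t always beats r: r is a dead end
                        domains[i].discard(r)
                        changed = True
                        break
    return domains

def brute_force_gmec(E1, E2, domains):
    """Exhaustive search over the (possibly pruned) domains."""
    best = min(itertools.product(*[sorted(d) for d in domains]),
               key=lambda a: total_energy(E1, E2, a))
    return best, total_energy(E1, E2, best)

# Hypothetical toy instance: 3 positions with 2, 3 and 2 rotamers.
E1 = [[0.0, 5.0], [1.0, 0.0, 4.0], [0.0, 2.0]]
E2 = {(0, 1): [[0.0, 1.0, 0.0], [2.0, 0.0, 1.0]],
      (1, 2): [[0.5, 0.0], [0.0, 1.0], [3.0, 3.0]],
      (0, 2): [[0.0, 0.0], [1.0, 1.0]]}

if __name__ == "__main__":
    pruned = goldstein_dee(E1, E2)
    print(pruned)                               # -> [{0}, {1}, {0}]
    print(brute_force_gmec(E1, E2, pruned))     # -> ((0, 1, 0), 1.0)
```

On this instance DEE alone collapses every domain to a single rotamer, and the exhaustive check over the unpruned space returns the same GMEC. On realistic instances DEE usually leaves several rotamers per position, which is why it is paired with a search such as A*.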
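The A* half of DEE/A* can be sketched as a best-first search over partial assignments, with positions assigned in a fixed order. Each node is scored by f = g + h, where g is the exact energy of the positions assigned so far and h is an admissible lower bound on the energy still to come: for each unassigned position j, the minimum over its rotamers s of E(j_s), plus its interactions with the assigned positions, plus the best-case interaction with each later unassigned position. Because h never overestimates, complete assignments leave the queue in non-decreasing energy order, so the first one is the GMEC and continuing the search enumerates the near-optimal solutions within a chosen energy window. The energy layout (E1, E2) and the function names are hypothetical, and this is a simplified illustration rather than the implementation used in the chapter.

```python
import heapq
import itertools

def pair_energy(E2, i, r, j, s):
    """Pairwise term E(i_r, j_s); each pair (i, j) is stored once with i < j."""
    if (i, j) in E2:
        return E2[(i, j)][r][s]
    if (j, i) in E2:
        return E2[(j, i)][s][r]
    return 0.0

def g_cost(E1, E2, partial):
    """Exact energy of positions 0..len(partial)-1 (a full tuple gives E)."""
    k = len(partial)
    e = sum(E1[i][partial[i]] for i in range(k))
    for i in range(k):
        for j in range(i + 1, k):
            e += pair_energy(E2, i, partial[i], j, partial[j])
    return e

def h_cost(E1, E2, partial):
    """Admissible lower bound on the energy of the unassigned positions."""
    k, n = len(partial), len(E1)
    h = 0.0
    for j in range(k, n):
        best = float("inf")
        for s in range(len(E1[j])):
            v = E1[j][s]
            v += sum(pair_energy(E2, i, partial[i], j, s) for i in range(k))
            # each unassigned pair (j, m) is counted once, with m > j
            v += sum(min(pair_energy(E2, j, s, m, u) for u in range(len(E1[m])))
                     for m in range(j + 1, n))
            best = min(best, v)
        h += best
    return h

def astar_enumerate(E1, E2, window):
    """Yield (energy, assignment) in increasing energy, up to GMEC + window."""
    n = len(E1)
    tie = itertools.count()  # tie-breaker so tuples never get compared
    heap = [(h_cost(E1, E2, ()), next(tie), ())]
    gmec = None
    while heap:
        f, _, partial = heapq.heappop(heap)
        if gmec is not None and f > gmec + window:
            return  # every remaining leaf has energy >= f
        if len(partial) == n:  # h = 0 here, so f is the exact energy
            if gmec is None:
                gmec = f
            yield f, partial
            continue
        for r in range(len(E1[len(partial)])):
            child = partial + (r,)
            heapq.heappush(heap, (g_cost(E1, E2, child) + h_cost(E1, E2, child),
                                  next(tie), child))

# Hypothetical toy instance: 3 positions with 2, 3 and 2 rotamers.
E1 = [[0.0, 5.0], [1.0, 0.0, 4.0], [0.0, 2.0]]
E2 = {(0, 1): [[0.0, 1.0, 0.0], [2.0, 0.0, 1.0]],
      (1, 2): [[0.5, 0.0], [0.0, 1.0], [3.0, 3.0]],
      (0, 2): [[0.0, 0.0], [1.0, 1.0]]}

if __name__ == "__main__":
    for energy, assignment in astar_enumerate(E1, E2, window=2.5):
        print(energy, assignment)
```

With a window of 2.5 energy units this prints the GMEC (0, 1, 0) at 1.0 followed by (0, 0, 0) at 1.5 and (0, 0, 1) at 3.0. In a DEE/A* pipeline the A* search would run only over the rotamers that survive DEE.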
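In the CFN view, the same pairwise decomposed energy matrix is a network of unary and binary cost functions, and solvers derive lower bounds by moving costs between them without changing the energy of any complete assignment. The sketch below performs a single pass of such moves: the minimum of each row and column of a binary cost table is projected onto the corresponding unary costs, and the minimum of each unary cost is then projected onto a constant c0. Every remaining cost is non-negative afterwards, so c0 is a valid lower bound on the GMEC energy. This is a deliberately simplified, hypothetical illustration (the layout and the function `project_lower_bound` are assumptions); CFN solvers such as toulbar2 maintain stronger soft arc consistencies inside a depth-first branch-and-bound search.

```python
import itertools

def pair_energy(E2, i, r, j, s):
    """Pairwise cost E(i_r, j_s); each pair (i, j) is stored once with i < j."""
    if (i, j) in E2:
        return E2[(i, j)][r][s]
    if (j, i) in E2:
        return E2[(j, i)][s][r]
    return 0.0

def total_energy(E1, E2, assignment):
    n = len(E1)
    e = sum(E1[i][assignment[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e += pair_energy(E2, i, assignment[i], j, assignment[j])
    return e

def project_lower_bound(E1, E2):
    """Move costs binary -> unary -> c0; every full assignment keeps its energy."""
    E1 = [list(row) for row in E1]  # work on copies, inputs stay untouched
    E2 = {key: [list(row) for row in m] for key, m in E2.items()}
    for (i, j), m in E2.items():
        for a in range(len(m)):  # project each row onto the unary cost of i
            alpha = min(m[a])
            m[a] = [v - alpha for v in m[a]]
            E1[i][a] += alpha
        for b in range(len(m[0])):  # project each column onto the unary cost of j
            beta = min(m[a][b] for a in range(len(m)))
            for a in range(len(m)):
                m[a][b] -= beta
            E1[j][b] += beta
    c0 = 0.0
    for i, row in enumerate(E1):  # project each unary cost onto c0
        mu = min(row)
        E1[i] = [v - mu for v in row]
        c0 += mu
    return c0, E1, E2

# Hypothetical toy instance: 2 positions with 2 rotamers each.
E1 = [[1.0, 2.0], [0.0, 3.0]]
E2 = {(0, 1): [[2.0, 4.0], [3.0, 1.0]]}

if __name__ == "__main__":
    c0, E1_r, E2_r = project_lower_bound(E1, E2)
    print(c0)  # -> 3.0
    for a in itertools.product(range(2), range(2)):
        # original energy equals c0 plus the energy in the reformulated network
        print(a, total_energy(E1, E2, a), c0 + total_energy(E1_r, E2_r, a))
```

Here the bound happens to equal the GMEC energy (3.0, for assignment (0, 0)); in general it is only a lower bound, which a branch-and-bound search uses to prune subtrees whose bound already exceeds the best energy found so far.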

Key words

Exact combinatorial optimization; Global minimum energy conformation; Near-optimal solutions; Dead-end elimination; Cost function network; Integer linear programming; Markov random field

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  • Seydou Traoré (1, 2, 3)
  • David Allouche (4)
  • Isabelle André (1, 2, 3)
  • Thomas Schiex (4)
  • Sophie Barbe (1, 2, 3)
  1. INSA, UPS, INP, Université de Toulouse, Toulouse, France
  2. Laboratoire d’Ingénierie des Systèmes Biologiques et des Procédés, INSA, INRA, UMR792, Toulouse, France
  3. CNRS, UMR5504, Toulouse, France
  4. Unité de Mathématiques et Informatique de Toulouse, UR 875, INRA, Castanet-Tolosan, France
