Cost Function Networks to Solve Large Computational Protein Design Problems

Operations Research and Simulation in Healthcare

Abstract

Proteins are chains of simple molecules called amino acids. The sequence of amino acids in the chain determines the three-dimensional shape of the protein and ultimately its biochemical function. Over millions of years, living organisms have evolved a large catalog of proteins. By exploring the space of possible amino acid sequences, protein engineering similarly aims to design tailored proteins with specific desirable properties, such as therapeutic properties for biomedical and healthcare applications. In computational protein design (CPD), the challenge of identifying a protein that performs a given task is cast as the combinatorial optimization of a complex energy function over amino acid sequences. We first introduce the CPD problem and some of the main approaches that structural biologists have used to solve it. We then show that the CPD problem can be formulated as a cost function network (CFN) and present some of the most efficient CFN techniques. Overall, the CFN approach is the most efficient on these problems, improving by several orders of magnitude over previous exact CPD-dedicated approaches and over integer programming approaches.
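
To make the CFN formulation mentioned in the abstract concrete, the sketch below builds a tiny cost function network and minimizes it by exhaustive enumeration. It is an illustration only and not the chapter's method: the position names, the (amino acid, rotamer) domains, and all cost values are hypothetical and were chosen arbitrarily. It assumes the usual pairwise-decomposable form, in which the total cost is a sum of unary and binary cost functions. Practical CFN solvers replace the enumeration with far more efficient search and pruning techniques.

```python
from itertools import product

# Hypothetical toy instance: two design positions, each with a small domain
# of (amino acid, rotamer index) values. All costs are arbitrary integers.
domains = {
    "pos1": [("ALA", 0), ("LEU", 0), ("LEU", 1)],
    "pos2": [("SER", 0), ("VAL", 0)],
}

# Unary cost functions: one cost per value of each variable.
unary = {
    "pos1": {("ALA", 0): 2, ("LEU", 0): 1, ("LEU", 1): 3},
    "pos2": {("SER", 0): 1, ("VAL", 0): 2},
}

# Binary cost functions: one cost per pair of values of two variables.
pairwise = {
    ("pos1", "pos2"): {
        (("ALA", 0), ("SER", 0)): 4,
        (("ALA", 0), ("VAL", 0)): 1,
        (("LEU", 0), ("SER", 0)): 2,
        (("LEU", 0), ("VAL", 0)): 5,
        (("LEU", 1), ("SER", 0)): 1,
        (("LEU", 1), ("VAL", 0)): 1,
    },
}


def total_cost(assignment):
    """Sum all unary and binary costs for a complete assignment."""
    cost = sum(unary[var][assignment[var]] for var in assignment)
    cost += sum(
        table[(assignment[a], assignment[b])]
        for (a, b), table in pairwise.items()
    )
    return cost


def solve_by_enumeration():
    """Return (cost, assignment) of a minimum-cost complete assignment."""
    names = list(domains)
    best = None
    for values in product(*(domains[name] for name in names)):
        assignment = dict(zip(names, values))
        cost = total_cost(assignment)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


best_cost, best_assignment = solve_by_enumeration()
print(best_cost, best_assignment)
```

Enumeration visits every combination of values, so its cost grows exponentially with the number of positions. This is why the exact methods discussed in the chapter matter for large designs.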

Notes

  1. www.rcsb.org/stats/distribution_residue-count.

Acknowledgements

This work has been partly funded by the “Agence nationale de la Recherche” (ANR-10-BLA-0214, ANR-12-MONU-0015-03, and ANR-16-CE40-0028).

Author information

Corresponding author

Correspondence to Thomas Schiex.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Allouche, D. et al. (2021). Cost Function Networks to Solve Large Computational Protein Design Problems. In: Masmoudi, M., Jarboui, B., Siarry, P. (eds) Operations Research and Simulation in Healthcare. Springer, Cham. https://doi.org/10.1007/978-3-030-45223-0_4

  • DOI: https://doi.org/10.1007/978-3-030-45223-0_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-45222-3

  • Online ISBN: 978-3-030-45223-0

  • eBook Packages: Computer Science (R0)
