
Learning Strategies in Protein Directed Evolution

  • Protocol
Directed Evolution

Part of the book series: Methods in Molecular Biology (MIMB, volume 2461)

Abstract

Synthetic biology is a fast-evolving research field that combines biology and engineering principles to develop new biological systems for medical, pharmacological, and industrial applications. Synthetic biologists use iterative “design, build, test, and learn” cycles to efficiently engineer genetic systems that are reliable, reproducible, and predictable. Protein engineering by directed evolution can benefit from such a systematic engineering approach. Learning can be carried out before starting, throughout, or after finalizing a directed evolution project. Computational tools, bioinformatics, and scanning mutagenesis methods can be excellent starting points, while molecular dynamics simulations and other strategies can guide engineering efforts. Similarly, studying protein intermediates along evolutionary pathways offers insights into the molecular mechanisms shaped by evolution. The learning step of the cycle is crucial for proteins or enzymes that are not suitable for high-throughput screening or selection systems, and it is also valuable for any platform that generates large amounts of data whose analysis can be aided by machine learning algorithms. The main challenge in protein engineering is to predict the effect of a single mutation on one functional parameter, to say nothing of several mutations on multiple parameters. This is largely due to nonadditive mutational interactions, known as epistatic effects: mutations that are beneficial in one genetic background may not be beneficial in another. In this work, we provide an overview of experimental and computational strategies that can guide the user to learn protein function at different stages in a directed evolution project. We also discuss how epistatic effects can influence the success of directed evolution projects.
Since machine learning is gaining momentum in protein engineering and the field is becoming more interdisciplinary thanks to collaboration between mathematicians, computational scientists, engineers, molecular biologists, and chemists, we provide a general workflow that familiarizes nonexperts with the basic concepts, dataset requirements, learning approaches, model capabilities and performance metrics of this intriguing area. Finally, we also provide some practical recommendations on how machine learning can harness epistatic effects for engineering proteins in an “outside-the-box” way.
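
The nonadditivity described above can be made concrete with a small calculation. The following is a minimal sketch, not the chapter's own method: it assumes that the effect of a mutation is measured as the log ratio of a fitness value (for example, activity) relative to the parent, and it defines pairwise epistasis as the difference between the observed effect of a double mutant and the sum of the two single-mutant effects. The function names and all numerical values are hypothetical and chosen only for illustration.

```python
import math


def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Pairwise epistasis on a log scale (illustrative definition).

    f_wt, f_a, f_b, f_ab are positive fitness values (e.g. activity) of the
    parent, the two single mutants, and the double mutant. A result of zero
    means the two mutations combine additively on the log scale; a nonzero
    result indicates an epistatic (nonadditive) interaction.
    """
    effect_a = math.log(f_a / f_wt)
    effect_b = math.log(f_b / f_wt)
    effect_ab = math.log(f_ab / f_wt)
    return effect_ab - (effect_a + effect_b)


def changes_sign(f_wt, f_a, f_b, f_ab):
    """True if mutation B is beneficial in the parent but not in background A."""
    beneficial_in_parent = f_b > f_wt
    beneficial_in_a = f_ab > f_a
    return beneficial_in_parent and not beneficial_in_a


# Hypothetical values: both single mutants improve on the parent.
additive_case = pairwise_epistasis(1.0, 2.0, 3.0, 6.0)   # close to zero
negative_case = pairwise_epistasis(1.0, 2.0, 3.0, 1.5)   # below zero

print(f"additive case: {additive_case:.3f}")
print(f"negative epistasis: {negative_case:.3f}")
print("B loses its benefit in background A:", changes_sign(1.0, 2.0, 3.0, 1.5))
```

In the second hypothetical case, mutation B improves the parent but lowers fitness when introduced into the background that already carries mutation A, which is the situation the abstract describes. Other scales (for example, free energy differences) are equally common; the log-ratio scale is used here only as one reasonable assumption.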


Change history

  • 17 April 2023

    A correction has been published.


    Article  CAS  Google Scholar 

  129. Li A, Acevedo-Rocha CG, D’Amore L et al (2020) Regio- and stereoselective steroid hydroxylation at C7 by cytochrome P450 monooxygenase mutants. Angew Chem Int Ed 59(30):12499–12505. https://doi.org/10.1002/anie.202003139

    Article  CAS  Google Scholar 

  130. Nov Y, Fulton A, Jaeger KE (2013) Optimal scanning of all single-point mutants of a protein. J Comput Biol 20(12):990–997. https://doi.org/10.1089/cmb.2013.0026

    Article  CAS  PubMed  Google Scholar 

  131. Fowler DM, Fields S (2014) Deep mutational scanning: a new style of protein science. Nat Methods 11(8):801–807. https://doi.org/10.1038/nmeth.3027

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  132. Romero PA, Tran TM, Abate AR (2015) Dissecting enzyme function with microfluidic-based deep mutational scanning. Proc Natl Acad Sci U S A 112(23):7159–7164. https://doi.org/10.1073/pnas.1422285112

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  133. Mehlhoff JD, Ostermeier M (2020) Biological fitness landscapes by deep mutational scanning. Methods Enzymol 643:203–224. https://doi.org/10.1016/bs.mie.2020.04.023

    Article  PubMed  Google Scholar 

  134. Song H, Bremer BJ, Hinds EC et al (2020) Inferring protein sequence-function relationships with large-scale positive-unlabeled learning. Cell Syst 12(1):92–101. https://doi.org/10.1016/j.cels.2020.10.007

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  135. Tang Q, Grathwol CW, Aslan-Üzel AS et al (2021) Directed evolution of a halide methyltransferase enables biocatalytic synthesis of diverse SAM analogs. Angew Chem Int Ed 60(3):1524–1527. https://doi.org/10.1002/anie.202013871

    Article  CAS  Google Scholar 

  136. Orozco M (2014) A theoretical view of protein dynamics. Chem Soc Rev 43(14):5051–5066. https://doi.org/10.1039/C3CS60474H

    Article  CAS  PubMed  Google Scholar 

  137. Dodani SC, Kiss G, Cahn JKB et al (2016) Discovery of a regioselectivity switch in nitrating P450s guided by molecular dynamics simulations and Markov models. Nat Chem 8(5):419–425. https://doi.org/10.1038/nchem.2474

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  138. Osuna S, Jiménez-Osés G, Noey EL, Houk KN (2015) Molecular dynamics explorations of active site structure in designed and evolved enzymes. Acc Chem Res 48(4):1080–1089. https://doi.org/10.1021/ar500452q

    Article  CAS  PubMed  Google Scholar 

  139. Childers MC, Daggett V (2017) Insights from molecular dynamics simulations for computational protein design. Mol Syst Des Eng 2(1):9–33. https://doi.org/10.1039/c6me00083e

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  140. Bunzel HA, Anderson JLLR, Mulholland AJ (2021) Designing better enzymes: insights from directed evolution. Curr Opin Struct Biol 67:212–218. https://doi.org/10.1016/j.sbi.2020.12.015

    Article  CAS  PubMed  Google Scholar 

  141. Sandström AG, Wikmark Y, Engström K et al (2012) Combinatorial reshaping of the Candida antarctica lipase A substrate pocket for enantioselectivity using an extremely condensed library. Proc Natl Acad Sci 109(1):78–83. https://doi.org/10.1073/pnas.1111537108

    Article  PubMed  Google Scholar 

  142. Tokuriki N, Jackson CJ, Afriat-Jurnou L et al (2012) Diminishing returns and tradeoffs constrain the laboratory optimization of an enzyme. Nat Commun 3:1257. https://doi.org/10.1038/ncomms2246

    Article  CAS  PubMed  Google Scholar 

  143. Kaltenbach M, Tokuriki N (2014) Dynamics and constraints of enzyme evolution. J Exp Zool Part B Mol Dev Evol 322(7):468–487. https://doi.org/10.1002/jez.b.22562

    Article  CAS  Google Scholar 

  144. Goldsmith M, Aggarwal N, Ashani Y et al (2017) Overcoming an optimization plateau in the directed evolution of highly efficient nerve agent bioscavengers. Protein Eng Des Sel 30(4):333–345. https://doi.org/10.1093/protein/gzx003

    Article  CAS  PubMed  Google Scholar 

  145. Götz AW, Williamson MJ, Xu D et al (2012) Routine microsecond molecular dynamics simulations with AMBER on GPUs. 1. Generalized born. J Chem Theory Comput 8(5):1542–1555. https://doi.org/10.1021/ct200909j

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  146. Romero-Rivera A, Garcia-Borràs M, Osuna S (2017) Computational tools for the evaluation of laboratory-engineered biocatalysts. Chem Commun 53(2):284–297. https://doi.org/10.1039/C6CC06055B

    Article  CAS  Google Scholar 

  147. Yu H, Dalby PA (2020) A beginner’s guide to molecular dynamics simulations and the identification of cross-correlation networks for enzyme engineering. Methods Enzymol 643:15–49. https://doi.org/10.1016/bs.mie.2020.04.020

    Article  CAS  PubMed  Google Scholar 

  148. Marques SM, Planas-Iglesias J, Damborsky J (2020) Web-based tools for computational enzyme design. Preprints. https://doi.org/10.20944/preprints202012.0089.v1

  149. Cilia E, Pancsa R, Tompa P et al (2014) The DynaMine webserver: predicting protein dynamics from sequence. Nucleic Acids Res 42(W1):W264. https://doi.org/10.1093/nar/gku270

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  150. Obexer R, Godina A, Garrabou X et al (2017) Emergence of a catalytic tetrad during evolution of a highly active artificial aldolase. Nat Chem 9(1):50–56. https://doi.org/10.1038/nchem.2596

    Article  CAS  PubMed  Google Scholar 

  151. Broom A, Rakotoharisoa RV, Thompson MC et al (2020) Ensemble-based enzyme design can recapitulate the effects of laboratory directed evolution in silico. Nat Commun 11(1):4808. https://doi.org/10.1038/s41467-020-18619-x

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  152. Li A, Wang B, Ilie A et al (2017) A redox-mediated Kemp eliminase. Nat Commun 8(1):1–8. https://doi.org/10.1038/ncomms14876

    Article  CAS  Google Scholar 

  153. Hong NS, Petrović D, Lee R et al (2018) The evolution of multiple active site configurations in a designed enzyme. Nat Commun 9(1):3900. https://doi.org/10.1038/s41467-018-06305-y

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  154. Boehr DD, Nussinov R, Wright PE (2009) The role of dynamic conformational ensembles in biomolecular recognition. Nat Chem Biol 5(11):789–796. https://doi.org/10.1038/nchembio.232

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  155. Otten R, Pádua RAP, Bunze HA et al (2020) How directed evolution reshapes the energy landscape in an enzyme to boost catalysis. Science 370(6523):1442–1446. https://doi.org/10.1126/science.abd3623

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  156. Fasan R, Meharenna YT, Snow CD et al (2008) Evolutionary history of a specialized p450 propane monooxygenase. J Mol Biol 383(5):1069–1080. https://doi.org/10.1016/j.jmb.2008.06.060

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  157. Li G, Zhang H, Sun Z et al (2016) Multiparameter optimization in directed evolution: engineering thermostability, enantioselectivity, and activity of an epoxide hydrolase. ACS Catal 6(6):3679–3687. https://doi.org/10.1021/acscatal.6b01113

    Article  CAS  Google Scholar 

  158. Ostafe R, Fontaine N, Frank D et al (2020) One-shot optimization of multiple enzyme parameters: tailoring glucose oxidase for pH and electron mediators. Biotechnol Bioeng 117(1):17–29. https://doi.org/10.1002/bit.27169

    Article  CAS  PubMed  Google Scholar 

  159. Schmidt-Dannert C, Arnold FH (1999) Directed evolution of industrial enzymes. Trends Biotechnol 17(4):135–136. https://doi.org/10.1016/S0167-7799(98)01283-9

    Article  CAS  PubMed  Google Scholar 

  160. Starr TN, Thornton JW (2016) Epistasis in protein evolution. Protein Sci 25(7):1204–1218. https://doi.org/10.1002/pro.2897

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  161. Reetz MT (2013) The importance of additive and non-additive mutational effects in protein engineering. Angew Chem Int Ed 52:2658–2666

    Article  CAS  Google Scholar 

  162. Acevedo-Rocha CG, Li A, D’Amore L et al (2021) Pervasive cooperative mutational effects on multiple catalytic enzyme traits emerge via long-range conformational dynamics. Nat Commun 12(1):1–13. https://doi.org/10.1038/s41467-021-21833-w

    Article  CAS  Google Scholar 

  163. Miton CM, Tokuriki N (2016) How mutational epistasis impairs predictability in protein evolution and design. Protein Sci 25(7):1260–1272. https://doi.org/10.1002/pro.2876

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  164. Bershtein S, Segal M, Bekerman R et al (2006) Robustness–epistasis link shapes the fitness landscape of a randomly drifting protein. Nature 444(7121):929–932

    Article  CAS  PubMed  Google Scholar 

  165. Weinreich DM, Delaney NF, DePristo MA, Hartl DL (2006) Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312(5770):111–114. https://doi.org/10.1126/science.1123539

    Article  CAS  PubMed  Google Scholar 

  166. Poelwijk FJ, Kiviet DJ, Weinreich DM, Tans SJ (2007) Empirical fitness landscapes reveal accessible evolutionary paths. Nature 445(7126):383–386. https://doi.org/10.1038/nature05451

    Article  CAS  PubMed  Google Scholar 

  167. Zhang Z-G, Lonsdale R, Sanchis J, Reetz MT (2014) Extreme synergistic mutational effects in the directed evolution of a Baeyer–Villiger monooxygenase as catalyst for asymmetric sulfoxidation. J Am Chem Soc 136(49):17262–17272. https://doi.org/10.1021/ja5098034

    Article  CAS  PubMed  Google Scholar 

  168. Reetz MT, Sanchis J (2008) Constructing and analyzing the fitness landscape of an experimental evolutionary process. ChemBioChem 9(14):2260–2267. https://doi.org/10.1002/cbic.200800371

    Article  CAS  PubMed  Google Scholar 

  169. Calzadiaz-Ramirez L, Calvó-Tusell C, Stoffel GMM et al (2020) In vivo selection for formate dehydrogenases with high efficiency and specificity toward NADP+. ACS Catal 10(14):7512–7525. https://doi.org/10.1021/acscatal.0c01487

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  170. Maynard Smith J (1970) Natural selection and the concept of a protein space. Nature 225(5232):563–564. https://doi.org/10.1038/225563a0

    Article  Google Scholar 

  171. Tracewell CA, Arnold FH (2009) Directed enzyme evolution: climbing fitness peaks one amino acid at a time. Curr Opin Chem Biol 13(1):3–9. https://doi.org/10.1016/j.cbpa.2009.01.017

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  172. Vornholt T, Christoffel F, Pellizzoni MM et al (2021) Systematic engineering of artificial metalloenzymes for new-to-nature reactions. Sci Adv 7(4):eabe4208. https://doi.org/10.1126/sciadv.abe4208

    Article  CAS  PubMed  Google Scholar 

  173. Khersonsky O, Lipsh R, Avizemer Z et al (2018) Automated design of efficient and functionally diverse enzyme repertoires. Mol Cell 72(1):178–186.e5. https://doi.org/10.1016/j.molcel.2018.08.033

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  174. Miton CM, Chen JZ, Ost K et al (2020) Statistical analysis of mutational epistasis to reveal intramolecular interaction networks in proteins. Methods Enzymol 643:243–280. https://doi.org/10.1016/bs.mie.2020.07.012

    Article  CAS  PubMed  Google Scholar 

  175. Reetz MT, Soni P, Acevedo JP, Sanchis J (2009) Creation of an amino acid network of structurally coupled residues in the directed evolution of a thermostable enzyme. Angew Chem Int Ed 48(44):8268–8272. https://doi.org/10.1002/anie.200904209

    Article  CAS  Google Scholar 

  176. Yu H, Dalby PA (2018) Coupled molecular dynamics mediate long- and short-range epistasis between mutations that affect stability and aggregation kinetics. Proc Natl Acad Sci 115(47):E11043–E11052. https://doi.org/10.1073/pnas.1810324115

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  177. Dean J (2020) The deep learning revolution and its implications for computer architecture and chip design. In: Fujino L (ed) IEEE International Solid-State Circuits Conference. Institute of Electrical and Electronics Engineers Inc., San Francisco, CA

    Google Scholar 

  178. Muggleton S, King RD, Stenberg MJE (1992) Protein secondary structure prediction using logic-based machine learning. Protein Eng Des Sel 5(7):647–657. https://doi.org/10.1093/protein/5.7.647

    Article  CAS  Google Scholar 

  179. Li Y, Huang C, Ding L et al (2019) Deep learning in bioinformatics: introduction, application, and perspective in the big data era. Methods 166:4–21

    Article  CAS  PubMed  Google Scholar 

  180. Li H, Tian S, Li Y et al (2020) Modern deep learning in bioinformatics. J Mol Cell Biol 12(11):823–827. https://doi.org/10.1093/jmcb/mjaa030

    Article  PubMed  PubMed Central  Google Scholar 

  181. Li G, Dong Y, Reetz MT (2019) Can machine learning revolutionize directed evolution of selective enzymes? Adv Synth Catal 361(11):2377–2386. https://doi.org/10.1002/adsc.201900149

    Article  CAS  Google Scholar 

  182. Wittmann BJ, Johnston KE, Wu Z, Arnold FH (2021) Advances in machine learning for directed evolution. Curr Opin Struct Biol 69:11–18. https://doi.org/10.1016/j.sbi.2021.01.008

    Article  CAS  PubMed  Google Scholar 

  183. Chowdhury R, Maranas CD (2020) From directed evolution to computational enzyme engineering—a review. AIChE J 66(3):e16847. https://doi.org/10.1002/aic.16847

    Article  CAS  Google Scholar 

  184. Siedhoff NE, Schwaneberg U, Davari MD (2020) Machine learning-assisted enzyme engineering. Methods Enzymol 643:281–315. https://doi.org/10.1016/bs.mie.2020.05.005

    Article  CAS  PubMed  Google Scholar 

  185. Service R (2020) ‘The game has changed.’ AI triumphs at solving protein structures. Science 370:1144. https://doi.org/10.1126/science.abf9367

    Article  PubMed  Google Scholar 

  186. Callaway E (2020) “It will change everything”: DeepMind’s AI makes gigantic leap in solving protein structures. Nature 588:203–204

    Article  CAS  PubMed  Google Scholar 

  187. Jones MT (2018) Data, structure, and the data science pipeline. https://developer.ibm.com/articles/ba-intro-data-science-1/. Accessed 24 Apr 2021

  188. Lawrence N (2017) Data readiness levels. arXiv:170502245

    Google Scholar 

  189. Pestov V (2013) Is the k-NN classifier in high dimensions affected by the curse of dimensionality? Comput Math Appl 65(10):1427–1437. https://doi.org/10.1016/j.camwa.2012.09.011

    Article  Google Scholar 

  190. Ma F, Chung MT, Yao Y et al (2018) Efficient molecular evolution to generate enantioselective enzymes using a dual-channel microfluidic droplet screening platform. Nat Commun 9(1):1–8. https://doi.org/10.1038/s41467-018-03492-6

    Article  CAS  Google Scholar 

  191. Wittmann BJ, Yue Y, Arnold FH (2020) Machine learning-assisted directed evolution navigates a combinatorial epistatic fitness landscape with minimal screening burden. bioRxiv. https://doi.org/10.1101/2020.12.04.408955

  192. Jun Z, Bin L (2019) A review on the recent developments of sequence-based protein feature extraction methods. Curr Bioinforma 14(3):190–199. https://doi.org/10.2174/1574893614666181212102749

    Article  CAS  Google Scholar 

  193. Rawi R, Mall R, Kunji K et al (2018) PaRSnIP: sequence-based protein solubility prediction using gradient boosting machine. Bioinformatics 34(7):1092–1098. https://doi.org/10.1093/bioinformatics/btx662

    Article  CAS  PubMed  Google Scholar 

  194. Ding X, Zou Z, Brooks CL (2019) Deciphering protein evolution and fitness landscapes with latent space models. Nat Commun 10(1):1–13. https://doi.org/10.1038/s41467-019-13633-0

    Article  CAS  Google Scholar 

  195. Linder J, Bogard N, Rosenberg AB, Seelig G (2020) A generative neural network for maximizing fitness and diversity of synthetic DNA and protein sequences. Cell Syst 11(1):49–62.e16. https://doi.org/10.1016/j.cels.2020.05.007

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  196. Lu AX, Zhang H, Ghassemi M, Moses A (2020) Self-supervised contrastive learning of protein representations by mutual information maximization. bioRxiv. https://doi.org/10.1101/2020.09.04.283929

  197. Rives A, Goyal S, Meier J et al (2019) Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. bioRxiv:622803. https://doi.org/10.1101/622803

  198. Madani A, Mccann B, Naik N et al (2020) ProGen: language modeling for protein generation. arXiv:200403497

    Google Scholar 

  199. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction, 2nd edn. MIT Press, Cambridge, MA

    Google Scholar 

  200. Angermueller C, Research G, Dohan D et al (n.d.) Model-based reinforcement learning for biological sequence design. Under review

    Google Scholar 

  201. Markova K, Chmelova K, Marques SM et al (2020) Decoding the intricate network of molecular interactions of a hyperstable engineered biocatalyst. Chem Sci 11(41):11162–11178. https://doi.org/10.1039/d0sc03367g

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  202. Hie B, Bryson BD, Berger B (2020) Leveraging uncertainty in machine learning accelerates biological discovery and design. Cell Syst 11(5):461–477.e9. https://doi.org/10.1016/j.cels.2020.09.007

    Article  CAS  PubMed  Google Scholar 

  203. Von Luxburg U, Schölkopf B (2011) Statistical learning theory: models, concepts, and results. In: Gabbay DM, Hartmann S, Woods J (eds) Handbook of the history of logic. North-Holland, Amsterdam

    Google Scholar 

  204. Hawkins DM (2004) The problem of overfitting. J Chem Inf Comput Sci 44:1–12

    Article  CAS  PubMed  Google Scholar 

  205. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge, MA

    Google Scholar 

  206. Meinshausen N (2006) Quantile regression forests. J Mach Learn Res 7:983–999

    Google Scholar 

  207. Shin J-E, Riesselman AJ, Kollasch AW et al (2021) Protein design and variant prediction using autoregressive generative models. Nat Commun 12(1):2403. https://doi.org/10.1038/s41467-021-22732-w

  208. Luo Y, Jiang G, Yu T et al (2021) ECNet is an evolutionary context-integrated deep learning framework for protein engineering. Nat Commun 12(1):5743. https://doi.org/10.1038/s41467-021-25976-8

Download references

Acknowledgments

PEACCEL was supported through a research program partially cofunded by the European Union (EU) and Region Reunion (FEDER). The funding agencies had no influence on the research process. We thank Matteo Ferla, Marc Garcia-Borràs, Jiri Damborsky, and Manfred Reetz for their excellent comments on this work. XFC was supported by the UKRI CDT in AI for Healthcare, http://ai4health.io (Grant No. P/S023283/1).

Author information

Correspondence to Carlos G. Acevedo-Rocha.

Copyright information

© 2022 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Cadet, X.F., Gelly, J.C., van Noord, A., Cadet, F., Acevedo-Rocha, C.G. (2022). Learning Strategies in Protein Directed Evolution. In: Currin, A., Swainston, N. (eds) Directed Evolution. Methods in Molecular Biology, vol 2461. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2152-3_15

  • DOI: https://doi.org/10.1007/978-1-0716-2152-3_15

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2151-6

  • Online ISBN: 978-1-0716-2152-3

  • eBook Packages: Springer Protocols
