Abstract
Traditional directed evolution experiments are often time-, labor- and cost-intensive because they involve repeated rounds of random mutagenesis and the selection or screening of large mutant libraries. The efficiency of directed evolution experiments can be significantly improved by targeting mutagenesis to a limited number of hot-spot positions and/or selecting a limited set of substitutions. The design of such “smart” libraries can be greatly facilitated by in silico analyses and predictions. Here we provide an overview of computational tools applicable for (a) the identification of hot-spots for engineering enzyme properties, and (b) the evaluation of predicted hot-spots and selection of suitable amino acids for substitutions. The selected tools do not require any specific expertise and can easily be implemented by the wider scientific community.
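The gain from restricting both the number of randomized positions and the set of allowed substitutions can be made concrete with a short calculation. The sketch below is illustrative only and is not taken from the chapter: the function names are hypothetical, the NNK and NDT degenerate codons are chosen here as familiar examples of a full and a reduced amino acid alphabet, and the 95% coverage target together with the assumption that all codon combinations are equally likely are simplifications.

```python
import math

# Number of codons per randomized position for two common degenerate codons.
# NNK: 32 codons covering all 20 amino acids plus one stop codon.
# NDT: 12 codons covering 12 amino acids, with no stop codon.
# These two schemes are illustrative choices, not prescribed by the text.
CODONS_PER_POSITION = {"NNK": 32, "NDT": 12}


def library_size(scheme: str, positions: int) -> int:
    """Distinct codon combinations when `positions` sites are randomized together."""
    return CODONS_PER_POSITION[scheme] ** positions


def clones_to_screen(variants: int, coverage: float = 0.95) -> int:
    """Approximate number of clones to sample so that any given variant is
    present with probability `coverage`, assuming equiprobable variants.

    Uses P(present) ~= 1 - exp(-T / V), hence T = -V * ln(1 - coverage).
    """
    return math.ceil(-variants * math.log(1.0 - coverage))


if __name__ == "__main__":
    for scheme in ("NNK", "NDT"):
        for positions in (1, 2, 3):
            size = library_size(scheme, positions)
            print(scheme, positions, size, clones_to_screen(size))
```

Under these assumptions, switching from NNK to NDT at three simultaneously randomized positions reduces the number of codon combinations from 32^3 to 12^3, roughly a 19-fold smaller screening effort, at the cost of sampling only 12 of the 20 amino acids.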
Acknowledgements
The research work of the authors is supported by the Grant Agency of the Czech Republic (P207/12/0775 and P503/12/0572), the Czech Ministry of Education (LO1214, LH14027, CZ.1.07/2.3.00/30.0037), and the European Regional Development Fund (CZ.1.05/2.1.00/01.0001). MetaCentrum is acknowledged for providing access to computing facilities, supported by the Ministry of Education of the Czech Republic (LM2010005).
Copyright information
© 2014 Springer Science+Business Media New York
About this protocol
Cite this protocol
Sebestova, E., Bendl, J., Brezovsky, J., Damborsky, J. (2014). Computational Tools for Designing Smart Libraries. In: Gillam, E., Copp, J., Ackerley, D. (eds) Directed Evolution Library Creation. Methods in Molecular Biology, vol 1179. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-1053-3_20
DOI: https://doi.org/10.1007/978-1-4939-1053-3_20
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-1052-6
Online ISBN: 978-1-4939-1053-3
eBook Packages: Springer Protocols