Computational Design of DNA-Binding Proteins

Part of the Methods in Molecular Biology book series (MIMB, volume 1414)


Predicting the outcome of engineered and naturally occurring sequence perturbations to protein–DNA interfaces requires accurate computational modeling technologies. Computational design that accommodates a small number of DNA target site substitutions is well established. This chapter details the basic design method in the Rosetta macromolecular modeling program, which has been used successfully to modulate the specificity of DNA-binding proteins. More recently, combining computational design and directed evolution has become a common approach for increasing the success rate of protein engineering projects. The power of such high-throughput screening depends on computational methods producing multiple potential solutions. Therefore, this chapter describes several protocols for increasing the diversity of designed output. Lastly, we describe an approach for building comparative models of protein–DNA complexes in order to utilize information from homologous sequences. These models can be used to explore how nature modulates the specificity of protein–DNA interfaces and can potentially serve as starting templates for further engineering.

Key words

Protein–DNA interactions, Computational design, Rosetta, Specificity, In silico prediction, Direct readout, Homology model



The authors would like to thank Justin Ashworth, Phil Bradley, and Jim Havranek for their vast contributions to improving protein–DNA interface design, as well as the entire ROSETTA Commons community for contributions to the Rosetta code base. This work was supported by the US National Institutes of Health (#GM084433 and #RL1CA133832 to D.B.), the Foundation for the National Institutes of Health through the Gates Foundation Grand Challenges in Global Health Initiative, and the Howard Hughes Medical Institute.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Department of Molecular and Cellular Biology, Harvard University, Cambridge, USA
  2. Department of Biochemistry, University of Washington, Seattle, USA
