Redesigning the Specificity of Protein–DNA Interactions with Rosetta

  • Summer Thyme
  • David Baker
Part of the Methods in Molecular Biology book series (MIMB, volume 1123)


Building protein tools that can selectively bind or cleave specific DNA sequences requires efficient technologies for modifying protein–DNA interactions. Computational design is one method for accomplishing this goal. In this chapter, we present the current state of protein–DNA interface design with the Rosetta macromolecular modeling program. The LAGLIDADG endonuclease family of DNA-cleaving enzymes, under study as potential gene therapy reagents, has been the main testing ground for these in silico protocols. At this time, the computational methods are most useful for designing endonuclease variants that can accommodate a small number of target-site substitutions. Attempts to engineer more extensive interface changes will likely benefit from combining the computational design results with a high-throughput directed evolution or screening procedure. This family of enzymes presents an engineering challenge because its interfaces are highly integrated and there is significant coordination between the binding and catalysis events. Future developments in the computational algorithms depend on experimental feedback to improve understanding and modeling of these complex enzymatic features. This chapter presents both the basic design method that has been used successfully to modulate specificity and more advanced procedures that incorporate DNA flexibility and other properties likely to be necessary for reliable modeling of more extensive target-site changes.

Key words

Protein–DNA interactions; Computational design; Rosetta; Specificity; In silico prediction; Gene targeting; Direct readout



The authors would like to thank Justin Ashworth, Phil Bradley, and Jim Havranek for their vast contributions to improving protein–DNA interface design, as well as the entire RosettaCommons community for contributions to the Rosetta code base. This work was supported by the US National Institutes of Health (#GM084433 and #RL1CA133832 to DB), the Foundation for the National Institutes of Health through the Gates Foundation Grand Challenges in Global Health Initiative, and the Howard Hughes Medical Institute.


Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Summer Thyme (1)
  • David Baker (2)
  1. Department of Biological Sciences, University of Washington, Seattle, USA
  2. Department of Biochemistry, Institute for Protein Design, University of Washington, Seattle, USA
