Hypergraph Model of Multi-residue Interactions in Proteins: Sequentially-Constrained Partitioning Algorithms for Optimization of Site-Directed Protein Recombination

  • Xiaoduan Ye
  • Alan M. Friedman
  • Chris Bailey-Kellogg
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3909)


Relationships among amino acids determine protein stability and function and are also constrained by evolutionary history. We develop a probabilistic hypergraph model of residue relationships that generalizes traditional pairwise contact potentials to account for the statistics of multi-residue interactions. Using this model, we detect non-random associations in protein families and in the protein database. We also use the model to optimize site-directed recombination experiments so that significant interactions are preserved, thereby increasing the frequency of useful recombinants. We formulate the optimization as a sequentially-constrained hypergraph partitioning problem; the quality of a recombinant library with respect to a set of breakpoints is characterized by the total perturbation to edge weights. We prove this problem to be NP-hard in general, but develop exact and heuristic polynomial-time algorithms for a number of important cases. Application to the beta-lactamase family demonstrates the utility of our algorithms in planning site-directed recombination.
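The partitioning formulation in the abstract can be illustrated with a small sketch. It rests on simplifying assumptions that are not taken from the paper: a hyperedge is a set of residue positions with a single weight, a breakpoint `b` separates residue `b - 1` from residue `b`, and the "perturbation" is approximated as the summed weight of hyperedges whose residues land in more than one fragment. The function names (`perturbation`, `best_breakpoints`) and the toy data are hypothetical, and the exhaustive search stands in for the exact and heuristic algorithms the paper develops.

```python
from bisect import bisect_right
from itertools import combinations


def perturbation(hyperedges, breakpoints):
    """Sum the weights of hyperedges split across fragments.

    hyperedges: iterable of (set_of_residue_positions, weight) pairs.
    breakpoints: positions b such that residues < b and >= b are separated.
    Assumption: an edge counts as fully perturbed as soon as it is split.
    """
    bps = sorted(breakpoints)
    total = 0.0
    for residues, weight in hyperedges:
        # Index of the contiguous fragment each residue falls into.
        fragments = {bisect_right(bps, r) for r in residues}
        if len(fragments) > 1:
            total += weight
    return total


def best_breakpoints(hyperedges, n_residues, k):
    """Exhaustively pick k breakpoints minimizing the perturbation.

    Only practical for tiny inputs; shown to make the objective concrete.
    """
    candidates = range(1, n_residues)
    return min(combinations(candidates, k),
               key=lambda bps: perturbation(hyperedges, bps))


if __name__ == "__main__":
    # Toy hypergraph over 7 residues (positions 0..6); weights are made up.
    edges = [({0, 1, 2}, 2.0), ({2, 3}, 1.0), ({4, 5, 6}, 3.0), ({1, 5}, 0.5)]
    bps = best_breakpoints(edges, n_residues=7, k=1)
    print(bps, perturbation(edges, bps))
```

Because breakpoints must respect sequence order, every fragment is a contiguous run of residues; that is the "sequentially-constrained" aspect that separates this problem from unconstrained hypergraph partitioning.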


Keywords (machine-generated, not supplied by the authors): Edge Weight; Protein Structure Prediction; Stochastic Dynamic Programming; Additional Perturbation; Potential Score





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Xiaoduan Ye (1)
  • Alan M. Friedman (2)
  • Chris Bailey-Kellogg (1)

  1. Department of Computer Science, Dartmouth College, Hanover, USA
  2. Department of Biological Sciences and Purdue Cancer Center, Purdue University, West Lafayette, USA
