Journal of Molecular Modeling, Volume 12, Issue 3, pp 355–361

Prediction of β-strand packing interactions using the signature product

  • W. Michael Brown
  • Shawn Martin
  • Joseph P. Chabarek
  • Charlie Strauss
  • Jean-Loup Faulon
Original paper


Predicting β-sheet topology requires accounting for long-range interactions between β-strands that are not necessarily consecutive in sequence. Because these interactions are difficult to simulate with ab initio methods, we propose a supplementary method that assigns β-sheet topology from sequence information alone. We envision using its results to reduce the three-dimensional search space of ab initio methods. The method is based on the signature molecular descriptor, which has previously been used to predict protein–protein interactions and to develop quantitative structure–activity relationships for small organic drugs and peptide inhibitors. Here, we show how the signature descriptor can be used in a Support Vector Machine to predict whether or not two β-strands will pack adjacently within a protein. We then show how these predictions can be used to order β-strands within β-sheets. Using the entire Protein Data Bank (PDB) with ten-fold cross-validation, we achieved 74.0% accuracy in packing prediction and 75.6% accuracy in the prediction of edge strands. For β-strand ordering, we predict the correct ordering for 51.3% of the β-sheets. Furthermore, a simple confidence metric identifies the sheets for which accurate predictions can be obtained: for the 25% of predictions with the highest confidence, we achieve 95.7% accuracy in β-strand ordering.
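The abstract does not say how pairwise packing predictions are combined into a strand order, so the following is only an illustrative sketch under stated assumptions, not the authors' procedure. It assumes a symmetric matrix of pairwise packing scores (for example, classifier decision values; the numbers below are made up), picks the ordering that maximizes the summed score of neighbouring strands by exhaustive search, and uses the gap to the runner-up ordering as a hypothetical confidence value. The function name `order_strands` and the margin-based confidence are inventions for this example.

```python
from itertools import permutations


def order_strands(scores):
    """Return (best_order, best_total, margin) for a symmetric score matrix.

    scores[i][j] is an assumed packing score for strands i and j.
    Exhaustive search is only practical for small sheets.
    """
    n = len(scores)
    ranked = []
    for perm in permutations(range(n)):
        # An ordering and its reverse describe the same sheet; keep one.
        if n > 1 and perm[0] > perm[-1]:
            continue
        # Sum the scores of strands that are neighbours in this ordering.
        total = sum(scores[a][b] for a, b in zip(perm, perm[1:]))
        ranked.append((total, perm))
    ranked.sort(key=lambda item: item[0], reverse=True)
    best_total, best_perm = ranked[0]
    # Hypothetical confidence: gap between the best and second-best ordering.
    margin = best_total - ranked[1][0] if len(ranked) > 1 else float("inf")
    return list(best_perm), best_total, margin


# Made-up scores for a four-strand sheet (illustrative values only).
scores = [
    [0.0, 0.9, 0.1, 0.2],
    [0.9, 0.0, 0.8, 0.1],
    [0.1, 0.8, 0.0, 0.7],
    [0.2, 0.1, 0.7, 0.0],
]
order, total, margin = order_strands(scores)
print(order, round(total, 3), round(margin, 3))
```

In this sketch the first and last entries of the returned order play the role of edge strands, and a larger margin would mark an ordering as higher confidence.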


Using signature products to predict the packing interactions within a β-sheet


Keywords: β-sheets; Secondary structure prediction; Signature descriptor; Support vector machine



This work was funded by the U.S. Department of Energy’s Genomics: GTL program under the project “Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling”. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.



Copyright information

© Springer-Verlag 2005

Authors and Affiliations

  • W. Michael Brown (1)
  • Shawn Martin (1)
  • Joseph P. Chabarek (1)
  • Charlie Strauss (2)
  • Jean-Loup Faulon (1)

  1. Computational Biology 9212, Sandia National Laboratories, Albuquerque, USA
  2. Biosciences Division, Los Alamos National Laboratory, Los Alamos, USA
