SVM-BetaPred: Prediction of Right-Handed β-Helix Fold from Protein Sequence Using SVM

  • Siddharth Singh
  • Krishnan Hajela
  • Ashwini Kumar Ramani
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4774)


Right-handed single-stranded β-helix proteins have been characterized as virulence factors, allergens, and toxins that pose a threat to human health. Identifying these proteins from amino acid sequence is of great importance because they are potential targets for antibacterial and antifungal agents. In this paper, a support vector machine (SVM) is used to predict the presence of the β-helix fold in protein sequences from dipeptide composition. A 400-dimensional input vector of dipeptide compositions is used to search for putative rungs or coils, the conserved secondary structure found in β-helix proteins. An average accuracy of 89.2% with a Matthews correlation coefficient of 0.75 is obtained in 5-fold cross-validation. In addition, a position-specific scoring matrix (PSSM) is used to score the query sequences identified as β-helices by the SVM. The method recognizes right-handed β-helices with 100% sensitivity and 99.6% specificity on a test set of known protein structures.
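As a rough illustration of the quantities named in the abstract, the sketch below builds a 400-dimensional dipeptide composition vector (20 × 20 amino acid pairs) and computes sensitivity, specificity, accuracy, and the Matthews correlation coefficient from confusion-matrix counts. It is a minimal sketch, not the authors' implementation: the function names, the normalization by the number of valid overlapping dipeptides, and the skipping of non-standard residues are all assumptions, since the abstract does not specify them.

```python
from itertools import product
from math import sqrt

# The 20 standard amino acids; their ordered pairs give 400 dipeptides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
INDEX = {dipeptide: i for i, dipeptide in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide fractions for one sequence.

    Assumption: each fraction is count / (number of valid overlapping
    dipeptides); pairs containing non-standard residues are skipped.
    """
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - 1):
        idx = INDEX.get(seq[i:i + 2])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [count / total for count in counts]


def binary_metrics(tp, tn, fp, fn):
    """Compute sensitivity, specificity, accuracy, and MCC from counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n if n else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    vector = dipeptide_composition("ACAC")  # toy sequence, not real data
    print(len(vector), vector[INDEX["AC"]])
    print(binary_metrics(tp=50, tn=40, fp=10, fn=0))  # made-up counts
```

Vectors of this form could be passed to any SVM package as fixed-length feature rows; the kernel and parameter choices are not stated in the abstract and are left open here.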


β-helix fold; β-sheet stacking; pectate lyase; fold recognition; secondary structure; Support Vector Machines; SVM; PSSM



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Siddharth Singh (1)
  • Krishnan Hajela (2)
  • Ashwini Kumar Ramani (1)

  1. School of Computer Science
  2. School of Life Science, Devi Ahilya University, Khandwa Road, Indore-452001, India
