Prediction of protein structural classes using support vector machines

Amino Acids

Summary.

A support vector machine (SVM), a machine-learning method, is used to predict four structural classes, namely mainly α, mainly β, α–β, and fss (few secondary structures), taken from the topology level of the CATH protein structure database. In binary classification, two structural classes that share no secondary-structure elements, such as mainly α and mainly β, can be separated with an accuracy as high as 90%. The accuracy drops below 70%, however, when the classes to be separated have structural elements in common. Our study also shows that feature spaces of dimension 20² = 400 (for dipeptides) and 20³ = 8000 (for tripeptides) give nearly the same prediction accuracy. Across all four structural classes, multi-class classification gives an overall accuracy of about 52%, indicating that multi-class classification techniques for support vector machines still need further improvement.
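The feature-space dimensions quoted above follow from counting every possible run of k residues over the 20 standard amino acids. The sketch below illustrates one way such a composition vector could be built. The function name `kpeptide_composition`, the normalization to frequencies, and the skipping of non-standard residues are assumptions made for illustration; the summary does not state how the authors encoded their sequences, and the example sequence is made up.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kpeptide_composition(sequence, k=2):
    """Return the normalized k-peptide frequency vector of a sequence.

    Hypothetical helper: the vector has 20**k entries, one per possible
    k-peptide, ordered lexicographically over AMINO_ACIDS.
    """
    index = {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}
    counts = [0] * len(index)
    total = 0
    # Slide a window of length k along the sequence and count each k-peptide.
    for start in range(len(sequence) - k + 1):
        pos = index.get(sequence[start:start + k])
        if pos is None:  # assumption: ignore windows with non-standard residues
            continue
        counts[pos] += 1
        total += 1
    if total == 0:
        return [0.0] * len(index)
    return [c / total for c in counts]


# Made-up sequence, used only to show the resulting dimensions.
dipeptide = kpeptide_composition("MKVLAAGIVGLLLA", k=2)
tripeptide = kpeptide_composition("MKVLAAGIVGLLLA", k=3)
print(len(dipeptide), len(tripeptide))  # 400 8000
```

Vectors of this kind could then be passed to any SVM implementation. A short sequence fills only a handful of the 8000 tripeptide entries, which may be one reason the larger feature space adds little over the 400-dimensional one.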

About this article

Cite this article

Sun, XD., Huang, RB. Prediction of protein structural classes using support vector machines. Amino Acids 30, 469–475 (2006). https://doi.org/10.1007/s00726-005-0239-0

  • DOI: https://doi.org/10.1007/s00726-005-0239-0
