Summary.
The support vector machine (SVM), a machine-learning method, is used to predict the four structural classes, i.e. mainly α, mainly β, α–β and few secondary structures (fss), from the topology level of the CATH protein structure database. In binary classification, any two structural classes that share no secondary-structure elements (for example, mainly α versus mainly β) can be distinguished with accuracy as high as 90%. The accuracy, however, drops below 70% when the two classes contain structural elements in common. Our study also shows that feature spaces of dimension 20² = 400 (dipeptides) and 20³ = 8 000 (tripeptides) give nearly the same prediction accuracy. For all four structural classes together, multi-class classification gives an overall accuracy of about 52%, indicating that multi-class techniques for support vector machines may need further improvement in future investigations.
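The dipeptide and tripeptide feature spaces mentioned in the summary have one dimension per possible residue pair (20² = 400) or residue triple (20³ = 8 000). The summary does not state how a sequence is mapped into these spaces, so the sketch below assumes a common encoding: the normalized frequency of each overlapping k-mer, skipping windows that contain non-standard residues. The function names `kmer_index` and `kmer_composition` are hypothetical and are not taken from the paper.

```python
from functools import lru_cache
from itertools import product

# The 20 standard amino acids (one-letter codes); this ordering is an arbitrary choice.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


@lru_cache(maxsize=None)
def kmer_index(k: int) -> dict:
    """Map every possible k-mer over the 20 amino acids to a feature index (20**k entries)."""
    return {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}


def kmer_composition(sequence: str, k: int) -> list:
    """Normalized k-mer composition: k=2 gives dipeptides (400-D), k=3 tripeptides (8000-D).

    Assumption (not from the paper): overlapping k-mers are counted and divided by
    the number of valid windows; windows with non-standard residues (e.g. 'X') are skipped.
    """
    index = kmer_index(k)
    counts = [0.0] * len(index)
    seq = sequence.upper()
    valid = 0
    for start in range(len(seq) - k + 1):
        j = index.get(seq[start:start + k])
        if j is None:  # window contains a residue outside the standard 20
            continue
        counts[j] += 1
        valid += 1
    if valid:
        counts = [c / valid for c in counts]
    return counts


if __name__ == "__main__":
    v2 = kmer_composition("MKVLAAGIV", 2)
    v3 = kmer_composition("MKVLAAGIV", 3)
    print(len(v2), len(v3))  # prints: 400 8000
```

Each such vector would then be the input to an SVM classifier; the kernel and the multi-class scheme used in the study are not specified in this summary.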
Cite this article
Sun, XD., Huang, RB. Prediction of protein structural classes using support vector machines. Amino Acids 30, 469–475 (2006). https://doi.org/10.1007/s00726-005-0239-0