Abstract
Several descriptors of protein structure at the sequence and residue levels have been proposed recently, and they are widely used to analyze and predict the structural and functional characteristics of proteins. Numerous in silico methods have been developed for sequence-based prediction of these descriptors. However, many of them lack a public web-server, and only a few integrate multiple descriptors to improve the predictions. We introduce the iFC2 (integrated prediction of fold, class, and content) server, the first to integrate three modern predictors of sequence-level descriptors: fold type (PFRES), structural class (SCEC), and secondary structure content (PSSC-core). The server exploits the relations between these three descriptors in a cross-evaluation procedure that improves over the predictions of the individual methods. iFC2 annotates fold and class predictions as potentially correct or incorrect. When tested on datasets of low-similarity chains, iFC2 labels 72% of the PFRES fold predictions as correct, and the accuracy of these predictions equals 82%; the accuracy of the remaining 28% of the PFRES predictions equals 38%. Similarly, the server labels over 79% of the SCEC predictions as correct, and these are 98% accurate, while the remaining SCEC predictions are only 15% accurate. These results are competitive with those of recent relevant web-servers. Predictions on CASP8 targets show that the secondary structure content predicted by iFC2 is competitive with the content computed from the tertiary structures predicted by the three best-performing methods in CASP8. The iFC2 server is available at http://biomine.ece.ualberta.ca/1D/1D.html.
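The abstract says that iFC2 exploits relations between fold, class, and content to label predictions as potentially correct or incorrect, but it does not spell out the rules. The sketch below is a hypothetical illustration of such a consistency check, not the published iFC2 procedure. `FOLD_TO_CLASS` is a small illustrative subset of SCOP folds (not the PFRES fold set), and the content thresholds in `class_from_content` are assumptions chosen only for readability.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative subset of SCOP folds and the SCOP class each belongs to.
# This is NOT the fold list used by PFRES; it is a hypothetical example.
FOLD_TO_CLASS = {
    "globin-like": "all-alpha",
    "immunoglobulin-like beta-sandwich": "all-beta",
    "TIM beta/alpha-barrel": "alpha/beta",
    "ferredoxin-like": "alpha+beta",
}

CORRECT, INCORRECT = "potentially correct", "potentially incorrect"


def coarse(structural_class: Optional[str]) -> Optional[str]:
    """Merge alpha/beta and alpha+beta, which content alone cannot separate."""
    if structural_class in ("alpha/beta", "alpha+beta"):
        return "mixed"
    return structural_class


def class_from_content(helix: float, strand: float) -> str:
    """Coarse class implied by helix/strand content (fractions in [0, 1]).

    The thresholds are hypothetical and used for illustration only.
    """
    if helix >= 0.40 and strand < 0.05:
        return "all-alpha"
    if strand >= 0.40 and helix < 0.05:
        return "all-beta"
    if helix >= 0.15 and strand >= 0.15:
        return "mixed"
    return "undetermined"


@dataclass
class CrossEvaluation:
    fold_label: str
    class_label: str


def cross_evaluate(fold: str, sc_class: str, helix: float, strand: float) -> CrossEvaluation:
    """Label a fold and a class prediction by checking them against each other and the content."""
    fold_class = FOLD_TO_CLASS.get(fold)  # None for folds outside the subset
    content_class = class_from_content(helix, strand)

    # Fold is supported if its implied class equals the predicted class,
    # or agrees with the content-derived class at the coarse level.
    fold_ok = fold_class is not None and (
        fold_class == sc_class or coarse(fold_class) == content_class
    )
    # Class is supported if the fold-implied class equals it,
    # or the content-derived class agrees with it at the coarse level.
    class_ok = fold_class == sc_class or coarse(sc_class) == content_class

    return CrossEvaluation(
        fold_label=CORRECT if fold_ok else INCORRECT,
        class_label=CORRECT if class_ok else INCORRECT,
    )


print(cross_evaluate("globin-like", "all-beta", helix=0.02, strand=0.50))
# CrossEvaluation(fold_label='potentially incorrect', class_label='potentially correct')
```

In the usage line, an all-alpha fold conflicts with both the predicted all-beta class and the strand-rich content, so only the fold prediction is flagged. A real system would more likely learn such rules from data than hard-code them.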
Abbreviations
- CASP: Critical assessment of techniques for protein structure prediction
- FASTA: FAST-all
- iFC2: Integrated prediction of fold, class, and content
- iFC2-FT: iFC2 cross-evaluation for fold type
- iFC2-SSC: iFC2 cross-evaluation for secondary structure content
- iFC2-SC: iFC2 cross-evaluation for structural class
- MAE: Mean absolute error
- PSSM: Position-specific scoring matrix
- PSSC-core: Prediction of secondary structure content through comprehensive sequence representation
- SCEC: Prediction of structural class using evolutionary collocation
- PDB: Protein data bank
- PFRES: Protein fold recognition using evolutionary information and predicted secondary structure
- SCOP: Structural classification of proteins
- SVM: Support vector machine
- 3D: Tertiary
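MAE is the usual error measure for real-valued predictions such as secondary structure content, where the reference content is computed from a tertiary structure. The following is a minimal sketch under stated assumptions: it uses one common reduction of the eight DSSP states (H, G, I counted as helix; E, B counted as strand), which is an assumption here rather than the reduction documented for iFC2, and the toy sequences and predicted values are made up.

```python
from typing import Dict, List

HELIX_STATES = set("HGI")   # assumed 8-to-3 reduction: helix
STRAND_STATES = set("EB")   # assumed 8-to-3 reduction: strand


def content_from_dssp(dssp: str) -> Dict[str, float]:
    """Fraction of residues in helix, strand, and coil for one chain's DSSP string."""
    if not dssp:
        raise ValueError("empty DSSP string")
    n = len(dssp)
    helix = sum(s in HELIX_STATES for s in dssp) / n
    strand = sum(s in STRAND_STATES for s in dssp) / n
    return {"helix": helix, "strand": strand, "coil": 1.0 - helix - strand}


def mae(predicted: List[float], observed: List[float]) -> float:
    """Mean absolute error over paired content values (one value per chain)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Toy data (made up): observed helix content from DSSP strings vs. predictions.
observed = [content_from_dssp(s)["helix"] for s in ("HHHHEE--", "EEEE----")]
predicted = [0.40, 0.10]
print(observed)  # [0.5, 0.0]
print(mae(predicted, observed))
```

The same `mae` call applies unchanged to strand or coil content, so each content type can be scored separately.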
Acknowledgments
The research of KC and WS was supported by Alberta Ingenuity and iCORE scholarships. LK acknowledges support from NSERC Canada.
About this article
Cite this article
Chen, K., Stach, W., Homaeian, L. et al. iFC2: an integrated web-server for improved prediction of protein structural class, fold type, and secondary structure content. Amino Acids 40, 963–973 (2011). https://doi.org/10.1007/s00726-010-0721-1