Abstract
Knowledge of a protein's structural class plays an important role in understanding its folding pattern. As an intermediate step toward recognizing the three-dimensional structure of a protein, structural class prediction is an important and challenging task. In this study, we first introduce a feature extraction technique based on tri-grams computed directly from the position-specific scoring matrix (PSSM), which yields a total of 8,000 features (one per ordered triplet of the 20 amino acids, 20 × 20 × 20) to represent a protein. Support vector machine-recursive feature elimination (SVM-RFE) is then applied for feature selection, and the reduced feature set is input to a support vector machine (SVM) classifier to predict the structural class of a given protein. To examine the effectiveness of our method, jackknife tests are performed on six widely used benchmark datasets: Z277, Z498, 1189, 25PDB, D640, and D1185. Overall accuracies of 97.1, 98.6, 92.5, 93.5, 94.2, and 95.9 % are achieved on these datasets, respectively. Comparison with other prediction methods shows that the proposed method is a promising approach to protein structural class prediction.
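To make the 8,000-dimensional representation concrete, the following is a minimal sketch of one way tri-gram features could be computed from a PSSM. The abstract does not state the formula, so the sketch assumes a common formulation: the PSSM has already been converted to non-negative values of shape (L, 20), and each feature is the sum over positions i of P[i, m] · P[i+1, n] · P[i+2, r]. The function name `trigram_features` and the normalization of the input are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def trigram_features(pssm):
    """Return an 8,000-dimensional tri-gram feature vector from a PSSM.

    Assumption: `pssm` is an (L, 20) array of non-negative values, for
    example PSSM scores already mapped to probabilities. The formula
    below is an assumed formulation, not taken from the abstract.
    """
    pssm = np.asarray(pssm, dtype=float)
    if pssm.ndim != 2 or pssm.shape[1] != 20:
        raise ValueError("expected an array of shape (L, 20)")
    if pssm.shape[0] < 3:
        raise ValueError("sequence must have at least 3 residues")

    # T[m, n, r] = sum_i P[i, m] * P[i+1, n] * P[i+2, r]
    # The three slices align positions i, i+1, and i+2.
    tensor = np.einsum("im,in,ir->mnr", pssm[:-2], pssm[1:-1], pssm[2:])

    # Flatten the 20 x 20 x 20 tensor into a single feature vector.
    return tensor.reshape(-1)


# Usage with a synthetic, row-normalized matrix (not real PSSM data).
rng = np.random.default_rng(0)
demo = rng.random((30, 20))
demo /= demo.sum(axis=1, keepdims=True)
features = trigram_features(demo)
print(features.shape)
```

Under these assumptions the output length is independent of the sequence length, which is what allows proteins of different sizes to be passed to a fixed-input classifier such as an SVM after feature selection.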
Acknowledgments
This work was supported by the Innovation Program of Shanghai Municipal Education Commission (No. 13YZ098), the Foundation for University Youth Teachers of Shanghai (No. ZZhy12028), the National Natural Science Foundation of China (Nos. 31271830, 41376135), and the Doctoral Fund of Shanghai Ocean University.
Conflict of interest
The authors declare no conflict of interest related to this study.
Cite this article
Tao, P., Liu, T., Li, X. et al. Prediction of protein structural class using tri-gram probabilities of position-specific scoring matrix and recursive feature elimination. Amino Acids 47, 461–468 (2015). https://doi.org/10.1007/s00726-014-1878-9
DOI: https://doi.org/10.1007/s00726-014-1878-9