Abstract
Determining the structural class of a given protein can provide important information about its functionality and its general tertiary structure. Over the last two decades, the protein structural class prediction problem has attracted considerable attention, and prediction accuracy has improved significantly. Features extracted from the Position Specific Scoring Matrix (PSSM) have played an important role in achieving this improvement. However, the information embedded in the PSSM has not yet been adequately explored, as prediction accuracy that relies on the PSSM for feature extraction still remains limited. To explore this potential, we propose a segmentation-based feature extraction technique built on the concepts of amino acid distribution and auto covariance. By applying a Support Vector Machine (SVM) to our extracted features, we improve protein structural class prediction accuracy by up to 16% over similar studies found in the literature. Relying solely on the PSSM for feature extraction, we achieve prediction accuracies of over 90% and 80% on the 25PDB and 1189 benchmarks, respectively.
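The abstract names two PSSM-derived feature families, a segmented amino acid distribution and auto covariance, which are then fed to an SVM. The sketch below shows one plausible way to compute features of this kind in pure Python; it is not the authors' implementation. The logistic normalization of raw PSSM scores, the reading of "distribution" as the relative positions where each column's running sum first reaches fixed fractions of its total, the parameters `max_lag` and `step`, and all function names are assumptions made for illustration. The auto covariance follows a commonly used per-column form, (1/(L−k)) Σ (P[i][j] − mean_j)(P[i+k][j] − mean_j).

```python
import math
from typing import List

# Assumed layout: L rows (residues) x 20 columns (amino acids), e.g. parsed from PSI-BLAST output.
Matrix = List[List[float]]


def normalize_pssm(pssm: Matrix) -> Matrix:
    """Assumed preprocessing: squash raw log-odds scores into (0, 1) with the logistic function."""
    return [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in pssm]


def auto_covariance(pssm: Matrix, max_lag: int) -> List[float]:
    """Per-column auto covariance for lags 1..max_lag, ordered lag-major (20 * max_lag values)."""
    length, n_cols = len(pssm), len(pssm[0])
    if length <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features


def segmented_distribution(pssm: Matrix, step: float) -> List[float]:
    """Hypothetical distribution features: for each column, the relative position (0, 1] at
    which the running sum first reaches step, 2*step, ... of the column total.
    Expects non-negative values, e.g. the output of normalize_pssm."""
    length, n_cols = len(pssm), len(pssm[0])
    n_cuts = int(round(1.0 / step)) - 1  # step=0.25 -> cut points at 25%, 50%, 75%
    features = []
    for j in range(n_cols):
        column = [row[j] for row in pssm]
        total = sum(column)
        targets = [total * step * k for k in range(1, n_cuts + 1)]
        running, t, positions = 0.0, 0, []
        for i, value in enumerate(column):
            running += value
            # A single residue may push the running sum past several cut points at once.
            while t < n_cuts and running >= targets[t]:
                positions.append((i + 1) / length)
                t += 1
        # Guard against floating-point shortfall: unreached cuts map to the sequence end.
        positions.extend([1.0] * (n_cuts - len(positions)))
        features.extend(positions)
    return features


def extract_features(raw_pssm: Matrix, max_lag: int = 10, step: float = 0.25) -> List[float]:
    """Concatenate both hypothetical feature families into one vector for a classifier."""
    pssm = normalize_pssm(raw_pssm)
    return segmented_distribution(pssm, step) + auto_covariance(pssm, max_lag)
```

With these assumed defaults each protein yields a fixed-length vector of 20 × 3 + 20 × 10 = 260 values regardless of sequence length, which is what makes the output usable as SVM input. The vectors could then be passed to any SVM implementation, for example scikit-learn's `sklearn.svm.SVC`; the kernel and its parameters would need to be tuned separately.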
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dehzangi, A., Paliwal, K., Lyons, J., Sharma, A., Sattar, A. (2013). Exploring Potential Discriminatory Information Embedded in PSSM to Enhance Protein Structural Class Prediction Accuracy. In: Ngom, A., Formenti, E., Hao, JK., Zhao, XM., van Laarhoven, T. (eds) Pattern Recognition in Bioinformatics. PRIB 2013. Lecture Notes in Computer Science, vol 7986. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39159-0_19
DOI: https://doi.org/10.1007/978-3-642-39159-0_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39158-3
Online ISBN: 978-3-642-39159-0
eBook Packages: Computer Science, Computer Science (R0)