Prediction of protein structural classes based on feature selection technique

Abstract

Predicting protein structural classes helps in understanding the folding patterns, functions and interactions of proteins. In this study, we propose a feature selection-based method for predicting protein structural classes accurately. Three datasets with sequence identity lower than 25% were used to test the prediction performance of the method. In jackknife cross-validation, the overall accuracies on these three datasets were 92.1%, 89.7% and 84.0%, respectively. The results indicate that the proposed method is more efficient and accurate than existing methods. The present study thus offers an excellent alternative to existing methods for predicting protein structural classes.
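The jackknife cross-validation mentioned above holds out each sample once, trains on the remaining samples, and reports the fraction of held-out samples predicted correctly as the overall accuracy. The following is a minimal sketch of that procedure only, not the method of this study: the nearest-centroid classifier is a stand-in chosen to keep the example self-contained (the keywords name a support vector machine), and the function names, feature vectors and class labels are hypothetical.

```python
from collections import defaultdict


def nearest_centroid_predict(train_x, train_y, query):
    """Stand-in classifier: return the label whose class centroid is closest to `query`."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(train_x, train_y):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    best_label, best_dist = None, float("inf")
    for label, total in sums.items():
        centroid = [s / counts[label] for s in total]
        # Squared Euclidean distance between the class centroid and the query.
        dist = sum((c - q) ** 2 for c, q in zip(centroid, query))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(features, labels, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and score the held-out sample."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical toy data: 2-D feature vectors for two made-up classes.
toy_features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
toy_labels = ["class_a", "class_a", "class_a", "class_b", "class_b", "class_b"]
print(jackknife_accuracy(toy_features, toy_labels))
```

Any classifier with the same `predict(train_x, train_y, query)` shape can be passed in place of the stand-in, so the evaluation loop stays independent of the model being assessed.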



Author information

Correspondence to Hao Lin or Wei Chen.

Cite this article

Ding, H., Lin, H., Chen, W. et al. Prediction of protein structural classes based on feature selection technique. Interdiscip Sci Comput Life Sci 6, 235–240 (2014). https://doi.org/10.1007/s12539-013-0205-6

Key words

  • protein structural class
  • feature selection technique
  • support vector machine
  • tetrapeptide