
Using Rotation Forest for Protein Fold Prediction Problem: An Empirical Study

  • Conference paper
Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (EvoBIO 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6023)

Abstract

Recent advances in pattern recognition have led to many classification algorithms being applied to the protein fold prediction problem. In this paper, a recently introduced method called Rotation Forest, which builds an ensemble of classifiers based on bootstrap sampling and feature extraction, is implemented and applied to this problem. Rotation Forest is a straightforward extension of bagging that aims to promote diversity within the ensemble through feature extraction using Principal Component Analysis (PCA). We compare the performance of this method with other meta classifiers based on boosting and bagging, namely AdaBoost.M1, LogitBoost, Bagging and Random Forest. Experimental results show that Rotation Forest achieved higher protein fold prediction accuracy than the other meta classifiers applied, and higher than previous works found in the literature.
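The feature extraction step described in the abstract can be illustrated with a short sketch. This is not the authors' implementation; it is a minimal NumPy example that assumes the usual outline of Rotation Forest: the features are split into disjoint subsets, PCA is run on a bootstrap sample of each subset, and the resulting loadings are assembled into one rotation matrix per ensemble member. The function name `rotation_matrix`, its parameters, and the toy data are hypothetical, and the base classifier that would be trained on the rotated data is left out.

```python
import numpy as np

def rotation_matrix(X, n_subsets=3, sample_frac=0.75, rng=None):
    """Sketch: build one Rotation Forest style rotation matrix for X."""
    rng = np.random.default_rng(rng)
    n_samples, n_features = X.shape
    # Randomly partition the feature indices into disjoint subsets.
    subsets = np.array_split(rng.permutation(n_features), n_subsets)
    R = np.zeros((n_features, n_features))
    for idx in subsets:
        # Bootstrap sample of the rows (sample_frac is an assumed setting).
        n_boot = max(int(sample_frac * n_samples), len(idx) + 1)
        rows = rng.choice(n_samples, size=n_boot, replace=True)
        Xs = X[rows][:, idx]
        # PCA on this subset: eigenvectors of the covariance matrix,
        # keeping every component so no information is discarded.
        cov = np.atleast_2d(np.cov(Xs, rowvar=False))
        _, vecs = np.linalg.eigh(cov)
        # Write the loadings into the block that belongs to this subset.
        R[np.ix_(idx, idx)] = vecs
    return R

# Toy data standing in for a protein feature matrix (purely synthetic).
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(50, 6))
R = rotation_matrix(X, n_subsets=3, rng=1)
X_rot = X @ R  # rotated features for one ensemble member
```

In a full ensemble, each member would draw its own rotation matrix with a different seed and train a base classifier (for example a decision tree) on `X @ R`; predictions would then be combined across members. Because every subset keeps all of its principal components, `R` is orthogonal, so the data are rotated rather than compressed.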




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dehzangi, A., Phon-Amnuaisuk, S., Manafi, M., Safa, S. (2010). Using Rotation Forest for Protein Fold Prediction Problem: An Empirical Study. In: Pizzuti, C., Ritchie, M.D., Giacobini, M. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2010. Lecture Notes in Computer Science, vol 6023. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12211-8_19


  • DOI: https://doi.org/10.1007/978-3-642-12211-8_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12210-1

  • Online ISBN: 978-3-642-12211-8

  • eBook Packages: Computer Science, Computer Science (R0)
