An Effective Feature Selection Algorithm Based on the Class Similarity Used with a SVM-RDA Classifier to Protein Fold Recognition

  • Wiesław Chmielnicki
  • Katarzyna Sta̧por
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6679)

Abstract

Feature selection is a very important procedure in many pattern recognition problems. It is effective in reducing dimensionality, removing irrelevant data, and increasing the accuracy of a classifier. In our previous work we proposed a classifier combining the support vector machine (SVM) classifier with a regularized discriminant analysis (RDA) classifier, applied to the protein fold recognition problem. However, the high dimensionality of the feature vectors and the small number of samples in the training data set make the problem ill-posed for an RDA classifier, so feature selection is crucial for the accuracy of the classifier. In this paper we propose a simple and effective algorithm based on class similarity which solves this problem and helps us achieve very good accuracy on a real-world data set.
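
The ill-posedness mentioned above can be illustrated with a minimal sketch. It is not the feature selection algorithm proposed in the paper, whose details the abstract does not give. It only shows, on synthetic data, why a class covariance estimate is singular when a class has fewer samples than features, and how regularization in the style of Friedman's RDA (shrinking toward the pooled scatter with a parameter `lam`, then toward a scaled identity with a parameter `gamma`) restores full rank. The data sizes, the parameter values, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def class_scatter(X):
    """Scatter matrix of one class: sum of outer products of centered samples."""
    centered = X - X.mean(axis=0)
    return centered.T @ centered

def rda_covariance(X_k, pooled_scatter, n_total, lam, gamma):
    """Regularized class covariance in the style of Friedman (1989).

    lam in [0, 1] shrinks the class scatter toward the pooled scatter;
    gamma in [0, 1] shrinks the result toward a scaled identity matrix.
    """
    n_k, p = X_k.shape
    s_k = class_scatter(X_k)
    sigma = ((1 - lam) * s_k + lam * pooled_scatter) / ((1 - lam) * n_k + lam * n_total)
    return (1 - gamma) * sigma + gamma * (np.trace(sigma) / p) * np.eye(p)

# Synthetic stand-in data (an assumption): 3 classes, 5 samples each, 20 features.
rng = np.random.default_rng(0)
p = 20
classes = [rng.normal(size=(5, p)) + k for k in range(3)]
pooled = sum(class_scatter(X) for X in classes)
n_total = sum(len(X) for X in classes)

# Unregularized estimate: rank is limited by the number of samples in the class.
raw = class_scatter(classes[0]) / len(classes[0])
# Regularized estimate: the identity term makes the matrix positive definite.
reg = rda_covariance(classes[0], pooled, n_total, lam=0.5, gamma=0.1)

raw_rank = np.linalg.matrix_rank(raw)
reg_rank = np.linalg.matrix_rank(reg)
print("rank without regularization:", raw_rank, "of", p)
print("rank with regularization:", reg_rank, "of", p)
```

With few samples per class, even the regularized estimate depends heavily on the choice of `lam` and `gamma`, which is one reason reducing the number of features beforehand can matter for such a classifier.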

Keywords

Feature selection; support vector machine; statistical classifiers; RDA classifier; protein fold recognition

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Wiesław Chmielnicki (1)
  • Katarzyna Sta̧por (2)
  1. Faculty of Physics, Astronomy and Applied Computer Science, Jagiellonian University, Poland
  2. Institute of Computer Science, Silesian University of Technology, Poland
