Feature Ranking for Protein Classification

  • Faouzi Mhamdi
  • Ricco Rakotomalala
  • Mourad Elloumi
Part of the Advances in Soft Computing book series (AINSC, volume 30)


In this paper, a knowledge discovery framework is used for protein classification. Processing proceeds in three steps: feature extraction, feature ranking, and feature selection. Inspired by text mining results, the first step uses n-gram descriptors; the second step ranks the descriptors by their chi-square (χ²) statistic; the final step selects the subset of descriptors that minimizes the prediction error rate of a k-nearest neighbor classifier. Experiments show that this framework gives good results: the dimensionality reduction is effective and improves classifier performance.
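The three steps can be illustrated with a minimal sketch. It is not the authors' implementation: the boolean presence encoding of n-grams, the Hamming distance, the leave-one-out error estimate, the function names, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from itertools import chain


def ngrams(seq, n):
    """Step 1: set of length-n substrings (n-grams) present in a sequence."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def chi2_score(present, labels):
    """Chi-square statistic of a boolean descriptor against the class label."""
    total = len(labels)
    observed = Counter(zip(present, labels))
    row = Counter(present)   # descriptor present / absent counts
    col = Counter(labels)    # class counts
    score = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / total
            score += (observed[(r, c)] - expected) ** 2 / expected
    return score


def rank_descriptors(seqs, labels, n):
    """Step 2: rank all n-grams by decreasing chi-square score."""
    grams = [ngrams(s, n) for s in seqs]
    vocab = sorted(set(chain.from_iterable(grams)))
    scores = {g: chi2_score([g in gs for gs in grams], labels) for g in vocab}
    return sorted(vocab, key=lambda g: -scores[g]), grams


def loo_error(grams, labels, selected, k):
    """Leave-one-out error rate of k-NN (Hamming distance) on selected descriptors."""
    vectors = [[g in gs for g in selected] for gs in grams]
    errors = 0
    for i, v in enumerate(vectors):
        dists = sorted(
            (sum(a != b for a, b in zip(v, w)), j)
            for j, w in enumerate(vectors) if j != i
        )
        votes = Counter(labels[j] for _, j in dists[:k])
        if votes.most_common(1)[0][0] != labels[i]:
            errors += 1
    return errors / len(labels)


def select_subset(seqs, labels, n=2, k=1):
    """Step 3: keep the top-m ranked descriptors that minimize the k-NN error."""
    ranked, grams = rank_descriptors(seqs, labels, n)
    best_m, best_err = min(
        ((m, loo_error(grams, labels, ranked[:m], k))
         for m in range(1, len(ranked) + 1)),
        key=lambda t: (t[1], t[0]),  # lowest error first, then fewest descriptors
    )
    return ranked[:best_m], best_err


# Hypothetical toy data: four short sequences in two classes.
toy_seqs = ["AAKAA", "AAKLA", "GGCGG", "GGCLG"]
toy_labels = ["a", "a", "b", "b"]
selected, error = select_subset(toy_seqs, toy_labels, n=2, k=1)
print(selected, error)
```

Evaluating only nested top-m subsets of the ranking keeps the search linear in the number of descriptors, which is the practical appeal of ranking before selection; ties in error are broken here in favor of the smaller subset.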


Feature Selection, Text Mining, Feature Ranking, Protein Classification, Estimate Error Rate
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Faouzi Mhamdi (1)
  • Ricco Rakotomalala (2)
  • Mourad Elloumi (1)
  1. Faculty of Sciences of Tunis, URPAH, Tunisia
  2. ERIC, University of Lyon 2, Lyon, France
