Feature Ranking for Protein Classification
In this paper, a knowledge discovery framework is applied to protein classification. Processing proceeds in three steps: feature extraction, feature ranking, and feature selection. In the first step, inspired by results from text mining, we use n-gram descriptors. In the second step, the descriptors are ranked by their chi-square statistic. In the final step, we select the subset of descriptors that minimizes the prediction error rate of a k-nearest neighbor classifier. Experiments show that this framework gives good results: the dimensionality reduction is effective and improves classifier performance.
Keywords: Feature Selection, Text Mining, Feature Ranking, Protein Classification, Estimate Error Rate
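
The three steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes binary presence/absence descriptors, 2-grams by default, Hamming distance for the nearest-neighbor search, and a leave-one-out estimate of the error rate; none of these choices is stated in the abstract. All function names (`ngrams`, `chi2_score`, `rank_descriptors`, `loo_error`, `select_subset`) are hypothetical.

```python
from collections import Counter


def ngrams(seq, n=2):
    """Step 1: the set of contiguous n-grams present in a sequence."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def chi2_score(present, labels):
    """Chi-square statistic of one binary descriptor against the class labels."""
    total = len(labels)
    score = 0.0
    for value in (True, False):
        n_value = sum(1 for p in present if p == value)
        for c in set(labels):
            n_class = sum(1 for y in labels if y == c)
            expected = n_value * n_class / total
            observed = sum(
                1 for p, y in zip(present, labels) if p == value and y == c
            )
            if expected > 0:
                score += (observed - expected) ** 2 / expected
    return score


def rank_descriptors(sequences, labels, n=2):
    """Step 2: rank all observed n-grams by decreasing chi-square score."""
    grams = [ngrams(s, n) for s in sequences]
    vocab = sorted(set().union(*grams))
    scores = {g: chi2_score([g in gs for gs in grams], labels) for g in vocab}
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(vocab, key=lambda g: (-scores[g], g)), grams


def loo_error(grams, labels, selected, k=1):
    """Leave-one-out error of k-NN (Hamming distance; an assumption)."""
    vectors = [[g in gs for g in selected] for gs in grams]
    errors = 0
    for i, v in enumerate(vectors):
        dists = sorted(
            (sum(a != b for a, b in zip(v, w)), j)
            for j, w in enumerate(vectors)
            if j != i
        )
        votes = Counter(labels[j] for _, j in dists[:k])
        if votes.most_common(1)[0][0] != labels[i]:
            errors += 1
    return errors / len(labels)


def select_subset(sequences, labels, n=2, k=1):
    """Step 3: keep the top-ranked prefix with the lowest estimated error."""
    ranked, grams = rank_descriptors(sequences, labels, n)
    best_size, best_err = 1, float("inf")
    for size in range(1, len(ranked) + 1):
        err = loo_error(grams, labels, ranked[:size], k)
        if err < best_err:  # strict: prefer the smallest subset on ties
            best_size, best_err = size, err
    return ranked[:best_size], best_err
```

Calling `select_subset(sequences, labels)` on a list of sequence strings and a parallel list of class labels returns the selected descriptors and their estimated error rate. Evaluating only prefixes of the ranked list keeps the search linear in the number of descriptors, which is one simple way to combine ranking with selection.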