Feature Subset Selection for Protein Subcellular Localization Prediction
Most existing methods for protein subcellular localization prediction rely on a large number of features that are considered potentially useful for determining a protein's subcellular localization. However, predictors with many input variables usually suffer from the curse of dimensionality and are at risk of overfitting. Using only the features that are relevant to subcellular localization may improve prediction performance and may also yield biologically useful knowledge. In this paper, we present a feature-ranking-based feature subset selection approach for predicting protein subcellular localization with support vector machines (SVMs). Experimental results show that the method improves prediction performance when the selected feature subsets are used. We anticipate that the proposed method will be a powerful tool for large-scale annotation of biological data.
Keywords: Support Vector Machine; Feature Selection; Prediction Performance; Location Accuracy; Total Accuracy
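As an illustration only, the general idea of ranking-based feature subset selection can be sketched as follows. The abstract does not state the ranking criterion or the evaluation protocol, so everything here is an assumption: the between-class/within-class variance ratio used for ranking, the helper names `rank_features`, `loo_accuracy`, and `select_subset`, the toy data, and the nearest-centroid classifier, which stands in for an SVM only to keep the sketch dependency-free.

```python
from statistics import mean, pvariance


def rank_features(X, y):
    """Rank feature indices by a between/within-class variance ratio (assumed criterion)."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall = mean(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [row[j] for row, label in zip(X, y) if label == c]
            between += (mean(values) - overall) ** 2
            within += pvariance(values)
        # Small constant avoids division by zero for constant features.
        scores.append(between / (within + 1e-12))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for an SVM)."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for c in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == c]
            centroids[c] = [mean(r[j] for r in rows) for j in features]
        point = [X[i][j] for j in features]
        predicted = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == y[i]
    return correct / len(X)


def select_subset(X, y, evaluate=loo_accuracy):
    """Evaluate nested top-k subsets of the ranking and keep the smallest best one."""
    ranking = rank_features(X, y)
    best_k, best_score = 1, -1.0
    for k in range(1, len(ranking) + 1):
        score = evaluate(X, y, ranking[:k])
        if score > best_score:  # strict '>' prefers the smaller subset on ties
            best_k, best_score = k, score
    return ranking[:best_k], best_score


# Hypothetical toy data: only feature 1 separates the two classes.
X = [
    [0.3, 1.0, 5.0], [0.9, 1.2, 1.0], [0.1, 0.8, 4.0], [0.7, 1.1, 2.0],
    [0.8, 3.0, 5.0], [0.2, 3.2, 1.0], [0.6, 2.9, 3.0], [0.4, 3.1, 2.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

ranking = rank_features(X, y)
subset, score = select_subset(X, y)
print(ranking, subset, score)
```

In an SVM setting, the `evaluate` argument would be replaced by a function returning the cross-validated accuracy of an SVM trained on the chosen feature columns. The strict comparison in `select_subset` favours the smallest subset among equally accurate ones.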