Feature Selection and Classification for Small Gene Sets
- Cite this paper as:
- Stiglic G., Rodriguez J.J., Kokol P. (2008) Feature Selection and Classification for Small Gene Sets. In: Chetty M., Ngom A., Ahmad S. (eds) Pattern Recognition in Bioinformatics. PRIB 2008. Lecture Notes in Computer Science, vol 5265. Springer, Berlin, Heidelberg
Random Forests, Support Vector Machines, and k-Nearest Neighbors are proven classification techniques that are widely used across many kinds of classification problems. One such problem is the classification of genomic and proteomic data, which is known for its extremely high dimensionality and therefore demands suitable classification techniques. In this domain, classifiers are usually combined with gene selection techniques to achieve the best possible classification accuracy. Another reason for reducing the dimensionality of such datasets is interpretability: it is much easier to interpret a small set of ranked genes than 20 or 30 thousand unordered genes. In this paper we present Rotation Forest, a classification ensemble of decision trees, and evaluate its classification performance on small subsets of ranked genes for 14 genomic and proteomic classification problems. The results demonstrate an important property of Rotation Forest: robustness and high classification accuracy when using small sets of genes.
Keywords: Gene expression analysis, machine learning, feature selection, ensemble of classifiers
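The general workflow the abstract describes (rank the genes, keep a small top-ranked subset, then classify on that subset) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the signal-to-noise ranking score, the synthetic data, the subset size of 3, and the use of k-Nearest Neighbors in place of Rotation Forest, which would need considerably more code. The function names (`rank_genes`, `knn_predict`, `select`, `make_toy_data`) are hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def rank_genes(X, y):
    """Rank gene indices, best first, by a two-class signal-to-noise score.

    The score |mean_a - mean_b| / (sd_a + sd_b) is an assumed filter,
    chosen for simplicity; it is not the paper's ranking method.
    """
    classes = sorted(set(y))
    a = [row for row, label in zip(X, y) if label == classes[0]]
    b = [row for row, label in zip(X, y) if label == classes[1]]
    scores = []
    for j in range(len(X[0])):
        col_a = [row[j] for row in a]
        col_b = [row[j] for row in b]
        spread = pstdev(col_a) + pstdev(col_b) + 1e-12  # avoid division by zero
        scores.append(abs(mean(col_a) - mean(col_b)) / spread)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training samples closest to x."""
    nearest = sorted(zip(X_train, y_train), key=lambda p: math.dist(p[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def select(X, genes):
    """Keep only the columns (genes) listed in `genes`."""
    return [[row[j] for j in genes] for row in X]


def make_toy_data(n_samples=40, n_genes=200, n_informative=3, seed=0):
    """Synthetic stand-in for expression data: mostly noise genes,
    with the first `n_informative` genes shifted for class 1."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for j in range(n_informative):
            row[j] += 4.0 * label
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
X_train, y_train = X[:30], y[:30]
X_test, y_test = X[30:], y[30:]

# Rank on the training split only, so the test samples do not influence selection.
top_genes = rank_genes(X_train, y_train)[:3]

train_small = select(X_train, top_genes)
test_small = select(X_test, top_genes)
predictions = [knn_predict(train_small, y_train, x) for x in test_small]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print("selected genes:", sorted(top_genes), "accuracy:", accuracy)
```

Ranking the genes inside the training split, rather than on the full dataset, keeps the held-out accuracy from being inflated by selection bias. This is a common precaution in gene selection studies, though the sketch does not claim to reproduce the paper's evaluation protocol.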