Gene Ranking from Microarray Data for Cancer Classification–A Machine Learning Approach
Traditional gene selection methods often select the top-ranked genes according to their individual discriminative power. We propose to apply feature evaluation measures that are broadly used in the machine learning field but less popular in the DNA microarray field. We also include sequential gene subset selection approaches. In our study, we use several well-known criteria (filters and wrappers) to rank attributes, together with a greedy search procedure combined with three subset evaluation measures. Two completely different machine learning classifiers are applied to perform the class prediction. The comparison is performed on two well-known DNA microarray data sets. We observe that most of the top-ranked genes appear in the lists of relevant, informative genes detected by previous studies on these data sets.
Keywords: Feature Selection, Feature Subset, Gene Ranking, Gene Subset, Feature Subset Selection
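The two ideas summarized in the abstract, ranking genes one at a time and then running a greedy search over subsets, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the signal-to-noise score stands in for the filter criteria, leave-one-out accuracy of a nearest-centroid classifier stands in for one wrapper-style subset evaluation measure, and the function names and the toy matrix `X` are hypothetical.

```python
from statistics import mean, pstdev

def snr_score(values, labels):
    """Signal-to-noise score of one gene for a two-class problem (labels 0/1)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    # Small constant avoids division by zero when a gene is constant in both classes.
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12)

def rank_genes(X, labels):
    """Return gene indices sorted from most to least discriminative."""
    n_genes = len(X[0])
    scores = [snr_score([row[g] for row in X], labels) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)

def loo_accuracy(X, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`.

    Assumes every class has at least two samples, so no centroid is empty.
    """
    hits = 0
    for i in range(len(X)):
        centroids = {}
        for c in set(labels):
            rows = [X[j] for j in range(len(X)) if j != i and labels[j] == c]
            centroids[c] = [mean(r[g] for r in rows) for g in genes]

        def dist(c):
            return sum((X[i][g] - m) ** 2 for g, m in zip(genes, centroids[c]))

        hits += min(centroids, key=dist) == labels[i]
    return hits / len(X)

def greedy_forward(X, labels, ranking):
    """Walk the ranking once, keeping a gene only if it improves accuracy."""
    selected, best = [], 0.0
    for g in ranking:
        acc = loo_accuracy(X, labels, selected + [g])
        if acc > best:
            selected, best = selected + [g], acc
    return selected, best

# Hypothetical toy data: 6 samples (rows) x 3 genes (columns), two classes.
X = [
    [5.0, 1.0, 2.0],
    [3.0, 1.2, 2.5],
    [4.0, 0.9, 3.5],
    [4.5, 3.0, 3.0],
    [3.5, 3.1, 4.0],
    [5.5, 2.9, 4.5],
]
labels = [0, 0, 0, 1, 1, 1]

ranking = rank_genes(X, labels)
subset, accuracy = greedy_forward(X, labels, ranking)
print(ranking, subset, accuracy)
```

On this toy input the ranking places the clearly separating gene first, and the greedy pass stops adding genes once accuracy no longer improves. That is the general behaviour sequential subset selection aims for: a short gene list rather than a fixed top-k cut of the ranking.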