Neighborhood Rough Set Model Based Gene Selection for Multi-subtype Tumor Classification
Multi-subtype tumor diagnosis based on gene expression profiles is promising for clinical medicine. Accordingly, a great deal of research on tumor classification from gene expression profiles has applied various machine learning approaches to build classification models with the best possible performance. To achieve this goal, extracting features or finding informative genes that have good classification ability is crucial. We propose a novel gene selection approach that uses the Kruskal-Wallis rank sum test to rank all genes and then applies an algorithm based on the neighborhood rough set model for gene reduction, yielding gene subsets with fewer genes and greater classification ability. Experiments on a small round blue cell tumor (SRBCT) dataset show that our approach achieves very high classification accuracy with only three or four genes, as evaluated by three classifiers: support vector machines, K-nearest neighbor, and the neighborhood classifier.
Keywords: tumor classification, gene expression profiles, support vector machines, neighborhood classifier, K-nearest neighbor
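The two-stage procedure summarized in the abstract (rank genes with the Kruskal-Wallis test, then reduce them with a neighborhood rough set criterion) might be sketched as follows. This is a minimal illustration and not the authors' implementation: the Euclidean distance, the neighborhood radius `delta`, the cutoff `top_k`, the forward greedy search, and all function names are assumptions, and the toy expression matrix is made up for demonstration.

```python
from math import dist

def kruskal_wallis_h(values, labels):
    """Kruskal-Wallis H statistic for one gene (average ranks, no tie correction)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average 1-based rank of the tied run
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    rank_sums, counts = {}, {}
    for r, y in zip(ranks, labels):
        rank_sums[y] = rank_sums.get(y, 0.0) + r
        counts[y] = counts.get(y, 0) + 1
    s = sum(rank_sums[y] ** 2 / counts[y] for y in counts)
    return 12.0 / (n * (n + 1)) * s - 3 * (n + 1)

def dependency(X, labels, genes, delta):
    """Fraction of samples whose delta-neighborhood (on `genes`) has a single class."""
    n = len(X)
    consistent = 0
    for i in range(n):
        xi = [X[i][g] for g in genes]
        if all(labels[j] == labels[i] for j in range(n)
               if dist(xi, [X[j][g] for g in genes]) <= delta):
            consistent += 1
    return consistent / n

def select_genes(X, labels, top_k=10, delta=0.1):
    """Keep the top_k genes by H, then greedily add genes while dependency grows."""
    n_genes = len(X[0])
    ranked = sorted(
        range(n_genes),
        key=lambda g: kruskal_wallis_h([row[g] for row in X], labels),
        reverse=True,
    )[:top_k]
    selected, best = [], 0.0
    while True:
        pick, pick_dep = None, best
        for g in ranked:
            if g in selected:
                continue
            d = dependency(X, labels, selected + [g], delta)
            if d > pick_dep:
                pick, pick_dep = g, d
        if pick is None:  # no remaining gene improves the dependency
            return selected, best
        selected.append(pick)
        best = pick_dep

# Toy data (hypothetical): 6 samples x 3 genes, three classes.
# Gene 0 separates the classes, gene 1 is noise, gene 2 is constant.
X = [
    [0.00, 0.5, 0.3],
    [0.05, 0.1, 0.3],
    [0.50, 0.9, 0.3],
    [0.55, 0.2, 0.3],
    [1.00, 0.6, 0.3],
    [0.95, 0.4, 0.3],
]
labels = ["A", "A", "B", "B", "C", "C"]

selected, dep = select_genes(X, labels, top_k=3, delta=0.1)
print(selected, dep)
```

In this sketch the dependency is the share of samples whose neighbors all carry the same class label, so the search stops as soon as no candidate gene enlarges that consistent region. With real expression data the values would typically be normalized first so that a single `delta` is meaningful across genes.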