Neighborhood Rough Set Model Based Gene Selection for Multi-subtype Tumor Classification

  • Shulin Wang
  • Xueling Li
  • Shanwen Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5226)


Multi-subtype tumor diagnosis based on gene expression profiles is promising for clinical medicine. A great deal of research on tumor classification from gene expression profiles has therefore been carried out, in which various machine learning approaches have been applied to build tumor classification models with the best possible performance. To achieve this goal, extracting features or finding informative genes with good classification ability is crucial. We propose a novel gene selection approach that uses the Kruskal-Wallis rank sum test to rank all genes and then applies an algorithm based on the neighborhood rough set model for gene reduction, yielding gene subsets with fewer genes and greater classification ability. Experiments on a small round blue cell tumor (SRBCT) dataset show that our approach achieves very high classification accuracy with only three or four genes, as evaluated by three classifiers: support vector machines, K-nearest neighbor, and the neighborhood classifier.
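The two-stage idea described in the abstract (rank genes with the Kruskal-Wallis test, then reduce them with a neighborhood rough set criterion) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Euclidean neighborhood radius `delta`, the cut-off `top_k`, the greedy forward search, and the toy data are all assumptions made for illustration, and the H statistic is computed without a tie correction.

```python
from math import dist


def kruskal_wallis_h(values, labels):
    """Kruskal-Wallis H statistic for one gene (no tie correction)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share their average 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sums, counts = {}, {}
    for r, c in zip(ranks, labels):
        rank_sums[c] = rank_sums.get(c, 0.0) + r
        counts[c] = counts.get(c, 0) + 1
    spread = sum(rank_sums[c] ** 2 / counts[c] for c in counts)
    return 12.0 / (n * (n + 1)) * spread - 3 * (n + 1)


def dependency(X, y, genes, delta):
    """Fraction of samples whose delta-neighborhood on `genes` holds one class only."""
    points = [[row[g] for g in genes] for row in X]
    consistent = 0
    for i, p in enumerate(points):
        if all(y[j] == y[i] for j, q in enumerate(points) if dist(p, q) <= delta):
            consistent += 1
    return consistent / len(X)


def select_genes(X, y, top_k=10, delta=0.2):
    """Keep the top_k genes by H, then greedily add genes while dependency grows."""
    n_genes = len(X[0])
    scores = [kruskal_wallis_h([row[g] for row in X], y) for g in range(n_genes)]
    candidates = sorted(range(n_genes), key=lambda g: -scores[g])[:top_k]
    selected, best = [], 0.0
    while candidates:
        deps = [dependency(X, y, selected + [g], delta) for g in candidates]
        top = max(deps)
        if top <= best:  # no remaining gene enlarges the positive region
            break
        gene = candidates[deps.index(top)]  # ties go to the higher-ranked gene
        selected.append(gene)
        candidates.remove(gene)
        best = top
    return selected


# Hypothetical toy data: 6 samples, 3 genes, 3 tumor subtypes.
toy_X = [
    [0.0, 0.5, 0.10],
    [0.1, 0.4, 0.90],
    [0.5, 0.6, 0.20],
    [0.6, 0.5, 0.80],
    [1.0, 0.4, 0.15],
    [0.9, 0.6, 0.85],
]
toy_y = ["A", "A", "B", "B", "C", "C"]
print(select_genes(toy_X, toy_y, top_k=3, delta=0.2))
```

On this toy data only the first gene separates the three subtypes, so the search stops after selecting it. In practice expression values would typically be normalized before choosing `delta`, and the selected subset would then be passed to a classifier such as an SVM or K-nearest neighbor for evaluation.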


tumor classification; gene expression profiles; support vector machines; neighborhood classifier; K-nearest neighbor





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Shulin Wang (1, 2)
  • Xueling Li (1)
  • Shanwen Zhang (1)
  1. Hefei Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, China
  2. School of Computer and Communication, Hunan University, Changsha, China
