Gene selection in class space for molecular classification of cancer

  • Zhang Junying 
  • Yue Joseph Wang
  • Javed Khan
  • Robert Clarke


Gene selection (feature selection) is generally performed in gene space (feature space), where a serious curse-of-dimensionality problem arises because the number of genes is much larger than the number of samples. This makes the data set difficult to model in gene space (G-space) and lowers the confidence of the gene-selection result, so finding a good gene subset under these conditions is a challenging problem. In this paper, G-space is transformed into its dual space, referred to as class space (C-space), in which the number of dimensions equals the number of sample classes in G-space and the number of samples equals the number of genes in G-space. The curse of dimensionality therefore does not arise in C-space. A new gene selection method, based on the principle of separating the different classes as far as possible, is presented with the help of Principal Component Analysis (PCA). Experimental results on gene selection for a real data set are evaluated with the Fisher criterion, the weighted Fisher criterion, and leave-one-out cross-validation, showing that the proposed method is both effective and efficient.
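The C-space idea can be illustrated with a short sketch: each gene is embedded as a point in class space by its vector of per-class mean expressions, and PCA is applied to that low-dimensional point cloud. The scoring rule below (ranking genes by their distance from the centroid in principal-component coordinates, so that genes whose class means spread far apart rank highest) is an illustrative assumption, not the paper's exact selection criterion; the function name `class_space_gene_scores` is hypothetical.

```python
import numpy as np

def class_space_gene_scores(X, y):
    """Embed each gene as a point in class space (C-space): its vector
    of per-class mean expressions. Genes whose C-space points lie far
    from the overall centroid differ most across classes.

    X : (n_samples, n_genes) expression matrix
    y : (n_samples,) integer class labels
    Returns one score per gene (higher = more class-discriminative).

    NOTE: a minimal sketch of the C-space transformation, not the
    paper's exact PCA-based criterion.
    """
    classes = np.unique(y)
    # C-space embedding: one row per gene, one column per class.
    # Note the duality: genes are now the "samples", classes the "dimensions".
    C = np.stack([X[y == c].mean(axis=0) for c in classes], axis=1)
    C = C - C.mean(axis=0, keepdims=True)   # centre the gene cloud
    # PCA of the (n_genes, n_classes) point cloud via SVD
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    proj = C @ Vt.T                          # principal-component scores
    # Rank genes by squared distance from the centroid in PC coordinates
    return (proj ** 2).sum(axis=1)

# Toy example: 6 samples, 4 genes, 2 classes; gene 0 is made informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
y = np.array([0, 0, 0, 1, 1, 1])
X[y == 1, 0] += 5.0                          # shift gene 0 in class 1
scores = class_space_gene_scores(X, y)
print(np.argmax(scores))                     # gene 0 should score highest
```

Because the C-space cloud has only as many dimensions as there are classes, the PCA here is performed on a tiny matrix regardless of how many thousands of genes are involved, which is the computational point of the dual-space view.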


Keywords: feature space (gene space), class space, feature selection (gene selection), PCA



Copyright information

© Science in China Press 2004

Authors and Affiliations

  • Zhang Junying (1, 2)
  • Yue Joseph Wang (2)
  • Javed Khan (3)
  • Robert Clarke (4)

  1. National Key Laboratory of Radar Signal Processing & School of Computer Science and Engineering, Xidian University, Xi'an, China
  2. Department of Electrical Engineering & Computer Engineering, Virginia Polytechnic Institute and State University, Alexandria, USA
  3. National Human Genome Research Institute, National Institutes of Health, Bethesda, USA
  4. Lombardi Cancer Center, Georgetown University, Washington, DC, USA
