Boost Feature Subset Selection: A New Gene Selection Algorithm for Microarray Dataset

  • Xian Xu
  • Aidong Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3992)


Gene selection is usually the crucial first step in microarray data analysis. One typical class of approaches calculates a discriminative score for each gene using only the data associated with that gene. The scores are then sorted, and the top-ranked genes are selected for further analysis. However, such an approach results in a redundant gene set because it ignores the complex relationships between genes. Recent research in feature subset selection has begun to tackle this problem by limiting the correlations within the selected feature set. In this paper, we propose a novel general framework, BFSS (Boost Feature Subset Selection), to improve the performance of single-gene-based discriminative scores using bootstrapping techniques. Features are selected from dynamically adjusted bootstraps of the training dataset. We tested our algorithm on three well-known, publicly available microarray data sets from the bioinformatics community and report encouraging results.
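The abstract describes the idea only at a high level, so the following is a minimal sketch under stated assumptions, not the BFSS algorithm as published. It assumes a two-class problem, a t-like statistic as the single-gene discriminative score, a midpoint threshold as the per-gene classifier, and an AdaBoost-style rule that multiplies the sampling probability of poorly separated samples by a hypothetical `boost` factor. All function and parameter names (`t_score`, `misclassified`, `boost_select`, `boost`, `seed`) are illustrative.

```python
import random
from statistics import mean, pvariance


def t_score(values, labels):
    """Absolute two-class t-like score for one gene (larger = more discriminative)."""
    a = [v for v, c in zip(values, labels) if c == 0]
    b = [v for v, c in zip(values, labels) if c == 1]
    if len(a) < 2 or len(b) < 2:
        return 0.0  # a degenerate bootstrap cannot score this gene
    spread = (pvariance(a) / len(a) + pvariance(b) / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / (spread + 1e-12)


def misclassified(values, labels):
    """Flag samples that a midpoint threshold on this single gene gets wrong.

    Assumes both classes are present in `labels`.
    """
    m0 = mean(v for v, c in zip(values, labels) if c == 0)
    m1 = mean(v for v, c in zip(values, labels) if c == 1)
    cut = (m0 + m1) / 2
    high_is_1 = m1 > m0
    wrong = []
    for v, c in zip(values, labels):
        predicted = 1 if (v > cut) == high_is_1 else 0
        wrong.append(predicted != c)
    return wrong


def boost_select(X, y, k, boost=2.0, seed=0):
    """Pick up to k gene indices from weighted bootstraps of the training set.

    X is a list of samples (each a list of gene values); y holds 0/1 labels.
    The reweighting rule is an assumption made for illustration.
    """
    rng = random.Random(seed)
    n, n_genes = len(X), len(X[0])
    weights = [1.0 / n] * n  # sampling probability of each training sample
    chosen = []
    for _ in range(min(k, n_genes)):
        # Draw a bootstrap in which hard samples are more likely to appear.
        idx = rng.choices(range(n), weights=weights, k=n)
        yb = [y[i] for i in idx]
        scores = {
            g: t_score([X[i][g] for i in idx], yb)
            for g in range(n_genes)
            if g not in chosen
        }
        best = max(scores, key=scores.get)
        chosen.append(best)
        # Raise the sampling probability of samples the new gene fails to separate.
        wrong = misclassified([row[best] for row in X], y)
        weights = [w * (boost if bad else 1.0) for w, bad in zip(weights, wrong)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return chosen


# Toy data: 6 samples, 3 genes; gene 0 separates the classes, gene 1 copies it.
X = [[0.0, 0.1, 5.0], [0.2, 0.3, 1.0], [0.1, 0.2, 3.0],
     [9.8, 9.9, 4.0], [10.0, 10.1, 2.0], [10.2, 10.3, 6.0]]
y = [0, 0, 0, 1, 1, 1]
selected = boost_select(X, y, k=2)
```

Because each round rescores the genes on a differently weighted bootstrap, a gene that merely duplicates an already selected one gains less from the samples that are now emphasized. This is the intuition the abstract points to, not a claim about the paper's exact update rule.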


Keywords: Selection Algorithm; Microarray Dataset; Probability Table; Discriminative Score; Feature Subset Selection



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Xian Xu (1)
  • Aidong Zhang (1)
  1. State University of New York at Buffalo, Buffalo, USA
