A Method for Feature Selection on Microarray Data Using Support Vector Machine

  • Xiao Bing Huang
  • Jian Tang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4081)


The data collected from a typical microarray experiment usually consist of tens of samples and thousands of genes (i.e., features). Typically, only a small subset of these features is both relevant and non-redundant for differentiating the samples. Identifying an optimal subset of relevant genes is crucial for accurate classification of samples. In this paper, we propose a method for selecting a relevant gene subset from microarray gene expression data. Our method is based on the gap tolerant classifier, a variation of the support vector machine (SVM), and uses a hill-climbing search strategy. Unlike most other hill-climbing approaches, where classification accuracy alone is used as the criterion for feature selection, the proposed method uses a mixture of accuracy and SVM margin to select features. Our experimental results show that this strategy is effective both in selecting relevant features and in eliminating redundant ones.
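The search strategy described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how accuracy and margin are mixed, so the sketch assumes a lexicographic rule (accuracy first, margin as tie-breaker). It also assumes training accuracy rather than a held-out estimate, replaces the gap tolerant classifier with a simple hyperplane midway between the class centroids, and divides the margin by the data diameter so that duplicating a feature does not inflate it. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist, sqrt

EPS = 1e-9  # tolerance so floating-point ties do not count as improvements


def score(X, y, feats):
    """Return (training accuracy, margin / data diameter) for a feature subset.

    Stand-in classifier (assumption): the hyperplane midway between the two
    class centroids, not a trained SVM or gap tolerant classifier.
    """
    Z = [[row[f] for f in feats] for row in X]
    pos = [z for z, lab in zip(Z, y) if lab == 1]
    neg = [z for z, lab in zip(Z, y) if lab == -1]
    mu_p = [sum(col) / len(pos) for col in zip(*pos)]
    mu_n = [sum(col) / len(neg) for col in zip(*neg)]
    w = [p - n for p, n in zip(mu_p, mu_n)]
    b = -sum(wi * (p + n) / 2 for wi, p, n in zip(w, mu_p, mu_n))
    norm = sqrt(sum(wi * wi for wi in w))
    diameter = max(dist(a, c) for a, c in combinations(Z, 2))
    if norm == 0.0 or diameter == 0.0:
        return (0.0, 0.0)
    # Signed geometric distance of every sample to the hyperplane.
    signed = [lab * (sum(wi * zi for wi, zi in zip(w, z)) + b) / norm
              for z, lab in zip(Z, y)]
    accuracy = sum(s > 0 for s in signed) / len(signed)
    return (accuracy, min(signed) / diameter)


def improves(new, old):
    """Assumed mixing rule: higher accuracy wins; margin breaks ties."""
    if new[0] > old[0] + EPS:
        return True
    return abs(new[0] - old[0]) <= EPS and new[1] > old[1] + EPS


def hill_climb_select(X, y, max_features):
    """Greedy forward search: add the best feature until nothing improves."""
    selected, best = [], (0.0, float("-inf"))
    remaining = set(range(len(X[0])))
    while remaining and len(selected) < max_features:
        candidate, cand_score = None, best
        for f in sorted(remaining):
            s = score(X, y, selected + [f])
            if improves(s, cand_score):
                candidate, cand_score = f, s
        if candidate is None:  # local optimum reached
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = cand_score
    return selected, best


# Toy data: feature 0 is relevant, feature 1 is noise, feature 2 duplicates 0.
X = [[2.0, 0.3, 2.0], [2.2, -0.4, 2.2], [1.8, 0.1, 1.8],
     [-2.0, 0.2, -2.0], [-2.1, -0.3, -2.1], [-1.9, 0.4, -1.9]]
y = [1, 1, 1, -1, -1, -1]

selected, best = hill_climb_select(X, y, max_features=3)
print(selected)  # [0] on this toy data
```

On this toy data the search keeps only feature 0. The noisy feature lowers the normalised margin, and the duplicate leaves both accuracy and normalised margin unchanged, so neither is added. An accuracy-only criterion could not tell these candidates apart once accuracy reaches 100%.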


Support Vector Machine; Feature Selection; Microarray Data; Chronic Myeloid Leukemia; Acute Myelogenous Leukemia
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Xiao Bing Huang
    • 1
  • Jian Tang
    • 1
  1. Computer Science Department, Memorial University of Newfoundland, St. John’s, NL, Canada
