Machine Learning, Volume 46, Issue 1–3, pp 389–422

Gene Selection for Cancer Classification using Support Vector Machines

  • Isabelle Guyon
  • Jason Weston
  • Stephen Barnhill
  • Vladimir Vapnik


DNA micro-arrays now permit scientists to screen thousands of genes simultaneously and determine whether those genes are active, hyperactive or silent in normal or cancerous tissue. Because these new micro-array devices generate bewildering amounts of raw data, new analytical methods must be developed to sort out whether cancer tissues have distinctive signatures of gene expression over normal tissues or other types of cancer tissues.

In this paper, we address the problem of selecting a small subset of genes from broad patterns of gene expression data recorded on DNA micro-arrays. Using available training examples from cancer and normal patients, we build a classifier suitable for genetic diagnosis as well as drug discovery. Previous attempts to address this problem select genes with correlation techniques. We propose a new method of gene selection that uses Support Vector Machine methods based on Recursive Feature Elimination (RFE). We demonstrate experimentally that the genes selected by our techniques yield better classification performance and are biologically relevant to cancer.
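The abstract does not spell out the elimination loop, so the following is only a minimal sketch of one common reading of RFE with a linear SVM: train, score each feature by its squared weight, drop the lowest-scoring feature, and repeat. The function names (`train_linear_svm`, `svm_rfe`), the subgradient-descent trainer, the hyperparameters, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Approximate a soft-margin linear SVM by full-batch subgradient
    descent on the L2-regularized hinge loss (an assumed stand-in solver)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # this sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, n_keep=1):
    """Recursively drop the feature with the smallest squared weight.

    Returns (surviving feature indices, eliminated indices in removal order).
    """
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        scores = [wj * wj for wj in w]      # assumed ranking criterion
        worst = scores.index(min(scores))   # position of least useful feature
        eliminated.append(active.pop(worst))
    return active, eliminated


# Hypothetical toy data: feature 0 separates the classes strongly,
# feature 1 weakly, and feature 2 carries no signal at all.
X_toy = [[2.0, 0.2, 0.0], [3.0, 0.1, 0.0], [-2.0, -0.2, 0.0], [-3.0, -0.1, 0.0]]
y_toy = [1, 1, -1, -1]
kept, removed = svm_rfe(X_toy, y_toy, n_keep=1)
print("kept:", kept, "removed in order:", removed)
```

Removing one feature per iteration is the simplest variant. With thousands of genes, removing a block of low-ranked features per iteration would cut the number of retraining rounds at the cost of a coarser ranking.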

In contrast with the baseline method, our method eliminates gene redundancy automatically and yields better and more compact gene subsets. In patients with leukemia, our method discovered 2 genes that yield zero leave-one-out error, while the baseline method needs 64 genes to reach its best result (one leave-one-out error). In the colon cancer database, using only 4 genes, our method is 98% accurate, while the baseline method is only 86% accurate.
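The leave-one-out error counts quoted above can be read as follows: each sample is held out in turn, a classifier is trained on the rest, and a miss on the held-out sample counts as one error. The sketch below illustrates only that counting procedure. The nearest-centroid classifier is a deliberately simple stand-in rather than the SVM used in the paper, and all names and data are hypothetical.

```python
def nearest_centroid_fit(X, y):
    """Stand-in classifier: store the mean vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def leave_one_out_errors(X, y, fit, predict):
    """Count held-out samples that are misclassified."""
    errors = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]   # every sample except the i-th
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        if predict(model, X[i]) != y[i]:
            errors += 1
    return errors


# Hypothetical one-gene data set with two well-separated classes.
X_demo = [[0.0], [0.2], [0.4], [2.0], [2.2], [2.4]]
y_demo = [-1, -1, -1, 1, 1, 1]
n_err = leave_one_out_errors(X_demo, y_demo,
                             nearest_centroid_fit, nearest_centroid_predict)
print("leave-one-out errors:", n_err, "of", len(X_demo))
```

Because `fit` and `predict` are passed in as arguments, the same loop can score any classifier, including one restricted to a selected gene subset.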

Keywords: diagnosis, diagnostic tests, drug discovery, RNA expression, genomics, gene selection, DNA micro-array, proteomics, cancer classification, feature selection, support vector machines, recursive feature elimination



Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Isabelle Guyon (1)
  • Jason Weston (1)
  • Stephen Barnhill (1)
  • Vladimir Vapnik (2)

  1. Savannah, USA
  2. AT&T Labs, Red Bank, USA
