Gene Ranking from Microarray Data for Cancer Classification–A Machine Learning Approach

  • Roberto Ruiz
  • Beatriz Pontes
  • Raúl Giráldez
  • Jesús S. Aguilar–Ruiz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4252)


Traditional gene selection methods often select the top-ranked genes according to their individual discriminative power. We propose applying feature evaluation measures that are widely used in machine learning but less common in the DNA microarray field, and we also include sequential gene subset selection approaches. In our study, we use several well-known criteria (filters and wrappers) to rank attributes, together with a greedy search procedure combined with three subset evaluation measures. Two completely different machine learning classifiers perform the class prediction. The comparison is carried out on two well-known DNA microarray data sets. Most of the top-ranked genes appear in the lists of relevant, informative genes reported by previous studies on these data sets.
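The two stages described above (ranking genes individually, then running a greedy search guided by a subset evaluation measure) can be illustrated with a minimal sketch. The abstract does not name the specific criteria, so everything concrete here is an assumption made for illustration: the signal-to-noise filter score, the leave-one-out 1-nearest-neighbour accuracy used as the subset evaluation measure, the toy data, and all function names.

```python
from statistics import mean, pstdev

def signal_to_noise(values, labels):
    """Filter score for one gene (assumed criterion): |mean0 - mean1| / (sd0 + sd1)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    spread = pstdev(a) + pstdev(b)
    return abs(mean(a) - mean(b)) / (spread + 1e-12)  # small term avoids division by zero

def rank_genes(X, y):
    """Return gene indices sorted from most to least discriminative."""
    n_genes = len(X[0])
    scores = [signal_to_noise([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)

def loo_accuracy(X, y, genes):
    """Subset evaluation (assumed): leave-one-out accuracy of a 1-nearest-neighbour rule."""
    hits = 0
    for i, row in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((row[g] - X[j][g]) ** 2 for g in genes),
        )
        hits += y[nearest] == y[i]
    return hits / len(X)

def greedy_select(X, y, ranking, evaluate=loo_accuracy):
    """Walk the ranking once, keeping a gene only if it improves the score."""
    selected, best = [], 0.0
    for g in ranking:
        score = evaluate(X, y, selected + [g])
        if score > best:
            selected, best = selected + [g], score
    return selected, best

# Toy data (hypothetical): 6 samples x 3 genes; gene 0 separates the classes.
X = [
    [1.0, 5.0, 0.3],
    [1.2, 4.0, 0.9],
    [0.9, 6.0, 0.5],
    [3.0, 5.5, 0.4],
    [3.1, 4.5, 0.8],
    [2.9, 5.0, 0.6],
]
y = [0, 0, 0, 1, 1, 1]

ranking = rank_genes(X, y)
subset, accuracy = greedy_select(X, y, ranking)
print(ranking, subset, accuracy)
```

Passing a different `evaluate` function swaps the wrapper-style measure for a filter-style one without changing the search loop, which is one way to compare several subset evaluation measures under the same greedy procedure.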


Feature Selection, Feature Subset, Gene Ranking, Gene Subset, Feature Subset Selection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Roberto Ruiz (1)
  • Beatriz Pontes (1)
  • Raúl Giráldez (2)
  • Jesús S. Aguilar–Ruiz (2)
  1. Department of Computer Science, University of Seville, Sevilla, Spain
  2. Area of Computer Science, University of Pablo de Olavide, Sevilla, Spain
