Parsimonious Selection of Useful Genes in Microarray Gene Expression Data

  • Félix F. González-Navarro
  • Lluís A. Belanche-Muñoz
Part of the Advances in Experimental Medicine and Biology book series (AEMB, volume 696)


Machine learning methods have recently made significant contributions to multidisciplinary problems in cancer classification from microarray gene expression data. These tasks are characterized by a large number of features (genes) and few observations (samples), which makes modeling a nontrivial undertaking. In this study, we apply entropic filter methods for gene selection in combination with several off-the-shelf classifiers. The use of bootstrap resampling yields more stable performance estimates. Our findings show that the proposed methodology allows a drastic reduction in dimension, offering attractive solutions in terms of both prediction accuracy and number of explanatory genes. Finally, a dimensionality reduction technique that preserves discrimination capability is used to visualize the selected genes.
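The abstract names the ingredients but gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes: mutual information with the class label as the entropic filter criterion, equal-width discretization of each gene, and a nearest-centroid classifier standing in for the "off-the-shelf classifiers". All function names (`rank_genes`, `bootstrap_accuracy`, etc.) are hypothetical. Genes are re-ranked inside every bootstrap resample, so out-of-bag samples never influence the selection that is being evaluated.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete symbols."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equal-length discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def discretize(values, bins=3):
    """Equal-width binning of one gene's expression values (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant gene carries no information
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def rank_genes(X, y, bins=3):
    """Gene indices sorted by decreasing mutual information with the class."""
    n_genes = len(X[0])
    scores = [mutual_information(discretize([row[g] for row in X], bins), y)
              for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: -scores[g])


def nearest_centroid(train_X, train_y, x):
    """Predict the class whose mean profile is closest (squared Euclidean)."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def bootstrap_accuracy(X, y, k=5, n_boot=50, seed=0):
    """Mean out-of-bag accuracy over bootstrap resamples, selecting the
    top-k genes by mutual information inside each resample."""
    rng = random.Random(seed)
    n = len(X)
    accs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        train_y = [y[i] for i in idx]
        if not oob or len(set(train_y)) < 2:
            continue  # skip degenerate resamples
        genes = rank_genes([X[i] for i in idx], train_y)[:k]
        train_X = [[X[i][g] for g in genes] for i in idx]
        hits = sum(nearest_centroid(train_X, train_y, [X[i][g] for g in genes]) == y[i]
                   for i in oob)
        accs.append(hits / len(oob))
    return sum(accs) / len(accs) if accs else float("nan")


if __name__ == "__main__":
    # Hypothetical toy data: gene 0 separates the classes, genes 1-3 do not.
    y = [0] * 10 + [1] * 10
    X = [[(0.0 if i < 10 else 10.0) + 0.1 * (i % 10)]
         + [(m * i) % 7 + 0.01 * i for m in (4, 5, 6)] for i in range(20)]
    print(rank_genes(X, y))  # gene 0 should be ranked first
    print(bootstrap_accuracy(X, y, k=2))
```

Re-running selection inside each resample is the design choice that matters here: ranking genes once on the full data and then bootstrapping only the classifier would let the held-out samples leak into the gene selection and inflate the accuracy estimate.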


Keywords: Biological data mining and knowledge discovery; Cancer informatics; Gene expression analysis; Tools and methods for computational biology and bioinformatics



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Félix F. González-Navarro
  • Lluís A. Belanche-Muñoz
  1. Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Barcelona, Spain
