A Scalable Feature Selection Method to Improve the Analysis of Microarrays

  • Aida de Haro-García
  • Javier Pérez-Rodríguez
  • Nicolás García-Pedrajas
Conference paper
Part of the Studies in Computational Intelligence book series (SCI, volume 431)


DNA microarray experiments are used to collect information from tissue and cell samples regarding gene expression differences that are useful for the diagnosis and treatment of many different diseases. Predictive accuracy on these datasets is hindered by their high dimensionality and by the presence of irrelevant and redundant features. Applying feature selection before classification could therefore improve accuracy in this demanding research field.

However, standard feature selection methods may perform very poorly on high-dimensional microarray data. We propose a scalable evolutionary method to select relevant genes. We use a divide-and-conquer approach to deal with the scalability issues of evolutionary algorithms, and we combine several rounds of feature selection to improve both accuracy and storage reduction. Our proposal improves on the results of standard classifiers and feature selection methods, in both accuracy and storage reduction, on 8 different microarray datasets.
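The abstract does not give the algorithmic details, so the following is only a minimal sketch of the general idea, under stated assumptions: the features are split into random disjoint blocks, a selector runs on each small block, and the results of several rounds are combined by voting. The function names, the parameters (`block_size`, `keep`, `rounds`, `min_votes`), the voting rule, and the class-mean score are all hypothetical. In particular, `select_in_block` is a simple ranking stand-in, not the evolutionary search the authors use.

```python
import random
from collections import Counter


def score_feature(column, labels):
    """Toy relevance score (an assumption): absolute difference between the
    mean of the feature in class 1 and in class 0. Both classes must occur."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def select_in_block(X, y, block, keep):
    """Stand-in for the per-block search: keep the `keep` best-scoring
    features of one block. A real implementation would run an evolutionary
    algorithm here instead of a simple ranking."""
    ranked = sorted(
        block,
        key=lambda j: score_feature([row[j] for row in X], y),
        reverse=True,
    )
    return ranked[:keep]


def divide_and_conquer_selection(X, y, block_size=10, keep=2,
                                 rounds=5, min_votes=3, seed=0):
    """Run several rounds; in each round, partition the features into random
    disjoint blocks, select within every block, and count one vote per
    selected feature. Return the features with at least `min_votes` votes."""
    rng = random.Random(seed)
    n_features = len(X[0])
    votes = Counter()
    for _ in range(rounds):
        features = list(range(n_features))
        rng.shuffle(features)  # a different random partition in each round
        for start in range(0, n_features, block_size):
            block = features[start:start + block_size]
            votes.update(select_in_block(X, y, block, keep))
    return sorted(j for j, v in votes.items() if v >= min_votes)
```

Because each block contains only `block_size` features, the cost of the per-block search does not grow with the total number of genes; only the number of blocks does. The repeated rounds with different partitions are what allow the per-block results to be combined into a single subset.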


Keywords: Genetic Algorithm; Feature Selection; Feature Selection Method; Microarray Dataset; Feature Selection Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Aida de Haro-García (1)
  • Javier Pérez-Rodríguez (1)
  • Nicolás García-Pedrajas (1)
  1. Department of Computing and Numerical Analysis, University of Córdoba, Córdoba, Spain
