Gene Selection for Microarray Data by a LDA-Based Genetic Algorithm

  • Edmundo Bonilla Huerta
  • Béatrice Duval
  • Jin-Kao Hao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5265)


Gene selection aims to identify a small subset of informative genes from the initial data in order to obtain high predictive accuracy. This paper introduces a new wrapper approach to this difficult task, in which a Genetic Algorithm (GA) is combined with Fisher’s Linear Discriminant Analysis (LDA). The main characteristic of this LDA-based GA is that it uses not only an LDA classifier in its fitness function, but also LDA’s discriminant coefficients in its dedicated crossover and mutation operators. The proposed algorithm is assessed on seven well-known datasets from the literature and compared with 16 state-of-the-art algorithms. The results show that our LDA-based GA obtains high classification accuracies overall (81%-100%) with a very small number of genes (2-19).
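The abstract does not spell out the operators, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a two-class problem, comparably scaled expression values, a boolean mask encoding of gene subsets, training accuracy as a stand-in fitness, and a hypothetical mutation (`mutate_drop_weakest`) that removes the selected gene with the smallest absolute LDA coefficient. All function names, the `ridge` term, and the toy data are illustrative assumptions.

```python
import numpy as np


def fisher_lda(X, y, ridge=1e-6):
    """Two-class Fisher LDA: return the discriminant vector w and a threshold."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix; the small ridge term is an assumption
    # added here to keep the matrix invertible on tiny samples.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    Sw = Sw + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, mu1 - mu0)
    # Classify by comparing the projection with the projected class midpoint.
    threshold = w @ (mu0 + mu1) / 2.0
    return w, threshold


def fitness(mask, X, y):
    """Return (accuracy, selected gene indices, LDA coefficients) for a mask.

    Accuracy is measured on the training data, which is a simplification
    made for this sketch.
    """
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 0.0, idx, np.array([])
    w, threshold = fisher_lda(X[:, idx], y)
    pred = (X[:, idx] @ w > threshold).astype(int)
    return float((pred == y).mean()), idx, w


def mutate_drop_weakest(mask, idx, w):
    """Hypothetical coefficient-guided mutation: drop the selected gene
    whose discriminant coefficient has the smallest absolute value."""
    child = mask.copy()
    if idx.size > 1:
        child[idx[np.argmin(np.abs(w))]] = False
    return child


# Toy data (invented for illustration): 8 samples, 3 genes.
# Gene 0 separates the classes; genes 1 and 2 carry no class signal.
X_toy = np.array([
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 3.0],
    [0.0, 0.0, 2.0],
    [1.0, 1.0, 3.0],
    [5.0, 1.0, 3.0],
    [6.0, 0.0, 2.0],
    [5.0, 0.0, 3.0],
    [6.0, 1.0, 2.0],
])
y_toy = np.array([0, 0, 0, 0, 1, 1, 1, 1])

parent = np.array([True, True, False])        # genes 0 and 1 selected
acc, idx, w = fitness(parent, X_toy, y_toy)
child = mutate_drop_weakest(parent, idx, w)   # removes the uninformative gene 1
```

On this toy data the mutation discards gene 1, whose coefficient is zero, while the classifier keeps separating the two classes using gene 0 alone. A real wrapper would estimate accuracy by cross-validation and would also need a regularised or generalised LDA variant when the number of selected genes exceeds the number of samples.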


Keywords: linear discriminant analysis, genetic algorithm, gene selection, classification, wrapper



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Edmundo Bonilla Huerta (1)
  • Béatrice Duval (1)
  • Jin-Kao Hao (1)
  1. LERIA, Université d’Angers, Angers, France
