Soft Computing, Volume 12, Issue 11, pp 1039–1048

Gene selection using hybrid particle swarm optimization and genetic algorithm

Original Paper


Selecting highly discriminative genes from gene expression data has become an important research topic. Doing so can improve the performance of cancer classification and can also cut the cost of medical diagnosis, because a large number of noisy, redundant genes are filtered out. In this paper, a hybrid Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) method is used for gene selection, and a Support Vector Machine (SVM) is adopted as the classifier. The proposed approach is tested on three benchmark gene expression datasets: leukemia, colon, and breast cancer data. Experimental results show that the proposed method can reduce the dimensionality of the dataset, identify the most informative gene subset, and improve classification accuracy.
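The abstract does not spell out how the PSO and GA steps are combined, so the following is only a minimal sketch of one plausible hybrid, not the authors' algorithm. It assumes a binary PSO update (sigmoid of the velocity gives the probability of selecting a gene) followed by a GA step in which the worse half of the swarm is replaced by mutated one-point-crossover offspring of the better half. The names `hybrid_pso_ga` and `toy_fitness`, and all parameter values, are hypothetical. In the paper's setting the fitness would be the accuracy of an SVM trained on the selected genes; here a toy stand-in fitness is used so the example runs with the standard library alone.

```python
import math
import random


def hybrid_pso_ga(fitness, n_genes, n_particles=20, n_iter=30, seed=0,
                  w=0.7, c1=1.5, c2=1.5, p_mut=0.02):
    """Maximise fitness(mask) over binary gene masks (lists of 0/1).

    Assumes n_particles >= 4 and n_genes >= 2 (needed by the GA step).
    """
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_genes)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_genes for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        # PSO step: binary update, bit d is set with probability sigmoid(v).
        for i in range(n_particles):
            for d in range(n_genes):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                v = max(-4.0, min(4.0, v))  # clamp velocity
                vel[i][d] = v
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0

        # GA step (assumed scheme): the worse half of the swarm is replaced
        # by mutated one-point-crossover children of the better half.
        fits = [fitness(p) for p in pos]
        order = sorted(range(n_particles), key=fits.__getitem__, reverse=True)
        half = n_particles // 2
        elite = order[:half]
        for idx in order[half:]:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_genes)
            child = pos[a][:cut] + pos[b][cut:]
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            pos[idx] = child
            fits[idx] = fitness(child)

        # Update personal and global bests.
        for i in range(n_particles):
            if fits[i] > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fits[i]
                if fits[i] > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fits[i]
    return gbest, gbest_fit


# Toy stand-in for classifier accuracy (NOT an SVM): genes 0-2 are treated
# as informative, and every selected gene costs a small penalty.
INFORMATIVE = (0, 1, 2)


def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)


best, score = hybrid_pso_ga(toy_fitness, n_genes=12)
print("selected genes:", [i for i, bit in enumerate(best) if bit])
print("fitness:", score)
```

To move closer to the setting described in the abstract, `toy_fitness` would be swapped for a function that trains an SVM on the columns selected by the mask and returns a cross-validated accuracy, optionally with a penalty on subset size.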


Keywords: Gene selection, Particle swarm optimization, Genetic algorithm, Support vector machine





Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  1. College of Electrical and Information Engineering, Hunan University, Changsha, China
