Abstract
Selecting highly discriminative genes from gene expression data has become an important research topic. Doing so not only improves the performance of cancer classification but also cuts the cost of medical diagnosis, because a large number of noisy, redundant genes are filtered out. In this paper, a hybrid Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) method is used for gene selection, and a Support Vector Machine (SVM) is adopted as the classifier. The proposed approach is tested on three benchmark gene expression datasets: Leukemia, Colon, and Breast Cancer. Experimental results show that the proposed method can reduce the dimensionality of the dataset, identify the most informative gene subset, and improve classification accuracy.
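The abstract does not give the update rules, so the following is only a minimal sketch of how a hybrid PSO/GA wrapper for gene selection might be organized, not the authors' implementation. It assumes a common binary PSO formulation (sigmoid-transformed velocities), a GA step that replaces the worst half of the swarm with mutated one-point-crossover children of the best half, and a leave-one-out nearest-centroid classifier as a lightweight stand-in for the SVM. All function names, parameter values, and the synthetic data are hypothetical.

```python
import math
import random


def make_toy_data(n_samples=20, n_genes=10, seed=1):
    """Synthetic stand-in for expression data: genes 0 and 1 carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        shift = 2.0 if label else -2.0
        row = [rng.gauss(shift if j < 2 else 0.0, 1.0) for j in range(n_genes)]
        X.append(row)
        y.append(label)
    return X, y


def loo_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for SVM)."""
    genes = [j for j, bit in enumerate(mask) if bit]
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_label, best_dist = None, float("inf")
        for label in sorted(set(y)):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            if not rows:
                continue
            dist = 0.0
            for j in genes:
                centroid = sum(r[j] for r in rows) / len(rows)
                dist += (X[i][j] - centroid) ** 2
            if dist < best_dist:
                best_label, best_dist = label, dist
        if best_label == y[i]:
            correct += 1
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small (assumed) penalty per selected gene."""
    return loo_accuracy(X, y, mask) - penalty * sum(mask)


def hybrid_pso_ga(X, y, n_particles=10, n_iter=20, seed=0):
    rng = random.Random(seed)
    n_genes = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(n_particles)]
    vel = [[0.0] * n_genes for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(X, y, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        # PSO step: velocity update, then sample each bit from sigmoid(velocity).
        for i in range(n_particles):
            for j in range(n_genes):
                v = (0.7 * vel[i][j]
                     + 1.5 * rng.random() * (pbest[i][j] - pos[i][j])
                     + 1.5 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, v))
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0

        # GA step: worst half replaced by mutated crossover children of best half.
        order = sorted(range(n_particles),
                       key=lambda i: fitness(X, y, pos[i]), reverse=True)
        half = n_particles // 2
        for worst in order[half:]:
            a, b = rng.sample(order[:half], 2)
            cut = rng.randrange(1, n_genes)
            child = pos[a][:cut] + pos[b][cut:]
            pos[worst] = [1 - bit if rng.random() < 0.02 else bit for bit in child]

        # Update personal and global bests.
        for i in range(n_particles):
            f = fitness(X, y, pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
            if f > gbest_fit:
                gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


X, y = make_toy_data()
mask, score = hybrid_pso_ga(X, y)
print("selected genes:", [j for j, bit in enumerate(mask) if bit], "fitness:", score)
```

In a setting closer to the one the abstract describes, `loo_accuracy` would be replaced by the cross-validated accuracy of an SVM trained on the selected genes; the search loop itself would not need to change.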
Cite this article
Li, S., Wu, X. & Tan, M. Gene selection using hybrid particle swarm optimization and genetic algorithm. Soft Comput 12, 1039–1048 (2008). https://doi.org/10.1007/s00500-007-0272-x
DOI: https://doi.org/10.1007/s00500-007-0272-x