Parameters selection in gene selection using Gaussian kernel support vector machines by genetic algorithm

  • Biotechnology and Biomedicine
Journal of Zhejiang University Science B

Abstract

In microarray-based cancer classification, gene selection is an important issue because of the large number of variables, the small number of samples, and the non-linearity of the problem. It is difficult to obtain satisfactory results with conventional linear statistical methods. Recursive feature elimination based on support vector machines (SVM RFE) is an effective algorithm that integrates gene selection and cancer classification into a consistent framework. In this paper, we propose a new method for selecting the parameters of this algorithm when it is implemented with Gaussian kernel SVMs: rather than following the common practice of picking the apparently best parameters, we use a genetic algorithm to search for an optimal pair of parameters. Fast implementation issues for this method are also discussed for pragmatic reasons. The proposed method was tested on two representative datasets, one on hereditary breast cancer and one on acute leukaemia. The experimental results indicate that the proposed method performs well in selecting genes and achieves high classification accuracies with these genes.
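The idea of letting a genetic algorithm search for a pair of Gaussian kernel SVM parameters can be illustrated with a minimal sketch. The abstract does not specify the encoding, operators, or fitness function used, so everything below is an assumption: a generic real-coded genetic algorithm with tournament selection, arithmetic crossover, Gaussian mutation, and elitism, searching over a hypothetical pair (log2 C, log2 sigma). The function `surrogate_fitness` is a toy stand-in with a known peak; in a real setting it would presumably be replaced by something like the cross-validated accuracy of SVM RFE at the given parameters. All names (`genetic_search`, `surrogate_fitness`, `BOUNDS`) are hypothetical.

```python
import math
import random

# Assumed search ranges for (log2 C, log2 sigma); not taken from the paper.
BOUNDS = [(-5.0, 15.0), (-10.0, 10.0)]


def surrogate_fitness(log2_c, log2_sigma):
    """Toy stand-in for cross-validated accuracy; peaks at (3, -1)."""
    dist_sq = (log2_c - 3.0) ** 2 + (log2_sigma + 1.0) ** 2
    return math.exp(-dist_sq / 8.0)


def genetic_search(fitness, bounds, pop_size=30, generations=40,
                   crossover_rate=0.8, mutation_rate=0.2, seed=0):
    """Maximize fitness over a box with a simple real-coded GA."""
    rng = random.Random(seed)
    # Each individual is a list of real values, one per parameter.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(pop_size)]
    best = max(pop, key=lambda ind: fitness(*ind))

    for _ in range(generations):
        scores = [fitness(*ind) for ind in pop]

        def tournament():
            # Binary tournament: the fitter of two random individuals wins.
            a, b = rng.sample(range(pop_size), 2)
            return pop[a] if scores[a] >= scores[b] else pop[b]

        children = [list(best)]  # elitism: carry the best individual over
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                # Arithmetic crossover: random convex combination of parents.
                w = rng.random()
                child = [w * x + (1.0 - w) * y for x, y in zip(p1, p2)]
            else:
                child = list(p1)
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    # Gaussian mutation, clipped back into the search box.
                    child[i] += rng.gauss(0.0, 0.1 * (hi - lo))
                    child[i] = min(hi, max(lo, child[i]))
            children.append(child)

        pop = children
        best = max(pop, key=lambda ind: fitness(*ind))

    return best, fitness(*best)


if __name__ == "__main__":
    best, best_fit = genetic_search(surrogate_fitness, BOUNDS)
    c, sigma = 2.0 ** best[0], 2.0 ** best[1]
    print(f"C = {c:.3g}, sigma = {sigma:.3g}, fitness = {best_fit:.3f}")
```

Searching on a log2 scale is a common convention for SVM parameters, since useful values of C and the kernel width often span several orders of magnitude. Each fitness evaluation in a real setting would involve training SVMs, so the population size and generation count would likely need tuning against the available compute budget.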

Author information

Corresponding author

Correspondence to Pi Dao-ying.

Additional information

Project supported by the National Basic Research Program (973) of China (No. 2002CB312200) and the Center for Bioinformatics Program Grant of Harvard Center of Neurodegeneration and Repair, Harvard Medical School, Harvard University, Boston, USA.

About this article

Cite this article

Yong, M., Xiao-bo, Z., Dao-ying, P. et al. Parameters selection in gene selection using Gaussian kernel support vector machines by genetic algorithm. J. Zhejiang Univ. Sci. B 6, 961–973 (2005). https://doi.org/10.1007/BF02888487

  • DOI: https://doi.org/10.1007/BF02888487
