Abstract
In recent years, the rapid development of DNA Microarray technology has made it possible for scientists to monitor the expression levels of thousands of genes in a single experiment. As a new technology, Microarray data presents fresh challenges because it contains a large number of genes (tens of thousands) but only a small number of samples (hundreds). Both filter and wrapper gene selection methods aim to select the most informative genes from this massive data in order to reduce the size of the expression database. Gene selection methods are used in both the data preprocessing and classification stages. We have conducted experiments in which different existing gene selection methods preprocess Microarray data for classification by the benchmark algorithms SVMs and C4.5. The study suggests that combining filter and wrapper methods generally improves the accuracy of gene expression Microarray data classification. The study also indicates that not all filter gene selection methods help improve classification performance. The experimental results show that, among the tested gene selection methods, Correlation Coefficient is the best at improving classification accuracy for both the SVMs and C4.5 classification algorithms.
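As an illustration of what a filter-style gene selection step can look like, the following is a minimal sketch that ranks genes by the absolute Pearson correlation between each gene's expression values and a binary class label, then keeps the top-scoring genes. The abstract does not define the Correlation Coefficient measure used in the study, so the Pearson form is an assumption here, and the function names `pearson` and `rank_genes` and the toy data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no class information
    return cov / sqrt(var_x * var_y)


def rank_genes(samples, labels, top_k):
    """Return the indices of the top_k genes by |correlation| with the label.

    samples: list of rows, one row per sample, one column per gene.
    labels:  list of 0/1 class labels, one per sample.
    """
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        column = [row[g] for row in samples]
        scores.append((abs(pearson(column, labels)), g))
    # Highest score first; ties broken by gene index for determinism.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [g for _, g in scores[:top_k]]


# Toy data (hypothetical): 4 samples x 3 genes, two classes.
samples = [
    [1.0, 5.0, 2.0],
    [1.2, 5.1, 8.0],
    [3.0, 5.0, 3.0],
    [3.1, 4.9, 7.0],
]
labels = [0, 0, 1, 1]
selected = rank_genes(samples, labels, top_k=2)
```

The selected gene indices would then define the reduced data set passed to a classifier such as an SVM or a decision tree. A wrapper method would instead score candidate gene subsets by the classifier's own accuracy.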
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hu, H., Li, J., Wang, H., Daggard, G. (2006). Combined Gene Selection Methods for Microarray Data Analysis. In: Gabrys, B., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2006. Lecture Notes in Computer Science, vol 4251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11892960_117
DOI: https://doi.org/10.1007/11892960_117
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46535-5
Online ISBN: 978-3-540-46536-2
eBook Packages: Computer Science, Computer Science (R0)