Abstract
Gene selection from microarray gene expression data is difficult because of the high dimensionality of the data: the number of samples in a microarray data set is very small compared to the number of genes used as features. Selecting significant genes is therefore necessary to reduce dimensionality. An effective gene selection method reduces dimensionality and improves sample classification performance. In this work, we examine whether combining feature selection methods can improve the performance of classification algorithms. We propose two methods for combining feature selection techniques. Experimental results suggest that an appropriate combination of filter gene selection methods is more effective than individual techniques for microarray data classification. We compare our combination methods using different learning algorithms.
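The abstract does not specify how the filter methods are combined, so the following is only a minimal sketch of two generic combination strategies (mean-rank aggregation and union of top-k lists), not the authors' two methods. The filter scores (`snr_score`, `mean_gap_score`), the function names, and the toy data are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def split_by_class(values, labels):
    # Separate one gene's expression values into the two sample classes.
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    return a, b

def snr_score(values, labels):
    # Hypothetical filter 1: signal-to-noise style score,
    # |mean0 - mean1| / (std0 + std1).
    a, b = split_by_class(values, labels)
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12)

def mean_gap_score(values, labels):
    # Hypothetical filter 2: absolute difference of class means.
    a, b = split_by_class(values, labels)
    return abs(mean(a) - mean(b))

def ranks(scores):
    # Rank 1 is the highest score; ties are broken by gene index.
    order = sorted(range(len(scores)), key=lambda g: (-scores[g], g))
    result = [0] * len(scores)
    for position, gene in enumerate(order, start=1):
        result[gene] = position
    return result

def combine_by_mean_rank(data, labels, filters, k):
    # Average each gene's rank across filters and keep the k best genes.
    all_ranks = [ranks([f(gene, labels) for gene in data]) for f in filters]
    mean_rank = [mean(r[g] for r in all_ranks) for g in range(len(data))]
    return sorted(range(len(data)), key=lambda g: (mean_rank[g], g))[:k]

def combine_by_union(data, labels, filters, k):
    # Keep every gene that appears in the top k of at least one filter.
    selected = set()
    for f in filters:
        r = ranks([f(gene, labels) for gene in data])
        selected.update(g for g in range(len(data)) if r[g] <= k)
    return sorted(selected)

# Toy data (made up): rows are genes, columns are samples.
labels = [0, 0, 0, 1, 1, 1]
data = [
    [1.0, 1.1, 0.9, 5.0, 5.1, 4.9],        # gene 0: large, clean separation
    [2.0, 2.1, 1.9, 2.0, 2.1, 1.9],        # gene 1: no separation
    [1.0, 3.0, 2.0, 4.0, 6.0, 5.0],        # gene 2: separation with noise
    [0.50, 0.51, 0.49, 0.60, 0.61, 0.59],  # gene 3: small but clean gap
]
filters = [snr_score, mean_gap_score]

print(combine_by_mean_rank(data, labels, filters, k=2))
print(combine_by_union(data, labels, filters, k=2))
```

In this sketch the two filters disagree on genes 2 and 3, which is the situation where a combination can differ from either filter alone. The selected gene indices would then be passed to a classifier of choice.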
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sheela, T., Rangarajan, L. (2017). Combination of Feature Selection Methods for the Effective Classification of Microarray Gene Expression Data. In: Santosh, K., Hangarge, M., Bevilacqua, V., Negi, A. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2016. Communications in Computer and Information Science, vol 709. Springer, Singapore. https://doi.org/10.1007/978-981-10-4859-3_13
DOI: https://doi.org/10.1007/978-981-10-4859-3_13
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-4858-6
Online ISBN: 978-981-10-4859-3
eBook Packages: Computer Science, Computer Science (R0)