Abstract
High-throughput RNA sequencing technologies produce large gene expression datasets whose analysis contributes to a better understanding and treatment of diseases such as cancer. The high dimensionality of these data poses challenges for computational analysis, which are commonly addressed by applying gene selection. Traditional gene selection methods rely on the data alone. Integrative approaches, in contrast, include curated biological information from external knowledge bases in the gene selection process, which improves result accuracy and reduces computational complexity.
This paper presents a framework for comparing knowledge-based and computational gene selection methods. Moreover, it introduces a novel integrative method that automatically combines both approaches. Results on a cancer dataset show that simple computational methods enriched with external knowledge can compete with complex computational techniques.
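The general idea of enriching a simple computational filter with external knowledge can be sketched as follows. This is only an illustration under stated assumptions, not the method of the paper: the variance filter, the weighted sum, the `weight` parameter, the toy gene names, and the knowledge scores in [0, 1] (standing in for something like a gene-disease association score taken from a knowledge base) are all hypothetical.

```python
from statistics import pvariance


def variance_scores(expression):
    """Score each gene by the variance of its expression across samples."""
    return {gene: pvariance(values) for gene, values in expression.items()}


def min_max(scores):
    """Rescale scores to [0, 1] so they are comparable to knowledge scores."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {g: (s - lo) / span if span else 0.0 for g, s in scores.items()}


def combine(data_scores, knowledge_scores, weight=0.5):
    """Weighted sum of a data-driven score and an external knowledge score.

    Assumption: knowledge scores are already in [0, 1]; genes that are
    missing from the knowledge base get a knowledge score of 0.
    """
    data_norm = min_max(data_scores)
    return {
        g: (1 - weight) * data_norm[g] + weight * knowledge_scores.get(g, 0.0)
        for g in data_norm
    }


def select_top(scores, k):
    """Return the k highest-scoring genes, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy expression matrix: gene -> expression values across four samples.
expression = {
    "GENE_A": [1.0, 5.0, 1.0, 5.0],
    "GENE_B": [2.0, 2.1, 1.9, 2.0],
    "GENE_C": [3.0, 4.0, 3.0, 4.0],
}
# Hypothetical knowledge scores, e.g. derived from a curated knowledge base.
knowledge = {"GENE_C": 1.0, "GENE_B": 0.2}

data_only = select_top(variance_scores(expression), k=1)
selected = select_top(combine(variance_scores(expression), knowledge), k=2)
print(data_only, selected)
```

In this toy setup the data-only filter ranks the most variable gene first, while the combined score promotes the gene with strong external evidence. How such scores should be weighted in practice is a design choice that a comparison framework of the kind described above would need to evaluate.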
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Grasnick, B., Perscheid, C., Uflacker, M. (2019). A Framework for the Automatic Combination and Evaluation of Gene Selection Methods. In: Fdez-Riverola, F., Mohamad, M., Rocha, M., De Paz, J., González, P. (eds) Practical Applications of Computational Biology and Bioinformatics, 12th International Conference. PACBB2018 2018. Advances in Intelligent Systems and Computing, vol 803. Springer, Cham. https://doi.org/10.1007/978-3-319-98702-6_20
DOI: https://doi.org/10.1007/978-3-319-98702-6_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98701-9
Online ISBN: 978-3-319-98702-6
eBook Packages: Intelligent Technologies and Robotics (R0)