An Ensemble Framework Coping with Instability in the Gene Selection Process

Castellanos-Garzón, José A.; Ramos, Juan; López-Sánchez, Daniel; de Paz, Juan F.; Corchado, Juan M.

doi:10.1007/s12539-017-0274-z

Original Research Article
Published: 08 January 2018

Interdisciplinary Sciences: Computational Life Sciences
Volume 10, pages 12–23 (2018)

José A. Castellanos-Garzón¹,² (ORCID: orcid.org/0000-0002-9452-1477),
Juan Ramos¹,
Daniel López-Sánchez¹,
Juan F. de Paz¹ &
Juan M. Corchado¹,³

268 Accesses
6 Citations

Abstract

This paper proposes an ensemble framework for gene selection that addresses the instability problems arising in the gene filtering task. Selecting genes from gene expression data is a complex process, and the informative gene subsets found by different filter methods often disagree, which makes it difficult for experts to identify significant genes. This instability can stem from the filter methods, from the gene classifier methods, from different datasets of the same disease, and from the existence of multiple valid groups of biomarkers. Although many approaches have been proposed, the problem remains a challenge. This work proposes a framework involving five stages of gene filtering to discover biomarkers for diagnosis and classification tasks. The framework performs stable feature selection that confronts the problems above and thus provides a more suitable and reliable solution for clinical and research purposes. Our proposal is a multistage gene filtering process that incorporates several ensemble strategies for gene selection, so that different classifiers simultaneously assess gene subsets to counter instability. First, we apply an ensemble of recent gene selection methods to obtain diversity in the genes found (stability with respect to filter methods). Next, we apply an ensemble of well-known classifiers to retain the genes that are relevant to all classifiers at once (stability with respect to classification methods). The results were evaluated on two different datasets of the same disease (pancreatic ductal adenocarcinoma), in search of stability with respect to the disease, and were promising.
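The two ensemble steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the filter outputs are combined, so the union in stage 1 is an assumption, as is the use of mean pairwise Jaccard similarity as a stability measure. The function names, the filter and classifier labels, and the gene identifiers (`g1` to `g5`) are all hypothetical placeholders.

```python
from itertools import combinations


def pool_filter_selections(selections):
    """Stage 1 (assumed): pool the gene subsets returned by several
    filter methods into one diverse candidate set (set union)."""
    pooled = set()
    for genes in selections.values():
        pooled |= set(genes)
    return pooled


def consensus_across_classifiers(relevant_by_classifier, candidates):
    """Stage 2 (assumed): keep only candidate genes that every
    classifier in the ensemble marks as relevant (set intersection)."""
    consensus = set(candidates)
    for genes in relevant_by_classifier.values():
        consensus &= set(genes)
    return consensus


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(subsets):
    """Average Jaccard similarity over all pairs of subsets, used here
    as a simple (assumed) indicator of how stable the selections are."""
    pairs = list(combinations([set(s) for s in subsets], 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical outputs of three filter methods (placeholder gene IDs).
filter_selections = {
    "filter_1": ["g1", "g2", "g3"],
    "filter_2": ["g2", "g3", "g4"],
    "filter_3": ["g3", "g5"],
}

# Hypothetical genes each classifier found relevant.
classifier_relevance = {
    "clf_1": ["g2", "g3", "g5"],
    "clf_2": ["g1", "g3", "g5"],
    "clf_3": ["g3", "g4", "g5"],
}

pooled = pool_filter_selections(filter_selections)
consensus = consensus_across_classifiers(classifier_relevance, pooled)
stability = mean_pairwise_jaccard(filter_selections.values())

print(sorted(pooled))     # candidate genes after stage 1
print(sorted(consensus))  # genes relevant to all classifiers
print(round(stability, 3))
```

In this toy input the filters agree only weakly with one another, which is the kind of instability the abstract describes, while the classifier consensus narrows the pooled candidates to the genes all classifiers share.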

Author information

Authors and Affiliations

IBSAL/BISITE Research Group, University of Salamanca, Edificio I+D+i, 37007, Salamanca, Spain
José A. Castellanos-Garzón, Juan Ramos, Daniel López-Sánchez, Juan F. de Paz & Juan M. Corchado
CISUC, ECOS Research Group, University of Coimbra, Pólo II-Pinhal de Marrocos, 3030-290, Coimbra, Portugal
José A. Castellanos-Garzón
Osaka Institute of Technology, Osaka, 535-8585, Japan
Juan M. Corchado

Corresponding author

Correspondence to José A. Castellanos-Garzón.

About this article

Cite this article

Castellanos-Garzón, J.A., Ramos, J., López-Sánchez, D. et al. An Ensemble Framework Coping with Instability in the Gene Selection Process. Interdiscip Sci Comput Life Sci 10, 12–23 (2018). https://doi.org/10.1007/s12539-017-0274-z

Received: 22 August 2017
Revised: 06 November 2017
Accepted: 08 November 2017
Published: 08 January 2018
Issue Date: March 2018
DOI: https://doi.org/10.1007/s12539-017-0274-z
