Abstract
The automatic classification of DNA microarray data is an active research topic in bioinformatics, since it is an effective tool for diagnosing diseases in patients. The aim of this chapter is to present the most relevant aspects of microarray data classification. We analyze the strategies used to classify microarray data and review the main methods in the literature. We also address related issues such as dimensionality reduction, which aims to eliminate redundant information among genes, and the treatment of imbalanced and missing data. Finally, we present an exhaustive review of the main journal publications. This review highlights the most successful techniques in the field and the datasets most commonly used to assess their effectiveness.
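The abstract mentions three steps: dimensionality reduction, missing-data treatment, and classification. The sketch below shows how they might fit together. It is an illustration only and not the chapter's own method. It makes these assumptions:

- The problem is binary, with labels 0/1.
- Samples are rows and genes are columns.
- Missing values are encoded as `None`.

The sketch uses per-gene mean imputation, a signal-to-noise filter score to rank genes (one simple form of filter-based feature selection), and a nearest-centroid classifier on the top-ranked genes. All function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def impute_mean(X):
    """Replace each None with the mean of the observed values of that gene (column)."""
    n_genes = len(X[0])
    col_means = []
    for j in range(n_genes):
        observed = [row[j] for row in X if row[j] is not None]
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[j] if v is None else v for j, v in enumerate(row)] for row in X]


def snr_scores(X, y):
    """Signal-to-noise filter score per gene for labels 0/1: |mu0 - mu1| / (sd0 + sd1)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        denom = pstdev(a) + pstdev(b)
        # Small constant avoids division by zero for genes constant within both classes.
        scores.append(abs(mean(a) - mean(b)) / (denom + 1e-12))
    return scores


def top_k_genes(scores, k):
    """Indices of the k highest-scoring genes (the reduced feature set)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def fit_centroids(X, y, genes):
    """Mean expression of the selected genes for each class."""
    centroids = {}
    for lab in sorted(set(y)):
        rows = [row for row, l in zip(X, y) if l == lab]
        centroids[lab] = [mean([r[j] for r in rows]) for j in genes]
    return centroids


def predict(x, centroids, genes):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    def dist2(c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(genes, c))
    return min(centroids, key=lambda lab: dist2(centroids[lab]))


# Hypothetical toy data: 6 samples x 3 genes, two missing values.
X = [
    [1.0, 5.0, 0.3],
    [1.2, None, 0.1],
    [0.9, 5.2, 0.2],
    [3.0, 5.1, 0.25],
    [3.2, 4.9, None],
    [2.9, 5.0, 0.15],
]
y = [0, 0, 0, 1, 1, 1]

X_imp = impute_mean(X)
genes = top_k_genes(snr_scores(X_imp, y), k=1)
centroids = fit_centroids(X_imp, y, genes)
print(genes, predict([1.1, 5.0, 0.2], centroids, genes))
```

In a real evaluation, fit the imputation and the gene ranking on the training folds only. Selecting genes on the full dataset before cross-validation makes the estimated error optimistically biased.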
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Sánchez-Maroño, N., Fontenla-Romero, O., Pérez-Sánchez, B. (2019). Classification of Microarray Data. In: Bolón-Canedo, V., Alonso-Betanzos, A. (eds) Microarray Bioinformatics. Methods in Molecular Biology, vol 1986. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9442-7_8
DOI: https://doi.org/10.1007/978-1-4939-9442-7_8
Published:
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-4939-9441-0
Online ISBN: 978-1-4939-9442-7
eBook Packages: Springer Protocols