Abstract
Independent component analysis (ICA) is a powerful technique for reducing the dimensionality of large datasets in many applications. It has been used for feature extraction from microarray gene expression data in numerous works. One of the merits of ICA is that the number of extracted features is always equal to the number of samples. When ICA is applied to microarray data, however, it faces the challenge of how to find the best subset of genes (features) among the extracted features. To resolve this problem, we propose a new artificial bee colony (ABC)-based feature selection approach for microarray data. Our approach has two stages: an ICA-based extraction stage that reduces the size of the data, and an ABC-based wrapper stage that optimizes the reduced feature vectors. To validate the proposed approach, extensive experiments were conducted to compare the performance of ICA + ABC with results obtained from recently published and other previously suggested gene selection methods for the Naïve Bayes (NB) classifier. To compare the performance of the proposed approach with other algorithms, a statistical hypothesis test was employed on six benchmark microarray cancer classification datasets. The experimental results show that the proposed approach improves on all the compared algorithms for the NB classifier at a certain level of significance.
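The second stage described above, an ABC-based wrapper that searches for a subset of extracted components scored by an NB classifier, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the ICA stage has already been run (for instance with a library routine such as scikit-learn's `FastICA`) and that each row of `X` holds the extracted components of one sample. The toy data, the hold-out accuracy used as fitness, the single bit-flip neighbourhood, and all names and parameter values (`abc_select`, `gaussian_nb_accuracy`, `n_sources`, `limit`, and so on) are assumptions made for illustration only.

```python
import math
import random

def gaussian_nb_accuracy(X_train, y_train, X_test, y_test, mask):
    """Fit a small Gaussian Naive Bayes on the masked features; return test accuracy."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    stats = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        params = []
        for j in cols:
            vals = [r[j] for r in rows]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-6
            params.append((mean, var))
        stats[label] = (len(rows) / len(X_train), params)
    correct = 0
    for x, y in zip(X_test, y_test):
        best_label, best_score = None, -math.inf
        for label, (prior, params) in stats.items():
            score = math.log(prior)
            for j, (mean, var) in zip(cols, params):
                score += -0.5 * math.log(2 * math.pi * var) - (x[j] - mean) ** 2 / (2 * var)
            if score > best_score:
                best_label, best_score = label, score
        correct += best_label == y
    return correct / len(y_test)

def abc_select(fitness, n_features, n_sources=10, n_cycles=30, limit=5, seed=0):
    """Binary ABC search: each food source is a boolean mask over the features."""
    rng = random.Random(seed)
    best = {"score": -1.0, "mask": None}

    def evaluate(mask):
        s = fitness(mask)
        if s > best["score"]:  # remember the best mask ever evaluated
            best["score"], best["mask"] = s, list(mask)
        return s

    def random_mask():
        return [rng.random() < 0.5 for _ in range(n_features)]

    sources = [random_mask() for _ in range(n_sources)]
    scores = [evaluate(m) for m in sources]
    trials = [0] * n_sources

    def try_improve(i):
        cand = list(sources[i])
        j = rng.randrange(n_features)
        cand[j] = not cand[j]  # neighbour: flip one feature in or out
        s = evaluate(cand)
        if s > scores[i]:
            sources[i], scores[i], trials[i] = cand, s, 0
        else:
            trials[i] += 1

    for _ in range(n_cycles):
        for i in range(n_sources):  # employed bees visit every source
            try_improve(i)
        weights = [s + 1e-9 for s in scores]  # onlookers prefer fitter sources
        for i in rng.choices(range(n_sources), weights=weights, k=n_sources):
            try_improve(i)
        for i in range(n_sources):  # scouts replace exhausted sources
            if trials[i] > limit:
                sources[i] = random_mask()
                scores[i] = evaluate(sources[i])
                trials[i] = 0
    return best["mask"], best["score"]

def make_toy_data(n_samples=40, n_features=8, seed=1):
    """Synthetic stand-in for ICA components; only component 0 carries class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

X, y = make_toy_data()
X_train, y_train, X_test, y_test = X[:30], y[:30], X[30:], y[30:]
fitness = lambda mask: gaussian_nb_accuracy(X_train, y_train, X_test, y_test, mask)
mask, score = abc_select(fitness, n_features=8)
print("selected components:", [j for j, keep in enumerate(mask) if keep], "accuracy:", score)
```

A single hold-out split keeps the sketch short; with the small sample sizes typical of microarray data, a cross-validated fitness would usually be a more reliable choice.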
Ethics declarations
Conflict of interest
Rabia Aziz, C. K. Verma, Namita Srivastava declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Musheer, R.A., Verma, C.K. & Srivastava, N. Novel machine learning approach for classification of high-dimensional microarray data. Soft Comput 23, 13409–13421 (2019). https://doi.org/10.1007/s00500-019-03879-7
DOI: https://doi.org/10.1007/s00500-019-03879-7