A Novel Map Reduced Based Parallel Feature Selection and Extreme Learning for Micro Array Cancer Data Classification

Wireless Personal Communications

Abstract

Microarray-based gene expression profiling is an emerging method for predicting, classifying, diagnosing and treating cancer efficiently. The characteristics of cancer may change frequently, which generates large volumes of data. This paper proposes a novel MapReduce-based parallel feature selection and extreme learning method for microarray cancer data classification. First, the gene expression data sets are pre-processed by attribute-wise normalization and by applying thresholds to the original data. The second phase uses a wrapper model that combines the Adaptive Whale Optimization Algorithm (AWOA) with the Nelder–Mead algorithm (NMA) to select the feature (gene) subset. Wrapper models treat feature subset selection as a search problem: candidate combinations of features are generated, evaluated and compared with one another. Finally, to demonstrate the effectiveness of the genes selected by the proposed feature selection method, a Regularized Extreme Learning Machine (RELM) classifier is used to classify the gene expression data subsets chosen by the AWOA algorithm.
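The pre-processing and wrapper steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the threshold values (floor and ceiling), the function names and the toy data are all assumptions made for illustration. An exhaustive search over small gene subsets stands in for AWOA with Nelder–Mead, a nearest-centroid classifier stands in for RELM, and the MapReduce parallelization is not shown.

```python
from itertools import combinations


def preprocess(rows, floor=20.0, ceiling=16000.0):
    """Clip values to [floor, ceiling], then min-max scale each gene (column).

    The floor and ceiling defaults are illustrative assumptions.
    """
    clipped = [[min(max(v, floor), ceiling) for v in row] for row in rows]
    n_genes = len(clipped[0])
    lows = [min(r[j] for r in clipped) for j in range(n_genes)]
    highs = [max(r[j] for r in clipped) for j in range(n_genes)]
    return [
        [
            (r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_genes)
        ]
        for r in clipped
    ]


def loo_accuracy(rows, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`.

    This simple classifier is a stand-in for RELM.
    """
    classes = sorted(set(labels))
    correct = 0
    for i, (x, y) in enumerate(zip(rows, labels)):
        centroids = {}
        for c in classes:
            # Class members, excluding the held-out sample i.
            members = [r for k, r in enumerate(rows) if labels[k] == c and k != i]
            if members:
                centroids[c] = [sum(m[g] for m in members) / len(members) for g in genes]
        pred = min(
            centroids,
            key=lambda c: sum((x[g] - m) ** 2 for g, m in zip(genes, centroids[c])),
        )
        correct += pred == y
    return correct / len(rows)


def wrapper_select(rows, labels, max_genes=2):
    """Wrapper search: score every subset of up to `max_genes` genes.

    Exhaustive enumeration is a stand-in for the AWOA/NMA search; smaller
    subsets are tried first and win ties.
    """
    best_genes, best_score = (), -1.0
    for k in range(1, max_genes + 1):
        for genes in combinations(range(len(rows[0])), k):
            score = loo_accuracy(rows, labels, genes)
            if score > best_score:
                best_genes, best_score = genes, score
    return best_genes, best_score


# Toy data (hypothetical): 4 samples x 3 genes, two classes.
raw = [
    [100.0, 5.0, 900.0],
    [120.0, 50000.0, 300.0],
    [900.0, 30.0, 850.0],
    [1000.0, 40.0, 350.0],
]
labels = [0, 0, 1, 1]

normalized = preprocess(raw)
selected, score = wrapper_select(normalized, labels)
print(selected, score)
```

The key property of a wrapper model is visible in `wrapper_select`: each candidate subset is scored by actually running the classifier on it, so the search strategy and the classifier can be swapped independently. Exhaustive enumeration is only feasible for tiny examples, which is why a metaheuristic search is used on real microarray data with thousands of genes.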

Availability of data and material

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

Code availability

Custom code.

Funding

None.

Author information

Corresponding author

Correspondence to Swati Hira.

Ethics declarations

Conflict of interest

Swati Hira and Anita Bai declared that there is no conflict of interest.

About this article

Cite this article

Hira, S., Bai, A. A Novel Map Reduced Based Parallel Feature Selection and Extreme Learning for Micro Array Cancer Data Classification. Wireless Pers Commun 123, 1483–1505 (2022). https://doi.org/10.1007/s11277-021-09196-3

  • DOI: https://doi.org/10.1007/s11277-021-09196-3
