Toward improving the performance of learning by joining feature selection and ensemble classification techniques: an application for cancer diagnosis

  • Research
Journal of Cancer Research and Clinical Oncology

Abstract

Introduction

Breast cancer is the most common type of cancer in women, which makes its diagnosis one of the most important problems in medical science. Determining whether a case is benign or malignant not only reduces costs but is also essential for choosing the treatment method.

Objective

This paper presents a model based on data mining techniques, namely feature selection and ensemble classification, that can accurately identify breast cancer patients at an early stage.

Methodology

The proposed breast cancer detection model combines the Adaptive Differential Evolution (ADE) algorithm for feature selection with a Learning Vector Quantization (LVQ) neural network for classification. The resulting model, ADE–LVQ, automatically and quickly assigns breast cancer cases to one of two classes, benign or malignant. As a new evolutionary approach, ADE both selects effective features from the breast cancer data and optimizes the configuration of the LVQ neural network. In addition, we configure an ensemble classification technique based on LVQ, which significantly improves prediction performance.
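The coupling of evolutionary feature selection with an LVQ classifier described above can be sketched as follows. This is only an illustration under stated assumptions, not the paper's implementation: it uses plain differential evolution (DE/rand/1/bin) with fixed F and CR rather than the adaptive variant, a basic LVQ1 rule with one prototype per class, no ensemble, and synthetic data. All function names and parameter values are hypothetical.

```python
import numpy as np

def train_lvq1(X, y, epochs=10, lr=0.1, seed=0):
    """Basic LVQ1 with one prototype per class (an assumption, not the paper's setup)."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    # Initialise each prototype at the mean of its class.
    protos = np.array([X[y == c].mean(axis=0) for c in classes], dtype=float)
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for i in rng.permutation(len(X)):
            k = np.argmin(((protos - X[i]) ** 2).sum(axis=1))  # nearest prototype
            # Attract the prototype if its label matches the sample, repel otherwise.
            sign = 1.0 if classes[k] == y[i] else -1.0
            protos[k] += sign * rate * (X[i] - protos[k])
    return protos, classes

def predict_lvq(protos, classes, X):
    """Label each row of X with the class of its nearest prototype."""
    d = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d, axis=1)]

def fitness(mask, X_tr, y_tr, X_va, y_va):
    """Validation accuracy of LVQ1 trained on the selected features only."""
    if not mask.any():
        return 0.0  # an empty feature subset is useless
    protos, classes = train_lvq1(X_tr[:, mask], y_tr)
    return float((predict_lvq(protos, classes, X_va[:, mask]) == y_va).mean())

def de_feature_selection(X_tr, y_tr, X_va, y_va,
                         pop_size=10, gens=10, F=0.5, CR=0.9, seed=0):
    """Plain DE/rand/1/bin over continuous genes; a gene > 0.5 selects its feature."""
    rng = np.random.default_rng(seed)
    n_feat = X_tr.shape[1]
    pop = rng.random((pop_size, n_feat))
    score = np.array([fitness(p > 0.5, X_tr, y_tr, X_va, y_va) for p in pop])
    for _ in range(gens):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), 0.0, 1.0)   # differential mutation
            cross = rng.random(n_feat) < CR                # binomial crossover
            cross[rng.integers(n_feat)] = True             # keep at least one mutant gene
            trial = np.where(cross, mutant, pop[i])
            s = fitness(trial > 0.5, X_tr, y_tr, X_va, y_va)
            if s >= score[i]:                              # greedy selection
                pop[i], score[i] = trial, s
    best = int(np.argmax(score))
    return pop[best] > 0.5, float(score[best])

# Synthetic stand-in data: 8 features, only the first two carry class information.
rng = np.random.default_rng(42)
y_all = rng.integers(0, 2, size=200)
X_all = rng.normal(size=(200, 8))
X_all[:, :2] += 4.0 * y_all[:, None]

mask, score = de_feature_selection(X_all[:140], y_all[:140], X_all[140:], y_all[140:])
print("selected features:", np.flatnonzero(mask), "validation accuracy:", score)
```

Encoding the feature subset as continuous genes thresholded at 0.5 is one common way to let DE, which operates on real vectors, search a binary selection space. An adaptive variant would additionally adjust F and CR during the run.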

Results

ADE–LVQ was evaluated from several perspectives on different datasets from the Wisconsin breast cancer database. We applied different approaches to handle missing values and improve data quality in this database. The simulation results show that ADE–LVQ diagnoses breast cancer patients more successfully than equivalent and state-of-the-art models. Because it combines feature selection with ensemble learning, ADE–LVQ also achieves better performance with less complexity. In particular, ADE–LVQ improves accuracy (up to 3.4%) and runtime (up to 2.3%) on average compared to the best existing method.
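The abstract does not say which missing-value strategies were used. As a minimal sketch, two common options are shown here on a toy matrix: discarding incomplete records and replacing each gap with the column median. The function names and the data are hypothetical, and missing entries are assumed to be encoded as NaN.

```python
import numpy as np

def drop_incomplete(X, y):
    """Keep only the rows of X (and matching labels) that have no missing value."""
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], y[keep]

def impute_median(X):
    """Return a copy of X with each NaN replaced by the median of its column."""
    X = X.copy()
    med = np.nanmedian(X, axis=0)          # per-column median, ignoring NaN
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = med[cols]
    return X

# Toy data (hypothetical): four records, three features, two missing entries.
X = np.array([[5.0, 1.0, np.nan],
              [3.0, np.nan, 2.0],
              [4.0, 2.0, 1.0],
              [8.0, 10.0, 10.0]])
y = np.array([0, 0, 0, 1])

X_drop, y_drop = drop_incomplete(X, y)     # two complete records remain
X_imp = impute_median(X)                   # all four records are kept
print(X_drop.shape, X_imp)
```

Dropping rows is simple but shrinks the dataset, whereas imputation keeps every record at the cost of introducing estimated values.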

Conclusion

Combined methods based on data mining techniques for breast cancer diagnosis can help doctors make better treatment decisions.

Availability of data and materials

The datasets analyzed during the current study are available in the UCI repository.

Funding

No external funding was used.

Author information

Contributions

All authors accept public responsibility for the content of the work and have an equal contribution to the study.

Corresponding author

Correspondence to Dan Wang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Ethical approval and consent to participate

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, D. Toward improving the performance of learning by joining feature selection and ensemble classification techniques: an application for cancer diagnosis. J Cancer Res Clin Oncol 149, 16993–17006 (2023). https://doi.org/10.1007/s00432-023-05422-6

  • DOI: https://doi.org/10.1007/s00432-023-05422-6
