Abstract
Introduction
Breast cancer is the most common type of cancer in women, which makes its diagnosis one of the most important issues in medical science. Beyond reducing costs, determining whether a breast tumor is benign or malignant is essential for choosing the treatment method.
Objective
The purpose of this paper is to present a model based on data mining techniques, including feature selection and ensemble classification, that can accurately detect breast cancer in patients at an early stage.
Methodology
The proposed breast cancer detection model combines the Adaptive Differential Evolution (ADE) algorithm for feature selection with a Learning Vector Quantization (LVQ) neural network for classification. The resulting model, ADE–LVQ, automatically and quickly classifies breast cancer cases into two classes, benign and malignant. As an evolutionary approach, ADE selects effective features from the breast cancer data and also optimizes the configuration of the LVQ neural network. In addition, we configure an ensemble classification technique based on LVQ, which significantly improves prediction performance.
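The pipeline described above can be illustrated with a minimal sketch. It is not the author's implementation: it uses plain differential evolution with fixed `F` and `CR` rather than the adaptive variant, it searches only over a feature mask (it does not tune the LVQ configuration), and the LVQ1 update rule, the bagging-style ensemble, the synthetic data, and all function names are assumptions made for illustration.

```python
import numpy as np

def train_lvq1(X, y, epochs=5, lr=0.1, seed=0):
    """LVQ1 with one prototype per class, initialised at the class means."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    protos = np.array([X[y == c].mean(axis=0) for c in classes])
    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for i in rng.permutation(len(X)):
            k = np.argmin(((protos - X[i]) ** 2).sum(axis=1))  # nearest prototype
            sign = 1.0 if classes[k] == y[i] else -1.0  # attract if correct, repel if not
            protos[k] += sign * rate * (X[i] - protos[k])
    return classes, protos

def predict_lvq(model, X):
    classes, protos = model
    dist = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dist, axis=1)]

def fitness(mask, Xtr, ytr, Xva, yva):
    """Validation accuracy of LVQ on the masked features, minus a small size penalty."""
    if not mask.any():
        return 0.0
    model = train_lvq1(Xtr[:, mask], ytr)
    acc = (predict_lvq(model, Xva[:, mask]) == yva).mean()
    return acc - 0.01 * mask.mean()

def de_select(Xtr, ytr, Xva, yva, pop_size=10, gens=8, F=0.5, CR=0.9, seed=0):
    """DE/rand/1/bin over real-valued genes; a gene above 0.5 keeps the feature."""
    rng = np.random.default_rng(seed)
    n = Xtr.shape[1]
    pop = rng.random((pop_size, n))
    fit = np.array([fitness(p > 0.5, Xtr, ytr, Xva, yva) for p in pop])
    for _ in range(gens):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.choice(others, 3, replace=False)
            mutant = np.clip(pop[a] + F * (pop[b] - pop[c]), 0.0, 1.0)
            cross = rng.random(n) < CR
            cross[rng.integers(n)] = True  # take at least one gene from the mutant
            trial = np.where(cross, mutant, pop[i])
            f = fitness(trial > 0.5, Xtr, ytr, Xva, yva)
            if f >= fit[i]:  # greedy one-to-one replacement
                pop[i], fit[i] = trial, f
    best = np.argmax(fit)
    return pop[best] > 0.5, fit[best]

def lvq_ensemble_predict(Xtr, ytr, Xte, n_models=5, seed=0):
    """Majority vote of LVQ models trained on bootstrap samples (labels must be 0/1)."""
    rng = np.random.default_rng(seed)
    votes = []
    for m in range(n_models):
        idx = rng.integers(0, len(Xtr), size=len(Xtr))  # bootstrap resample
        votes.append(predict_lvq(train_lvq1(Xtr[idx], ytr[idx], seed=m), Xte))
    return (np.array(votes).mean(axis=0) > 0.5).astype(int)

# Synthetic stand-in data: 8 features, only features 0 and 1 carry the class signal.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=300)
X = rng.normal(size=(300, 8))
X[:, :2] += 4.0 * y[:, None] - 2.0
Xtr, ytr, Xva, yva = X[:200], y[:200], X[200:], y[200:]

mask, score = de_select(Xtr, ytr, Xva, yva)
pred = lvq_ensemble_predict(Xtr[:, mask], ytr, Xva[:, mask])
ensemble_acc = (pred == yva).mean()
print("selected features:", np.flatnonzero(mask), "ensemble accuracy:", ensemble_acc)
```

Because the same validation split drives both the feature search and the final score, the accuracy printed here is optimistic; a real evaluation would hold out a separate test set or use cross-validation.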
Results
ADE–LVQ was evaluated from several perspectives on different datasets from the Wisconsin breast cancer database. We applied different approaches to handle missing values and improve data quality in this database. The simulation results show that ADE–LVQ diagnoses breast cancer patients more successfully than equivalent and state-of-the-art models. ADE–LVQ also delivers better performance with less complexity, owing to feature selection and ensemble learning. In particular, ADE–LVQ improves accuracy (by up to 3.4%) and runtime (by up to 2.3%) on average compared with the best existing method.
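The abstract does not state which missing-value strategies were compared. As a hedged illustration only, the sketch below shows two common options, dropping incomplete records and per-feature median imputation. The toy matrix, the use of `NaN` as the missing-value marker, and the function names are assumptions, not details from the paper.

```python
import numpy as np

def drop_incomplete(X, y):
    """Remove every record that has at least one missing (NaN) value."""
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], y[keep]

def impute_median(X):
    """Replace each NaN with the median of the observed values in its column."""
    medians = np.nanmedian(X, axis=0)
    return np.where(np.isnan(X), medians, X)

# Hypothetical toy data: 4 records, 3 features, two missing entries.
X = np.array([
    [5.0, 1.0, np.nan],
    [3.0, np.nan, 2.0],
    [4.0, 1.0, 2.0],
    [8.0, 10.0, 9.0],
])
y = np.array([0, 0, 0, 1])

X_drop, y_drop = drop_incomplete(X, y)
X_imp = impute_median(X)
print(X_drop.shape, X_imp[0, 2], X_imp[1, 1])
```

Dropping records is simplest but shrinks the dataset, while imputation keeps every record at the cost of introducing estimated values; which trade-off is preferable depends on how many records are affected.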
Conclusion
Combined methods based on data mining techniques for breast cancer diagnosis can help doctors make better treatment decisions.
Availability of data and materials
The datasets analyzed during the current study are available in the UCI repository.
Funding
No external funding was used.
Author information
Contributions
All authors accept public responsibility for the content of the work and have an equal contribution to the study.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Ethical approval and consent to participate
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, D. Toward improving the performance of learning by joining feature selection and ensemble classification techniques: an application for cancer diagnosis. J Cancer Res Clin Oncol 149, 16993–17006 (2023). https://doi.org/10.1007/s00432-023-05422-6
DOI: https://doi.org/10.1007/s00432-023-05422-6