Abstract
Machine Learning (ML) algorithms have been widely used in medical science, especially for classifying clinical data. Random Forest, Decision Tree, K-Nearest Neighbor, Support Vector Machine, Naive Bayes, Logistic Regression, and Multilayer Perceptron are among the ML algorithms used to classify and predict various diseases. This paper reviews 40 recently published ML approaches to breast cancer classification and breast cancer prediction, together with the associated data pre-processing and feature selection techniques. From the literature, the paper identifies the pre-processing steps, feature selection steps, and ML algorithms used for breast cancer classification and prediction, and tabulates them by reported accuracy. The paper also briefly describes three common clinical breast cancer datasets used to train most such ML algorithms. The review helps prospective researchers identify directions for research on ML solutions for breast cancer datasets that use suitable pre-processing and feature selection techniques.
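The reviewed studies broadly follow one pipeline: pre-process the clinical data, select features, train a classifier, and compare classifiers by accuracy. The standard-library Python sketch below illustrates that pipeline under stated assumptions: min-max normalization (MMN) as the pre-processing step, a Pearson correlation coefficient (PCC) filter as the feature selection step, and KNN as the classifier. The six training and four test records are hypothetical toy values, not taken from the UCI or SEER datasets, and all function names are illustrative rather than taken from any reviewed paper.

```python
import math

# Hypothetical toy records, NOT drawn from the UCI or SEER datasets.
# Each row is [radius, texture, concavity]; label 1 = malignant, 0 = benign.
TRAIN_X = [[10.0, 20.0, 0.02], [11.0, 15.0, 0.03], [12.0, 22.0, 0.04],
           [18.0, 18.0, 0.15], [20.0, 21.0, 0.20], [22.0, 16.0, 0.25]]
TRAIN_Y = [0, 0, 0, 1, 1, 1]
TEST_X = [[11.5, 19.0, 0.03], [21.0, 17.0, 0.22],
          [13.0, 21.0, 0.05], [19.0, 20.0, 0.18]]
TEST_Y = [0, 1, 0, 1]


def min_max_fit(rows):
    """Pre-processing: learn per-feature min and max from the training rows only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def min_max_transform(rows, mins, maxs):
    """Min-max normalization (MMN) using the training bounds.

    Training rows land in [0, 1]; test rows may fall slightly outside that range.
    """
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, mins, maxs)] for row in rows]


def pearson(xs, ys):
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_features(rows, labels):
    """Filter-style feature selection: feature indices sorted by |PCC| with the label."""
    scores = [abs(pearson(col, labels)) for col in zip(*rows)]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def select(rows, idx):
    """Keep only the chosen feature columns."""
    return [[row[j] for j in idx] for row in rows]


def knn_predict(train_rows, train_labels, query, k=3):
    """KNN: majority vote among the k nearest training rows (Euclidean distance).

    An odd k avoids vote ties for a two-class problem.
    """
    nearest = sorted((math.dist(r, query), lab)
                     for r, lab in zip(train_rows, train_labels))[:k]
    votes = [lab for _, lab in nearest]
    return max(set(votes), key=votes.count)


def accuracy(y_true, y_pred):
    """Fraction of test records classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


mins, maxs = min_max_fit(TRAIN_X)
train_s = min_max_transform(TRAIN_X, mins, maxs)
test_s = min_max_transform(TEST_X, mins, maxs)
top = rank_features(train_s, TRAIN_Y)[:2]  # keep the two most label-correlated features

results = {}
for k in (1, 3):
    preds = [knn_predict(select(train_s, top), TRAIN_Y, q, k)
             for q in select(test_s, top)]
    results[f"KNN (k={k})"] = accuracy(TEST_Y, preds)

# Tabulate the classifiers by accuracy, highest first.
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<12} {acc:.2f}")
```

Fitting the normalization bounds on the training rows alone, then reusing them for the test rows, keeps test information out of pre-processing. The perfect accuracy this toy example reaches reflects its well-separated made-up data, not any result reported in the reviewed literature.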
Annexure-Expansion of Acronyms Mentioned in Sect. 3
| Acronym | Expansion | Acronym | Expansion |
|---|---|---|---|
| AB | AdaBoost | LR | Logistic Regression |
| ANN | Artificial Neural Networks | LSVM | Library for SVM |
| AR | Association Rules | LV | Low Variance |
| BC | Breast Cancer | MC | Multiclass Classifier |
| BF | Best First | MLP | Multilayer Perceptron |
| BLR | Bayesian Logistic Regression | MMN | Min–Max Normalization |
| BN | Bayes Net | MO | Model Overfitting |
| BPN | Back Propagation Network | MV | Missing Values |
| BS | Backward Selection | NB | Naive Bayes |
| B-SVM | Binary SVM | NCA | Neighborhood Component Analysis |
| CART | Classification and Regression Tree | NN | Neural Network |
| CFS | Correlation-based Feature Selection | PCA | Principal Component Analysis |
| CSE | Consistency-based Subset Evaluation | PCC | Pearson Correlation Coefficient |
| DF | Discretization Filters | PNN | Probabilistic Neural Network |
| DT | Decision Tree | PO | Parameter Optimization |
| DTa | Decision Table | PSO | Particle Swarm Optimization |
| ELM | Extreme Learning Machine | PURELIN | Linear Transfer Function |
| FE | Feature Extraction | QK | Quadratic Kernel |
| FDT | Fuzzy Decision Tree | RBF | Radial Basis Function |
| F-RISM | Fuzzy Rough Instance Selection Method | RDF | Random Decision Forest |
| F-RNN | Fuzzy Rough Nearest Neighbor | RDT | Random Decision Tree |
| FS | Feature Selection | RF | Random Forest |
| GA | Genetic Algorithm | RFE | Recursive Feature Elimination |
| GB | Gradient Boosting | RRA | Re-Ranking Algorithm |
| GNB | Gaussian Naive Bayes | RS | Random Sampling |
| GP | Genetic Programming | RT | Random Tree |
| GR | Gain Ratio | RUS | Random Under Sampling |
| GRS | Greedy Stepwise | SE | Standard Error |
| GS | Genetic Search | SEv | Subset Evaluation |
| HV | Hard Voting | SGD | Stochastic Gradient Descent |
| ID3 | Iterative Dichotomiser 3 | SMO | Sequential Minimal Optimization |
| IA | Irrelevant Attribute | SMOTE | Synthetic Minority Over-sampling Technique |
| ICA | Independent Component Analysis | SS | Standard Scaler |
| IG | Information Gain | SV | Soft Voting |
| IGAE | Information Gain Attribute Evaluation | SVM | Support Vector Machine |
| J-Rip | J-Repeated Incremental Pruning | TANSIG | Hyperbolic Tangent Sigmoid |
| KNN | K-Nearest Neighbor | TF | Transfer Function |
| K-SVM | Kernel SVM | UFS | Univariate Feature Selection |
| LASSO | Least Absolute Shrinkage and Selection Operator | VP | Voted Perceptron |
| Lazy-IBk | Instance-Based K-NN | WBFS | Wrapper-Based Feature Selection |
| Lazy K-star | KNN Star | WKNN | Weighted KNN |
| LDA | Linear Discriminant Analysis | WNB | Weighted Naive Bayes |
| LOGSIG | Log Sigmoid Activation Function | WSE | Wrapper Subset Evaluation |
| | | XGB | Extreme Gradient Boosting |
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sweetlin, E.J., Saudia, S. (2024). A Review of Machine Learning Algorithms on Different Breast Cancer Datasets. In: Borah, M.D., Laiphrakpam, D.S., Auluck, N., Balas, V.E. (eds) Big Data, Machine Learning, and Applications. BigDML 2021. Lecture Notes in Electrical Engineering, vol 1053. Springer, Singapore. https://doi.org/10.1007/978-981-99-3481-2_51
DOI: https://doi.org/10.1007/978-981-99-3481-2_51
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-3480-5
Online ISBN: 978-981-99-3481-2
eBook Packages: Computer Science, Computer Science (R0)