
A Review of Machine Learning Algorithms on Different Breast Cancer Datasets

  • Conference paper
Big Data, Machine Learning, and Applications (BigDML 2021)

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 1053)


Abstract

Machine Learning (ML) algorithms are widely used in medical science, especially for classifying clinical data. Random Forest, Decision Tree, K-Nearest Neighbor, Support Vector Machine, Naive Bayes, Logistic Regression, and Multilayer Perceptron are among the ML algorithms used to classify and predict various diseases. This paper reviews 40 recently published ML algorithms for breast cancer classification and prediction, along with the associated data pre-processing and feature selection techniques. It identifies from the literature the pre-processing steps, feature selection steps, and ML algorithms used for classification and prediction of breast cancer, and tabulates them according to accuracy. The paper also outlines three common clinical breast cancer datasets used to train most such ML algorithms. The review helps prospective researchers identify different aspects of research on ML solutions for breast cancer datasets using suitable pre-processing and feature selection techniques.
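
The three stages named in the abstract (pre-processing, feature selection, classification) can be illustrated with a minimal sketch. It is not taken from the paper or from any of the reviewed studies: it assumes min-max normalization for pre-processing, Pearson-correlation ranking for feature selection, and a K-Nearest Neighbor classifier, and it runs on a tiny made-up table in which label 0 stands for benign and 1 for malignant. All function names and data values are hypothetical.

```python
import math


def min_max_normalize(rows):
    """Pre-processing: rescale every feature column to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_top_features(rows, labels, k):
    """Feature selection: keep the k columns most correlated with the label."""
    cols = list(zip(*rows))
    scores = [abs(pearson(c, labels)) for c in cols]
    ranked = sorted(range(len(cols)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def knn_predict(train_rows, train_labels, query, k=3):
    """Classification: majority vote among the k nearest training rows."""
    nearest = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


# Hypothetical toy data: two informative features and one noisy feature.
rows = [
    [1.0, 2.0, 5.0], [2.0, 1.0, 3.0], [1.5, 1.5, 4.0], [2.0, 2.0, 3.0],
    [7.0, 8.0, 4.0], [8.0, 7.0, 3.0], [9.0, 9.0, 5.0], [8.5, 8.0, 4.0],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

normalized = min_max_normalize(rows)
selected = select_top_features(normalized, labels, k=2)
reduced = [[row[i] for i in selected] for row in normalized]

# Leave-one-out evaluation. For brevity the scaling and selection above use
# all rows; a real study would fit them inside each fold to avoid leakage.
correct = 0
for i, (query, truth) in enumerate(zip(reduced, labels)):
    train_rows = reduced[:i] + reduced[i + 1:]
    train_labels = labels[:i] + labels[i + 1:]
    correct += knn_predict(train_rows, train_labels, query) == truth
accuracy = correct / len(labels)

print("selected feature indices:", selected)
print("leave-one-out accuracy:", accuracy)
```

Accuracy on eight well-separated toy rows says nothing about real clinical data; the sketch only shows how the stages feed into one another, which is the structure the review uses to tabulate the published methods.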



Author information

Corresponding author

Correspondence to E. Jenifer Sweetlin.

Annexure-Expansion of Acronyms Mentioned in Sect. 3

  • AB: Ada Boost
  • ANN: Artificial Neural Networks
  • AR: Association Rules
  • BC: Breast Cancer
  • BF: Best First
  • BLR: Bayesian Logistic Regression
  • BN: Bayes Net
  • BPN: Back Propagation Network
  • BS: Backward Selection
  • B-SVM: Binary SVM
  • CART: Classification and Regression Tree
  • CFS: Correlation-based Feature Selection
  • CSE: Consistency-based Subset Evaluation
  • DF: Discretization Filters
  • DT: Decision Tree
  • DTa: Decision Table
  • ELM: Extreme Learning Machine
  • FE: Feature Extraction
  • FDT: Fuzzy Decision Tree
  • F-RISM: Fuzzy Rough Instance Selection Method
  • F-RNN: Fuzzy Rough Nearest Neighbor
  • FS: Feature Selection
  • GA: Genetic Algorithm
  • GB: Gradient Boosting
  • GNB: Gaussian Naive Bayes
  • GP: Genetic Programming
  • GR: Gain Ratio
  • GRS: Greedy Stepwise
  • GS: Genetic Search
  • HV: Hard Voting
  • ID3: Iterative Dichotomiser 3
  • IA: Irrelevant Attribute
  • ICA: Independent Component Analysis
  • IG: Information Gain
  • IGAE: Information Gain Attribute Evaluation
  • J-Rip: J-Repeated Incremental Pruning
  • KNN: K-Nearest Neighbor
  • K-SVM: Kernel SVM
  • LASSO: Least Absolute Shrinkage and Selection Operator
  • Lazy-IBK: Instance-Based K-NN
  • Lazy K-star: KNN Star
  • LDA: Linear Discriminant Analysis
  • LOGSIG: Log Sigmoid Activation Function
  • LR: Logistic Regression
  • LSVM: Library for SVM
  • LV: Low Variance
  • MC: Multiclass Classifier
  • MLP: Multilayer Perceptron
  • MMN: Min-Max Normalization
  • MO: Model Overfitting
  • MV: Missing Values
  • NB: Naive Bayes
  • NCA: Neighborhood Component Analysis
  • NN: Neural Network
  • PCA: Principal Component Analysis
  • PCC: Pearson Correlation Coefficient
  • PNN: Probabilistic Neural Network
  • PO: Parameter Optimization
  • PSO: Particle Swarm Optimization
  • PURELIN: Linear Transfer Function
  • QK: Quadratic Kernel
  • RBF: Radial Basis Function
  • RDF: Random Decision Forest
  • RDT: Random Decision Tree
  • RF: Random Forest
  • RFE: Recursive Feature Elimination
  • RRA: Re-Ranking Algorithm
  • RS: Random Sampling
  • RT: Random Tree
  • RUS: Random Under Sampling
  • SE: Standard Error
  • SEv: Subset Evaluation
  • SGD: Stochastic Gradient Descent
  • SMO: Sequential Minimal Optimization
  • SMOTE: Synthetic Minority Over-sampling Technique
  • SS: Standard Scaler
  • SV: Soft Voting
  • SVM: Support Vector Machine
  • TANSIG: Hyperbolic Tangent Sigmoid
  • TF: Transfer Function
  • UFS: Univariate Feature Selection
  • VP: Voted Perceptron
  • WBFS: Wrapper-Based Feature Selection
  • WKNN: Weighted KNN
  • WNB: Weighted Naive Bayes
  • WSE: Wrapper Subset Evaluation
  • XGB: Extreme Gradient Boosting

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sweetlin, E.J., Saudia, S. (2024). A Review of Machine Learning Algorithms on Different Breast Cancer Datasets. In: Borah, M.D., Laiphrakpam, D.S., Auluck, N., Balas, V.E. (eds) Big Data, Machine Learning, and Applications. BigDML 2021. Lecture Notes in Electrical Engineering, vol 1053. Springer, Singapore. https://doi.org/10.1007/978-981-99-3481-2_51

  • DOI: https://doi.org/10.1007/978-981-99-3481-2_51

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-3480-5

  • Online ISBN: 978-981-99-3481-2

  • eBook Packages: Computer Science (R0)
