Applications of Machine Learning Techniques to Predict Diagnostic Breast Cancer

Chaurasia, Vikas; Pal, Saurabh

doi:10.1007/s42979-020-00296-8

Original Research
Published: 14 August 2020

Volume 1, article number 270, (2020)
SN Computer Science

Abstract

This article compares six machine learning (ML) algorithms on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset: Classification and Regression Tree (CART), Support Vector Machine (SVM), Naïve Bayes (NB), K-Nearest Neighbors (KNN), Linear Regression (LR) and Multilayer Perceptron (MLP). The comparison covers classification test accuracy, accuracy on standardized data and runtime. The main objective of this study is to improve prediction accuracy using a new statistical method of feature selection. The dataset has 32 features, which are reduced using a statistical technique (the mode), and the same measurements are repeated on the reduced data for comparison. On the reduced subset (12 features), we apply six ensemble models, AdaBoost (AB), Gradient Boosting Classifier (GBC), Random Forest (RF), Extra Trees (ET), Bagging and Extreme Gradient Boosting (XGB), to minimize the probability of misclassification that comes from relying on any single induced model. We also apply the stacking classifier (Voting Classifier) to the base learners Logistic Regression (LR), Decision Tree (DT), Support Vector Classifier (SVC), K-Nearest Neighbors (KNN), Random Forest (RF) and Naïve Bayes (NB) to measure the accuracy obtained at the meta level. To implement the ML algorithms, the dataset is split so that 80% is used for training and 20% for testing. The classifiers are tuned with manually assigned hyper-parameters. At every stage of classification, all ML algorithms perform well, with test accuracy exceeding 90%, especially when applied to the reduced data subset.
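The evaluation protocol described in the abstract (an 80/20 split, standardized inputs, and a voting ensemble scored by test accuracy) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the helper names `split_80_20`, `standardize`, `hard_vote` and `accuracy` are hypothetical, the shuffling seed is arbitrary, and hard (majority) voting is an assumption, since the abstract does not say whether hard or soft voting was used.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and return (train, test) in an 80/20 ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed value is an arbitrary assumption
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def standardize(train_col, test_col):
    """Z-score one feature column using statistics from the training data only."""
    mu = mean(train_col)
    sigma = pstdev(train_col) or 1.0  # guard against a constant column
    scale = lambda col: [(x - mu) / sigma for x in col]
    return scale(train_col), scale(test_col)


def hard_vote(predictions):
    """Majority label per sample; predictions holds one label list per model."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy labels from three hypothetical base learners (M = malignant, B = benign).
    model_outputs = [["M", "B", "B"], ["M", "M", "B"], ["B", "M", "B"]]
    voted = hard_vote(model_outputs)
    print(voted, accuracy(["M", "B", "B"], voted))
```

Fitting the scaling statistics on the training portion alone keeps information from the test rows out of the model inputs. With an even number of base learners, a tie in `hard_vote` is resolved in favor of the label that appears first among the votes, so an explicit tie-breaking rule may be preferable in practice.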

Author information

Authors and Affiliations

Department of Computer Applications, VBS Purvanchal University, Jaunpur, India
Vikas Chaurasia & Saurabh Pal

Corresponding author

Correspondence to Saurabh Pal.

Ethics declarations

Conflict of Interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances in Computational Approaches for Artificial Intelligence, Image Processing, IoT and Cloud Applications” guest edited by Bhanu Prakash K N and M. Shivakumar.

About this article

Cite this article

Chaurasia, V., Pal, S. Applications of Machine Learning Techniques to Predict Diagnostic Breast Cancer. SN COMPUT. SCI. 1, 270 (2020). https://doi.org/10.1007/s42979-020-00296-8

Received: 29 July 2020
Accepted: 08 August 2020
Published: 14 August 2020
DOI: https://doi.org/10.1007/s42979-020-00296-8
