Applications of Machine Learning Techniques to Predict Diagnostic Breast Cancer

  • Original Research
  • SN Computer Science

Abstract

This article compares six machine learning (ML) algorithms, Classification and Regression Tree (CART), Support Vector Machine (SVM), Naïve Bayes (NB), K-Nearest Neighbors (KNN), Linear Regression (LR) and Multilayer Perceptron (MLP), on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset by estimating their classification test accuracy, their accuracy on standardized data and their runtime. The main objective of this study is to improve prediction accuracy using a new statistical method of feature selection. The dataset has 32 features, which are reduced using a statistical technique (the mode), and the same measurements are applied to the reduced data for comparison. On the reduced-attribute data subset (12 features), we apply six ensemble models, AdaBoost (AB), Gradient Boosting Classifier (GBC), Random Forest (RF), Extra Trees (ET), Bagging and Extreme Gradient Boosting (XGB), to minimize the probability of misclassification that comes from relying on any single induced model. We also apply a stacking classifier (Voting Classifier) to the base learners Logistic Regression (LR), Decision Tree (DT), Support Vector Classifier (SVC), K-Nearest Neighbors (KNN), Random Forest (RF) and Naïve Bayes (NB) to determine the accuracy obtained by the voting classifier (meta level). To implement the ML algorithms, the dataset is divided as follows: 80% is used in the training phase and 20% in the test phase. The classifiers are tuned with manually assigned hyper-parameters. At the different stages of classification, all ML algorithms perform well, with test accuracy exceeding 90%, especially when applied to the reduced data subset.
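The 80/20 split and the meta-level voting step described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not say whether voting is hard or soft, so hard (majority) voting is assumed here, and the function names and toy labels are hypothetical.

```python
import random
from collections import Counter


def split_80_20(samples, seed=0):
    """Shuffle a copy of samples and return (train, test) in an 80/20 split."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def majority_vote(per_model_predictions):
    """Hard voting (an assumption): pick the label most base learners chose.

    per_model_predictions holds one list of predicted labels per base learner.
    On a tie, the label that was voted first among the learners wins.
    """
    combined = []
    for votes in zip(*per_model_predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy labels (M = malignant, B = benign); not data from the article.
y_true = ["M", "B", "B", "M", "B"]
base_predictions = [
    ["M", "B", "M", "M", "B"],  # stand-in for one base learner's output
    ["M", "B", "B", "B", "B"],  # stand-in for a second base learner
    ["B", "B", "B", "M", "B"],  # stand-in for a third base learner
]

voted = majority_vote(base_predictions)
for i, preds in enumerate(base_predictions, start=1):
    print(f"base learner {i} accuracy:", accuracy(y_true, preds))
print("voting accuracy:", accuracy(y_true, voted))
```

In this toy case each stand-in learner errs on a different sample, so the majority vote corrects all three mistakes. That is the motivation the abstract gives for not relying on any single induced model.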

Author information

Corresponding author

Correspondence to Saurabh Pal.

Ethics declarations

Conflict of Interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Advances in Computational Approaches for Artificial Intelligence, Image Processing, IoT and Cloud Applications” guest edited by Bhanu Prakash K N and M. Shivakumar.

About this article

Cite this article

Chaurasia, V., Pal, S. Applications of Machine Learning Techniques to Predict Diagnostic Breast Cancer. SN COMPUT. SCI. 1, 270 (2020). https://doi.org/10.1007/s42979-020-00296-8

  • DOI: https://doi.org/10.1007/s42979-020-00296-8
