Stacking-Based Ensemble Framework and Feature Selection Technique for the Detection of Breast Cancer

  • Original Research
  • Published in SN Computer Science

Abstract

Breast cancer, the uncontrolled growth of breast cells, is the second most common cancer in women worldwide. Its treatment is a critical process, and certain indicators may sometimes produce negative results. To avoid such misleading outcomes, a reliable and accurate breast cancer diagnosis system must be available. Machine learning (ML) is a modern and accurate technique that researchers have recently applied to predict and diagnose breast cancer. In this article, we develop a stacking-based ensemble technique and feature selection methods, evaluate the overall performance of the algorithms, and compare results on breast cancer datasets with reduced attributes and with all attributes. We first train four ML algorithms as sub-models: a support vector machine (SVM), k-nearest neighbors, Naive Bayes, and a perceptron. We then combine their predictions into a new model by blending (stacking), and logistic regression makes the final prediction from the stacked outputs. It is important that the sub-models produce different, uncorrelated predictions; stacking works best when the sub-models are skillfully combined. The article also uses the five-feature selection technique, because feature selection affects the overall performance of the model: unrelated or only moderately related features may adversely affect its behavior. Applying feature selection yields a dataset with reduced features alongside the dataset with all features. We implemented logistic regression on both, and the dataset with reduced features achieved improved accuracy.
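
The pipeline described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. It assumes scikit-learn, default hyperparameters, and scikit-learn's bundled Wisconsin diagnostic dataset (`load_breast_cancer`), which may differ from the data used in the article. The abstract does not say which feature selection method was used, so `SelectKBest` with `f_classif` and `k=5` is a stand-in choice and only one possible reading of "five-feature selection". The standard scaling step is also an assumption.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled diagnostic dataset stands in for the
# article's data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Four sub-models named in the abstract; scaling is an added assumption.
base_models = [
    ("svm", make_pipeline(StandardScaler(), SVC())),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ("nb", GaussianNB()),
    ("perceptron", make_pipeline(StandardScaler(), Perceptron(random_state=0))),
]

# Logistic regression is the meta-learner. It is trained on out-of-fold
# predictions of the sub-models (5-fold cross-validation).
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
stack_acc = stack.score(X_test, y_test)

# Logistic regression on all features.
full_lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
full_lr.fit(X_train, y_train)
full_acc = full_lr.score(X_test, y_test)

# Logistic regression on a reduced feature set.
# Hypothetical choice: keep the 5 features with the highest ANOVA F-score.
reduced_lr = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=5),
    LogisticRegression(max_iter=1000),
)
reduced_lr.fit(X_train, y_train)
reduced_acc = reduced_lr.score(X_test, y_test)
n_selected = int(reduced_lr.named_steps["selectkbest"].get_support().sum())

print(f"stacked ensemble accuracy:      {stack_acc:.3f}")
print(f"logistic regression (all):      {full_acc:.3f}")
print(f"logistic regression ({n_selected} feats): {reduced_acc:.3f}")
```

Whether the reduced feature set beats the full one depends on the dataset, the selection method, and the split, so the numbers printed here should not be read as a reproduction of the article's results.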



Funding

No funding was received from any organization.

Author information

Corresponding author

Correspondence to Saurabh Pal.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Computational Biology and Biomedical Informatics” guest-edited by Dhruba Kr Bhattacharyya, Sushmita Mitra and Jugal Kr Kalita.

About this article

Cite this article

Chaurasia, V., Pal, S. Stacking-Based Ensemble Framework and Feature Selection Technique for the Detection of Breast Cancer. SN COMPUT. SCI. 2, 67 (2021). https://doi.org/10.1007/s42979-021-00465-3

  • DOI: https://doi.org/10.1007/s42979-021-00465-3
