Abstract
Breast cancer, the uncontrolled growth of breast cells, is the second most common cancer in women worldwide. Treating it is a critical process, and certain indicators can sometimes produce misleading results. To avoid such outcomes, a reliable and accurate breast cancer diagnosis system must be available. Machine learning (ML) is a modern and accurate approach that researchers have recently applied to predict and diagnose breast cancer. In this article, we develop a stacking-based ensemble technique together with feature selection methods, evaluate the overall performance of the algorithms, and compare results on a breast cancer dataset with all attributes and with reduced attributes. We first train four ML algorithms as sub-models, namely the support vector machine (SVM), k-nearest neighbors, Naive Bayes and the perceptron, and then combine their predictions into a new model by blending (stacking). Logistic regression then serves as the final model that makes the prediction from the stacked sub-model outputs. It is important that the sub-models produce predictions that are not strongly correlated with one another; stacking works best when the sub-models are combined skillfully. We apply five feature selection techniques, because feature selection affects the overall performance of the model: unrelated or only moderately related features may adversely affect its behavior. Feature selection yields a dataset with reduced features alongside the dataset with all features. We applied logistic regression to both and found that the dataset with reduced features gives improved accuracy.
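The stacking scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn is installed, uses the Wisconsin diagnostic dataset bundled with scikit-learn as a stand-in for the data analysed in the article, and picks univariate selection (`SelectKBest` with `k=10`) as one arbitrary example of a feature selection method. The 70/30 split, the scaling step and all hyperparameters are likewise illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled Wisconsin diagnostic data (30 features),
# used here only as a stand-in for the dataset studied in the article.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def build_stack():
    """Four sub-models whose outputs feed a logistic-regression meta-model."""
    sub_models = [
        ("svm", SVC(random_state=0)),
        ("knn", KNeighborsClassifier()),
        ("nb", GaussianNB()),
        ("perceptron", Perceptron(random_state=0)),
    ]
    # cv=5: the meta-model is trained on out-of-fold sub-model outputs,
    # so it does not see predictions made on the sub-models' training rows.
    return StackingClassifier(
        estimators=sub_models,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


# Pipeline on all features.
all_features = make_pipeline(StandardScaler(), build_stack())

# Pipeline on reduced features; k=10 is an illustrative choice,
# not a value taken from the article.
reduced_features = make_pipeline(
    StandardScaler(), SelectKBest(f_classif, k=10), build_stack()
)

acc_all = all_features.fit(X_train, y_train).score(X_test, y_test)
acc_reduced = reduced_features.fit(X_train, y_train).score(X_test, y_test)

print(f"stacked model, all features:     {acc_all:.3f}")
print(f"stacked model, reduced features: {acc_reduced:.3f}")
```

Placing the selector inside the pipeline means it is fitted only on training data, which avoids leaking test information into the choice of features. Whether the reduced-feature variant scores higher will depend on the data, the split and the selection method, so this sketch should not be read as reproducing the article's result.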
Funding
No funding was received from any organization.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Computational Biology and Biomedical Informatics” guest-edited by Dhruba Kr Bhattacharyya, Sushmita Mitra and Jugal Kr Kalita.
About this article
Cite this article
Chaurasia, V., Pal, S. Stacking-Based Ensemble Framework and Feature Selection Technique for the Detection of Breast Cancer. SN COMPUT. SCI. 2, 67 (2021). https://doi.org/10.1007/s42979-021-00465-3