Abstract
Healthcare systems around the world face major challenges in responding to the rise of chronic diseases. The objective of this study is to adapt data science approaches to the prediction of various diseases at an early stage. We review recently proposed approaches, note some of their limitations, and suggest possible solutions for future work. The study also shows the importance of identifying significant features, which improve on the results reported for existing methodologies. We build classification models using Naïve Bayes, logistic regression, k-nearest neighbors, support vector machine, decision tree, random forest, artificial neural network, AdaBoost, XGBoost, and gradient boosting. Feature subsets are chosen with three feature selection approaches: correlation-based selection, information gain based selection, and sequential feature selection. The machine learning classifiers are applied to each feature subset, and the best subset is selected according to their performance. Finally, an ensemble-based max voting classifier is built on top of the three best-performing models. The proposed model achieves improved performance, with an accuracy of 99.41%.
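Two of the steps named in the abstract, correlation-based feature selection and max voting over three models, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.5 correlation threshold, and the toy data are assumptions chosen for illustration, and the three prediction lists stand in for the outputs of whichever classifiers perform best.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_by_correlation(rows, labels, threshold=0.5):
    """Return indices of feature columns whose absolute correlation with
    the label meets the threshold (threshold value is an assumption)."""
    selected = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        if abs(pearson(column, labels)) >= threshold:
            selected.append(j)
    return selected


def max_vote(*model_predictions):
    """Hard (max) voting: for each sample, keep the label that most models
    predicted. With an odd number of models and binary labels no ties occur."""
    return [
        Counter(sample_preds).most_common(1)[0][0]
        for sample_preds in zip(*model_predictions)
    ]


# Toy data (hypothetical): two features, four samples, binary labels.
rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 5.0], [4.0, 3.0]]
labels = [0, 0, 1, 1]
kept = select_by_correlation(rows, labels)

# Placeholder predictions from three hypothetical best-performing models.
ensemble = max_vote([1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 1])
print(kept, ensemble)
```

In this toy example the first feature rises with the label and is kept, while the second alternates independently of the label and is dropped. In practice a library implementation such as a voting ensemble from a machine learning toolkit would likely replace `max_vote`, but the majority rule is the same.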
Acknowledgements
The authors are grateful to the editor and reviewers for their kind suggestions and critical comments, which helped improve the quality of the paper.
Cite this article
Sharma, A., Mishra, P.K. Performance analysis of machine learning based optimized feature selection approaches for breast cancer diagnosis. Int. j. inf. tecnol. 14, 1949–1960 (2022). https://doi.org/10.1007/s41870-021-00671-5
DOI: https://doi.org/10.1007/s41870-021-00671-5