Skip to main content

Advertisement

Log in

Performance analysis of machine learning based optimized feature selection approaches for breast cancer diagnosis

  • Original Research
  • Published:
International Journal of Information Technology Aims and scope Submit manuscript

Abstract

Healthcare systems around the world are facing huge challenges in responding to trends of the rise of chronic diseases. The objective of our research study is the adaptation of Data Science and its approaches for prediction of various diseases in early stages. In this study we review latest proposed approaches with few limitations and their possible solutions for future work. This study also shows importance of finding significant features that improves results proposed by existing methodologies. This work aimed to build classification models such as Naïve Bayes, Logistic Regression, k-Nearest neighbor, Support vector machine, Decision tree, Random Forest, Artificial neural network, Adaboost, XGBoost and Gradient boosting. The experimental study chooses group of features by means of three feature selection approaches such as Correlation-based selection, Information Gain based selection and Sequential feature selection. Various Machine learning classifiers are applied on these feature subsets and based on their performance best feature subset is selected. Finally, ensemble based Max Voting Classifier is proposed on top of three best performing models. The proposed model produces an enhanced performance label with accuracy score of 99.41%.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2

Similar content being viewed by others

References

  1. Consoli S, Recupero DR, Petkovic M (2019) Data science for healthcare. Springer International Publishing, Berlin

    Book  Google Scholar 

  2. Mohan S, Thirumalai C, Srivastava G (2019) Effective heart disease prediction using hybrid machine learning techniques. IEEE Access 7:81542–81554

    Article  Google Scholar 

  3. Qin J, Chen L, Liu Y, Liu C, Feng C, Chen B (2019) A machine learning methodology for diagnosing chronic kidney disease. IEEE Access 8:20991–21002

    Article  Google Scholar 

  4. Haq AU, Li JP, Memon MH, Malik A, Ahmad T, Ali A, Shahid M (2019) Feature selection based on L1-norm support vector machine and effective recognition system for Parkinson’s disease using voice recordings. IEEE Access 7:37718–37734

    Article  Google Scholar 

  5. Sampath R, Saradha A (2015) Alzheimer’s disease classification using hybrid neuro fuzzy Runge-Kutta (HNFRK) classifier. Res J Appl Sci Eng Technol 10(1):29–34

    Article  Google Scholar 

  6. Fitriyani NL, Syafrudin M, Alfian G, Rhee J (2019) Development of disease prediction model based on ensemble learning approach for diabetes and hypertension. IEEE Access 7:144777–144789

    Article  Google Scholar 

  7. Poudel P, Illanes A, Ataide EJ, Esmaeili N, Balakrishnan S, Friebe M (2019) Thyroid ultrasound texture classification using autoregressive features in conjunction with machine learning approaches. IEEE Access 7:79354–79365

    Article  Google Scholar 

  8. Kour H, Manhas J, Sharma V (2020) Usage and implementation of neuro-fuzzy systems for classification and prediction in the diagnosis of different types of medical disorders: a decade review. Artif Intell Rev 53:4651–4706

    Article  Google Scholar 

  9. Wu W, Zhou H (2017) Data-driven diagnosis of cervical cancer with support vector machine-based approaches. IEEE Access 5:25189–25195

    Article  Google Scholar 

  10. Abdoh SF, Rizka MA, Maghraby FA (2018) Cervical cancer diagnosis using random forest classifier with SMOTE and feature reduction techniques. IEEE Access 6:59475–59485

    Article  Google Scholar 

  11. Meiquan X et al. (2018) Cervical cytology intelligent diagnosis based on object detection technology. In: Proceedings of the 1st Conference on Medical Imaging with Deep Learning (MIDL 2018), Amsterdam, The Netherlands (2018)

  12. Nithya B, Ilango V (2019) Evaluation of machine learning based optimized feature selection approaches and classification methods for cervical cancer prediction. SN Appl Sci 1(6):641

    Article  Google Scholar 

  13. Howell A, Sims AH, Ong KR, Harvie MN, Evans DGR, Clarke RB (2005) Mechanisms of disease: prediction and prevention of breast cancer: cellular and molecular interactions. Nat Clin Pract Oncol 2(12):635–646

    Article  Google Scholar 

  14. Asri H, Mousannif H, Al Moatassime H, Noel T (2016) Using machine learning algorithms for breast cancer risk prediction and diagnosis. Proced Comput Sci 83:1064–1069

    Article  Google Scholar 

  15. Mohandas M, Deriche M, Aliyu SO (2018) Classifiers combination techniques: a comprehensive review. IEEE Access 6:19626–19639

    Article  Google Scholar 

  16. Jain D, Singh V (2018) Feature selection and classification systems for chronic disease prediction: a review. Egypt Inform J 19(3):179–189

    Article  Google Scholar 

  17. Mishra S, Triptahi AR (2019) Platforms oriented business and data analytics in digital ecosystem. Int J Financ Eng 6(04):1950036

    Article  Google Scholar 

  18. Ketu S, Mishra PK (2021) Empirical analysis of machine learning algorithms on imbalance electrocardiogram based arrhythmia dataset for heart disease detection. Arabian Journal for Science and Engineering, pp 1–23

  19. Sengur A (2008) An expert system based on linear discriminant analysis and adaptive neuro-fuzzy inference system to diagnosis heart valve diseases. Expert Syst Appl 35(1–2):214–222

    Article  Google Scholar 

  20. Vijayashree J, Sultana HP (2018) A machine learning framework for feature selection in heart disease classification using improved particle swarm optimization with support vector machine classifier. Program Comput Softw 44(6):388–397

    Article  Google Scholar 

  21. Javeed A, Zhou S, Yongjian L, Qasim I, Noor A, Nour R (2019) An intelligent learning system based on random search algorithm and optimized random forest model for improved heart disease detection. IEEE Access 7:180235–180243

    Article  Google Scholar 

  22. Mishra S, Tripathi AR (2020) IoT Platform Business Model for Innovative Management Systems. Int J Financ Eng (IJFE) 7(03):1–31

    Google Scholar 

  23. Kar S, Majumder DD (2019) A novel approach of mathematical theory of shape and neuro-fuzzy based diagnostic analysis of cervical cancer. Pathol Oncol Res 25(2):777–790

    Article  Google Scholar 

  24. Patil BM, Joshi RC, Toshniwal D (2010) Hybrid prediction model for type-2 diabetic patients. Expert Syst Appl 37(12):8102–8108

    Article  Google Scholar 

  25. Chen T, Shang C, Su P, Antoniou G, Shen Q (2018) Effective diagnosis of diabetes with a decision tree-initialised neuro-fuzzy approach. UK workshop on computational intelligence. Springer, Cham, pp 227–239

    Google Scholar 

  26. Abdullah AS, Selvakumar S (2019) Assessment of the risk factors for type II diabetes using an improved combination of particle swarm optimization and decision trees by evaluation with Fisher’s linear discriminant analysis. Soft Comput 23(20):9995–10017

    Article  Google Scholar 

  27. Tama BA, Rhee KH (2019) Tree-based classifier ensembles for early detection method of diabetes: an exploratory study. Artif Intell Rev 51(3):355–370

    Article  Google Scholar 

  28. Mishra S, Tripathi AR (2021) AI business model: an integrative business approach. J Innov Entrepreneurship 10(1):1–21

    Article  Google Scholar 

  29. Sarwar A, Ali M, Manhas J, Sharma V (2020) Diagnosis of diabetes type-II using hybrid machine learning based ensemble model. Int J Inf Technol 12(2):419–428

    Google Scholar 

  30. Nematzadeh Z, Ibrahim R, Selamat A (2015) Comparative studies on breast cancer classifications with k-fold cross validations using machine learning techniques. In: 2015 10th Asian Control Conference (ASCC), IEEE, pp 1–6

  31. Gayathri BM, Sumathi CP (2016) Comparative study of relevance vector machine with various machine learning techniques used for detecting breast cancer. In: 2016 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), IEEE, pp 1–5

  32. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357

    Article  Google Scholar 

  33. Muthukaruppan S, Er MJ (2012) A hybrid particle swarm optimization based fuzzy expert system for the diagnosis of coronary artery disease. Expert Syst Appl 39(14):11657–11665

    Article  Google Scholar 

  34. Xu R, Anagnostopoulos GC, Wunsch DC (2007) Multiclass cancer classification using semisupervised ellipsoid ARTMAP and particle swarm optimization with gene expression data. IEEE/ACM Trans Comput Biol Bioinf 4(1):65–77

    Article  Google Scholar 

  35. Mishra S, Tripathi AR (2020) Literature review on business prototypes for digital platform. J Innov Entrepreneurship 9(1):1–19

    Article  Google Scholar 

  36. Ketu S, Mishra PK (2021) Scalable kernel-based SVM classification algorithm on imbalance air quality data for proficient healthcare. Complex & Intelligent Systems, pp 1–19

  37. Mishra S (2018) Financial management and forecasting using business intelligence and big data analytic tools. Int J Financ Eng 5(02):1850011

    Article  MathSciNet  Google Scholar 

  38. Ketu S, Mishra PK (2021) Hybrid classification model for eye state detection using electroencephalogram signals. Cognitive Neurodynamics pp 1–18

  39. Karegowda AG, Manjunath AS, Jayaram MA (2010) Comparative study of attribute selection using gain ratio and correlation-based feature selection. Int J Inform Technol Knowl Manag 2(2):271–277

    Google Scholar 

  40. Shailaja K, Seetharamulu B, Jabbar MA (2018) Machine learning in healthcare: a review. In: 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), IEEE, pp 910–914

  41. Ketu S, Mishra PK (2021) Enhanced Gaussian process regression-based forecasting model for COVID-19 outbreak and significance of IoT for its detection. Appl Intell 51(3):1492–1512

    Article  Google Scholar 

  42. Ketu S, Mishra PK (2020) A hybrid deep learning model for COVID-19 prediction and current status of clinical trials worldwide. Comput Mater Contin 66(2)

  43. Sharma A, Mishra PK (2020) State-of-the-art in performance metrics and future directions for data science algorithms. J Sci Res 64(2):221–238

    MathSciNet  Google Scholar 

  44. Ojha U, Goel S (2017) A study on prediction of breast cancer recurrence using data mining techniques. In: 2017 7th International Conference on Cloud Computing, Data Science and Engineering-Confluence, IEEE, pp 527–530

Download references

Acknowledgements

The authors are highly thankful to the editor and reviewers for kind suggestions and critical comments for improving the quality of the paper.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ajay Sharma.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Sharma, A., Mishra, P.K. Performance analysis of machine learning based optimized feature selection approaches for breast cancer diagnosis. Int. j. inf. tecnol. 14, 1949–1960 (2022). https://doi.org/10.1007/s41870-021-00671-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s41870-021-00671-5

Keywords

Navigation