Predicting breast cancer biopsy outcomes from BI-RADS findings using random forests with chi-square and MI features

Williamson, Sheldon; Vijayakumar, K.; Kadam, Vinod J.

doi:10.1007/s11042-021-11114-5

1211: AIoT Support and Applications with Multimedia
Published: 04 June 2021

Multimedia Tools and Applications, Volume 81, pages 36869–36889 (2022)

Sheldon Williamson¹,
K. Vijayakumar² &
Vinod J. Kadam³

Abstract

Mammography screening is one of the best available approaches for detecting early signs of breast cancer, and screening mammograms remain the gold standard for early breast cancer screening. However, breast biopsies prompted by mammography have a relatively low positive predictive value, so many biopsies are performed on abnormal findings that ultimately prove benign. The Breast Imaging Reporting and Data System (BI-RADS) was developed as a standardized system for reporting breast mammograms and is used to sort abnormal findings into categories. Random Forest (RF), which builds on Decision Trees (DTs), is one of the most practical and powerful ensemble learning methods (or meta-estimators). In this study, an RF classifier combined with Chi-Square (χ²) and Mutual Information (MI) Feature Selection (FS) is applied to predict breast cancer biopsy outcomes from BI-RADS findings and the patient’s age. The method is validated on the UCI Mammographic Mass dataset and assessed by accuracy, AUC, and several other performance criteria using 10-fold cross-validation (CV). The proposed method achieved encouraging results (84.70% accuracy and an AUC of 0.9023) and also gave better results in terms of MCC and F1-score. The results were compared directly with the plain RF classifier and other state-of-the-art classification methods. This comparison indicates that the proposed model outperforms the RF classifier and all standard models used in the study on various performance indicators. These findings also indicate that the χ² and MI FS approaches efficiently identified a relevant and discriminating feature subset. The results further show that the proposed approach is comparable to other classification models in the relevant literature, and that it is a practical and sound method for predicting cancer biopsy outcomes.
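The feature-scoring and evaluation ideas named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: it assumes discrete features and a binary label, uses the textbook contingency-table χ² statistic and a plug-in MI estimate (library implementations may compute these scores differently), and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic of the feature/label contingency table."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    f_counts, l_counts = Counter(feature), Counter(labels)
    score = 0.0
    for f, fc in f_counts.items():
        for l, lc in l_counts.items():
            expected = fc * lc / n  # count expected under independence
            score += (joint.get((f, l), 0) - expected) ** 2 / expected
    return score


def mutual_information(feature, labels):
    """Plug-in mutual information (in nats) between two discrete variables."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    f_counts, l_counts = Counter(feature), Counter(labels)
    return sum(
        (c / n) * math.log(c * n / (f_counts[f] * l_counts[l]))
        for (f, l), c in joint.items()
    )


def rank_features(features, labels, scorer):
    """Return feature names sorted from highest to lowest score."""
    return sorted(features, key=lambda name: scorer(features[name], labels),
                  reverse=True)


def f1_score(tp, fp, fn):
    """F1-score from confusion-matrix counts."""
    return 2 * tp / (2 * tp + fp + fn)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy data: one feature that tracks the label, one that does not.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_features = {
    "informative": [0, 0, 0, 0, 1, 1, 1, 1],
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],
}

for scorer in (chi_square_score, mutual_information):
    print(scorer.__name__, rank_features(toy_features, toy_labels, scorer))
```

In a selection step of this kind, the top-ranked features under each score would be kept and passed to the classifier, and MCC and F1 would be computed from the confusion matrix of each cross-validation fold.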

Author information

Authors and Affiliations

Faculty of Engineering and Applied Science, OntarioTech University, Oshawa, Canada
Sheldon Williamson
Department of Computer Science and Engineering, St. Joseph’s Institute of Technology, OMR, Chennai, India
K. Vijayakumar
Department of Information Technology, Dr. Babasaheb Ambedkar Technological University, Lonere, Maharashtra, India
Vinod J. Kadam

Corresponding author

Correspondence to Sheldon Williamson.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Williamson, S., Vijayakumar, K. & Kadam, V.J. Predicting breast cancer biopsy outcomes from BI-RADS findings using random forests with chi-square and MI features. Multimed Tools Appl 81, 36869–36889 (2022). https://doi.org/10.1007/s11042-021-11114-5

Received: 15 October 2020
Revised: 02 March 2021
Accepted: 28 May 2021
Published: 04 June 2021
Issue Date: November 2022
DOI: https://doi.org/10.1007/s11042-021-11114-5
