Abstract
Breast cancer is one of the most common causes of death among women worldwide. A computer-aided diagnosis (CAD) system built on X-ray mammograms can detect breast cancer at an early stage, which can substantially reduce the death rate. This paper proposes a novel two-way threshold-based intelligent water drops (IWD) algorithm for feature selection, with the aim of designing an effective and efficient CAD system for early-stage breast cancer detection. The approach first extracts local binary patterns in the wavelet domain from mammograms and then applies the proposed two-way threshold-based IWD algorithm to select the most important subset of the extracted features. Two-way thresholding finds a lower bound and an upper bound on the number of features to be selected in the optimal subset. Using these threshold values, IWD can produce multiple optimal subsets of features rather than a single one. The best of these subsets is then used to train and deploy a support vector machine (SVM) that classifies new mammograms. The results show that the proposed model outperforms many existing CAD systems. We also compared the proposed feature selection technique with other metaheuristic feature selection techniques, namely ant colony optimization, particle swarm optimization, simulated annealing, genetic algorithm, gravitational search algorithm, inclined planes optimization and gray wolf optimization, and found that it outperforms them. The accuracy, precision, recall, specificity and F1-score of the proposed framework are 99%, 98.7%, 98.123%, 96.2% and 98.4%, respectively, on the MIAS dataset and 97.89%, 96.9%, 96.4%, 94.8% and 96.2%, respectively, on the DDSM dataset.
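The role of the two thresholds can be illustrated with a small sketch. It is not the IWD search itself: a simple relevance ranking stands in for the water-drop construction, and the names `candidate_subsets`, `best_subset`, `lower`, `upper` and `toy_score` are hypothetical. The only idea taken from the text is that every subset size between the lower and the upper bound yields one candidate subset, and that the best-scoring candidate is kept.

```python
from typing import Callable, List, Sequence


def candidate_subsets(relevance: Sequence[float], lower: int, upper: int) -> List[List[int]]:
    """Return one candidate feature subset for each size k with lower <= k <= upper.

    Assumption: features are ranked by a precomputed relevance score. The
    paper builds its subsets with the IWD search instead of this ranking.
    """
    if not 1 <= lower <= upper <= len(relevance):
        raise ValueError("need 1 <= lower <= upper <= number of features")
    ranked = sorted(range(len(relevance)), key=lambda i: relevance[i], reverse=True)
    return [sorted(ranked[:k]) for k in range(lower, upper + 1)]


def best_subset(candidates: List[List[int]], score: Callable[[List[int]], float]) -> List[int]:
    """Pick the candidate with the highest score (e.g. a validation accuracy)."""
    return max(candidates, key=score)


# Toy data: five features with made-up relevance values.
relevance = [0.9, 0.1, 0.6, 0.4, 0.8]
candidates = candidate_subsets(relevance, lower=2, upper=4)


def toy_score(subset: List[int]) -> float:
    # Hypothetical score: reward relevance, penalise subset size.
    return sum(relevance[i] for i in subset) - 0.5 * len(subset)


chosen = best_subset(candidates, toy_score)
print(candidates)
print(chosen)
```

In a real pipeline the scoring function would typically be the validation accuracy of the SVM trained on the candidate subset rather than the toy score used here.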
Acknowledgements
This work was financially supported by project grant 1-573645901 from the “Collaborative Research Scheme (TEQIP-III)” under the “All India Council for Technical Education,” New Delhi, India.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals by any authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A
Acronyms and abbreviations
CAD, Computer-aided diagnosis.
IWD, Intelligent water drops.
LBP, Local binary patterns.
LB, Lower bound.
UB, Upper bound.
SVM, Support vector machine.
ACO, Ant colony optimization.
PSO, Particle swarm optimization.
SA, Simulated annealing.
GA, Genetic algorithm.
GSA, Gravitational search algorithm.
IPO, Inclined planes optimization.
GWO, Gray wolf optimization.
MIAS, Mammographic image analysis society.
DDSM, Digital database for screening mammography.
BCRF, Breast Cancer Research Foundation.
GLCM, Gray-level co-occurrence matrix.
PCA, Principal component analysis.
FNN, Feedforward neural network.
k-NN, k-nearest neighbors.
CS-LBP, Center symmetric-local binary pattern.
FS, Forward selection.
BS, Backward selection.
LDA, Linear discriminant analysis.
HS, Harmony search.
PSOWNN, Particle swarm optimized wavelet neural network.
FFNN, Feed-forward neural network.
DFO, Dragonfly optimization.
RF, Random forest.
ANFIS, Adaptive neuro-fuzzy inference system.
DWT, Discrete wavelet transform.
HUD, Heuristic undesirability.
TP, True positive.
FP, False positive.
TN, True negative.
FN, False negative.
NB, Naïve Bayes.
DT, Decision tree.
ANN, Artificial neural network.
AUC, Area under curve.
ROC, Receiver operating characteristic.
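The five evaluation metrics reported in the abstract follow from the confusion-matrix counts TP, FP, TN and FN listed above. A minimal sketch using the standard definitions is given below; the function name and the example counts are hypothetical and are not taken from the paper's experiments.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute standard binary-classification metrics from confusion-matrix counts."""

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Made-up counts, for illustration only.
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because the F1-score is the harmonic mean of precision and recall, it always lies between those two values.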
List of variables
- Variables related to feature extraction:
\(g_{i}\): The gray-scale value of neighborhood pixel.
\(g_{c}\): The gray-scale value of the center pixel.
\(P\): Connectivity from the neighborhood pixels.
\(R\): Neighborhood radius for \(N\) equally spaced pixels.
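The way \(g_{i}\), \(g_{c}\), \(P\) and \(R\) enter a local binary pattern can be sketched for the basic case \(P = 8\), \(R = 1\), where the code is \(\sum_{i=0}^{P-1} s(g_{i} - g_{c})\,2^{i}\) with \(s(x) = 1\) for \(x \ge 0\) and \(0\) otherwise. The sketch below makes several assumptions: it uses the square 3×3 neighborhood without interpolation, the neighbor ordering is one arbitrary convention, and it works on a plain image patch rather than on wavelet sub-bands as in the paper.

```python
from typing import List

# (row, col) offsets of the 8 neighbours, starting at the right-hand pixel and
# going counter-clockwise. The ordering is an assumed convention.
NEIGHBOUR_OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def lbp_code(image: List[List[int]], row: int, col: int) -> int:
    """Basic LBP code (P = 8, R = 1) of the interior pixel at (row, col)."""
    g_c = image[row][col]
    code = 0
    for i, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        g_i = image[row + dr][col + dc]
        if g_i >= g_c:  # s(g_i - g_c) = 1
            code |= 1 << i  # contributes 2**i
    return code


def lbp_histogram(image: List[List[int]]) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code = lbp_code(patch, 1, 1)
hist = lbp_histogram(patch)
print(code)
```

The histogram of codes, rather than the individual codes, is what is usually passed on as the texture feature vector.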
- Variables used in IWD algorithm:
\(T^{{{\text{IWD}}}}\): The complete solution.
\(T^{IB}\): Iteration best solution.
\(N_{{{\text{Features}}}}\): Number of final features.
\(N_{{{\text{IWD}}}}\): Number of water drops.
\(\left( {a_{v} ,b_{v} ,c_{v} } \right)\): Variables to update the velocity of the water drops.
\(\left( {a_{s} ,b_{s} ,c_{s} } \right)\): Variables to update the soil of the local path.
\({\text{MaxIter}}\): Maximum number of iterations.
\({\text{initSoil}}\): Initial value of the local soil.
\({\text{initVel}}\): Initial velocity associated with each water drop.
\(V_{c}^{{\left( {{\text{IWD}}_{r} } \right)}}\): Feature list visited by each water drop \(r\).
\({\text{initVel}}^{{\left( {{\text{IWD}}_{r} } \right)}}\): Velocity of the water drop \(r\).
\({\text{soil}}^{{\left( {{\text{IWD}}_{r} } \right)}}\): Soil associated with the water drop \(r\).
\(\rho_{n}\): Local soil updating parameter.
\(\rho_{{{\text{IWD}}}}\): Global soil updating parameter.
\(\varepsilon_{s}\): Parameter to prevent zero division.
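To show where these parameters act, the sketch below follows one widely cited formulation of the IWD update rules (Shah-Hosseini). Whether the paper uses exactly these forms, including the squared soil and time terms, is an assumption; the function names and the numeric values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from typing import List


def velocity_update(vel: float, soil_ij: float, a_v: float, b_v: float, c_v: float) -> float:
    """New velocity of a drop after moving along a path with soil soil_ij."""
    return vel + a_v / (b_v + c_v * soil_ij ** 2)


def soil_removed(hud_j: float, vel_new: float, a_s: float, b_s: float, c_s: float) -> float:
    """Soil the drop removes from the path; hud_j is the heuristic undesirability."""
    travel_time = hud_j / vel_new
    return a_s / (b_s + c_s * travel_time ** 2)


def local_soil_update(soil_ij: float, delta_soil: float, rho_n: float) -> float:
    """Local soil update of the path, controlled by rho_n."""
    return (1.0 - rho_n) * soil_ij - rho_n * delta_soil


def selection_probabilities(soils: List[float], eps_s: float) -> List[float]:
    """Probability of choosing each candidate feature; less soil means higher probability."""
    lowest = min(soils)
    # Shift soils so they are non-negative before inverting them.
    shifted = [s if lowest >= 0 else s - lowest for s in soils]
    weights = [1.0 / (eps_s + s) for s in shifted]  # eps_s prevents division by zero
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative values only, not the settings used in the paper.
vel_new = velocity_update(vel=4.0, soil_ij=2.0, a_v=1.0, b_v=1.0, c_v=1.0)
delta = soil_removed(hud_j=2.1, vel_new=vel_new, a_s=1.0, b_s=1.0, c_s=1.0)
soil_new = local_soil_update(soil_ij=2.0, delta_soil=delta, rho_n=0.5)
probs = selection_probabilities([0.0, 1.0], eps_s=1.0)
print(vel_new, delta, soil_new, probs)
```

In the complete algorithm the drop also adds the removed soil to its own load, and the path of the iteration-best solution \(T^{IB}\) receives a further global update governed by \(\rho_{{{\text{IWD}}}}\); both steps are omitted here.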
- Variables related to thresholding:
\(N_{f}\): Total number of features.
\(T_{{{\text{dist}}}}\): Threshold distance.
List of control parameters of various metaheuristic optimization algorithms
Algorithm | Notation of the control parameters | Description |
---|---|---|
ACO | \(m\) | Initial number of ants |
ACO | \(\alpha\) | Pheromone exponent |
ACO | \(\beta\) | Heuristic exponent |
ACO | \(\rho\) | Pheromone evaporation factor |
ACO | \(\tau_{0}\) | Initial pheromone value |
PSO | \(N_{{\text{p}}}\) | Number of particles |
PSO | \({\text{MAX}}_{{{\text{iter}}}}\) | Maximum iterations |
PSO | \(w\) | Inertia weight |
PSO | \(c_{1}\) | Cognitive factor |
PSO | \(c_{2}\) | Social factor |
SA | \({\text{MAX}}_{{{\text{iter}}}}\) | Number of iterations |
SA | \({\text{MAX}}_{{{\text{const}}}}\) | Number of iterations at constant temperature |
SA | \(p_{0}\) | Initial acceptance probability |
SA | \(\delta_{0}\) | Minimal difference between solutions |
SA | \(T_{0}\) | Initial temperature |
SA | \(\alpha_{{\text{r}}}\) | Reduce factor |
GA | \(P_{{{\text{size}}}}\) | Population size |
GA | \({\text{MAX}}_{{{\text{iter}}}}\) | Number of iterations |
GA | \(F_{{\text{t}}}\) | Function tolerance |
GSA | \(S_{p} \left( {{\text{NP}}} \right)\) | Size of the population |
GSA | \({\text{MAX}}_{{{\text{iter}}}}\) | Maximum number of iterations |
GSA | \(G_{0}, \alpha\) | Gravitational constants |
GSA | \(T_{{\text{a}}}\) | Total number of agents |
GWO | \(G_{{\text{p}}}\) | Population size |
GWO | \({\text{MAX}}_{{{\text{iter}}}}\) | Number of iterations |
GWO | \(r_{1} ,r_{2}\) | Random vectors |
GWO | \(\vec{v}\) | Coefficient vector |
IPO | \(c_{1} ,c_{2}\) | Constants |
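As an example of how such control parameters enter an algorithm, the sketch below applies the textbook PSO update to one coordinate of one particle, using the inertia weight \(w\), the cognitive factor \(c_{1}\) and the social factor \(c_{2}\) from the table. It is the standard continuous update, not necessarily the variant used in the comparison experiments; the function name and the numbers are hypothetical, and the random draws `r1` and `r2` are passed in explicitly so that the result is reproducible.

```python
from typing import Tuple


def pso_step(
    x: float,
    v: float,
    personal_best: float,
    global_best: float,
    w: float,
    c1: float,
    c2: float,
    r1: float,
    r2: float,
) -> Tuple[float, float]:
    """One PSO update of a single coordinate; returns (new_position, new_velocity)."""
    v_new = (
        w * v  # inertia: keep part of the previous velocity
        + c1 * r1 * (personal_best - x)  # cognitive pull towards the particle's own best
        + c2 * r2 * (global_best - x)  # social pull towards the swarm's best
    )
    return x + v_new, v_new


# Illustrative values; in practice r1 and r2 are drawn uniformly from [0, 1].
x_new, v_new = pso_step(
    x=1.0, v=0.0, personal_best=2.0, global_best=3.0,
    w=0.5, c1=2.0, c2=2.0, r1=0.5, r2=0.25,
)
print(x_new, v_new)
```

For feature selection, binary PSO variants usually map the velocity through a sigmoid to decide whether each feature is kept, which this sketch does not cover.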
Cite this article
Kalita, D.J., Singh, V.P. & Kumar, V. Two-way threshold-based intelligent water drops feature selection algorithm for accurate detection of breast cancer. Soft Comput 26, 2277–2305 (2022). https://doi.org/10.1007/s00500-021-06498-3
DOI: https://doi.org/10.1007/s00500-021-06498-3