1 Introduction

Breast cancer is the most common cancer among women and one of the deadliest. According to the American Cancer Society (ACS), there will be 287,850 new cases and 43,250 deaths in the USA in 2022 [1]. Breast cancer is a major health concern: at least 1.67 million people are diagnosed annually, leading to an estimated 522,000 deaths [2]. Early detection and classification of breast cancer patients are therefore essential. In practice, a number of imaging modalities aid in the detection of breast cancer [3], including magnetic resonance imaging (MRI), positron emission tomography (PET), thermography (thermal imaging), ultrasound imaging (sonography), and mammography, as shown in Fig. 1.

Fig. 1 Various breast cancer imaging techniques

Mammography is an X-ray examination used to screen the breast and is the most common method for breast cancer screening. A mammography examination, also known as a mammogram, helps detect and diagnose breast cancer early [4]. Mammography has several advantages: (1) it decreases the chance of dying from breast cancer, as it can detect all types of breast cancer, including invasive ductal and lobular cancer; (2) after the X-ray examination, no radiation remains in the patient's body; (3) it improves the physician's ability to identify small cancerous cells; and (4) in the typical diagnostic range of this examination, X-rays usually have no side effects [5]. However, it has drawbacks: (1) a false-negative mammogram that appears normal despite the existence of breast cancer, (2) a false-positive mammogram that appears abnormal despite the absence of breast cancer, and (3) waiting periods when follow-up tests are required. Therefore, we need to resort to alternative supporting methods, such as machine learning.

Machine learning (ML) is an area of computer science and artificial intelligence (AI) that studies how machines can learn from data and algorithms, improving their performance over time much as humans do [6]. By following this principle, a model gradually becomes more effective and precise. In practice, most ML methods follow the same general workflow. First, training data are fed into the chosen algorithm; the final model is developed from these training data, which may be labeled or unlabeled. The algorithm is then fed new input data to check whether it functions properly, and the predictions are cross-checked against the known outcomes [7]. Various ML techniques have been used for breast cancer classification [8,9,10,11,12]. However, they suffer from several limitations, such as (1) low diagnostic accuracy, (2) long running time, and (3) a high degree of complexity.

ML algorithms have gained prominence in healthcare for detecting and classifying different diseases, especially breast cancer. However, the existence of irrelevant features prevents many ML algorithms from working well. Irrelevant features cause several issues: (1) they degrade the classifier's performance, (2) they increase the training time, and (3) they lead to overfitting [13]. Therefore, good data preparation and dimensionality reduction before applying ML algorithms are critical, because they can affect the results. When data are precise, consistent, and free of noise, disease detection and classification become faster and easier [14].

Many ML algorithms, such as K-nearest neighbors (KNN) and support vector machines (SVM), are unable to perform well in the presence of unrelated or irrelevant features [15]. Removing those features before applying the learning algorithm is a critical step in improving classification performance [15]. Feature selection is the process of identifying the most significant features among the original features. Its main aims are to avoid overfitting, obtain higher accuracy, achieve better learning performance, and decrease computational cost [16, 17]. Feature selection approaches fall into two main categories: filter and wrapper methods [18,19,20]. Filter methods find the significant features without requiring the data to be classified first, whereas wrapper approaches use a classification algorithm to home in on the most relevant features [18,19,20]. With regard to classification accuracy, wrapper approaches are preferred over filter methods [18,19,20]. Swarm intelligence algorithms have been used increasingly for feature selection; a sketch contrasting the two categories follows.
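To make the distinction concrete, the following minimal Python sketch contrasts a filter criterion (a mutual-information ranking, standing in for information gain) with a wrapper evaluation that scores candidate subsets with the classifier itself. The dataset, the subset size of ten, and the naïve Bayes classifier are illustrative assumptions, not choices made in this paper.

```python
# Hypothetical contrast of filter vs. wrapper feature selection; the
# dataset, subset size, and classifier are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)

# Filter: rank features by mutual information (an information-gain-style
# criterion) without consulting any classifier.
scores = mutual_info_classif(X, y, random_state=0)
top_k = np.argsort(scores)[::-1][:10]          # keep the 10 best-ranked features

# Wrapper: score a candidate feature subset with the classifier itself.
def subset_accuracy(feature_idx):
    return cross_val_score(GaussianNB(), X[:, feature_idx], y, cv=5).mean()

print("filter top-10 accuracy:", subset_accuracy(top_k))
print("all-features accuracy :", subset_accuracy(np.arange(X.shape[1])))
```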

Swarm intelligence (SI) is a method of simulating the collective intelligence of biological groups. It is a popular multi-agent framework inspired by natural swarm behaviors [21], mimicking the actions of a herd of animals fighting for its survival. SI has been applied to solve many complex real-world problems, such as controlling robots and unmanned vehicles, predicting social behaviors, and improving telecommunication and computer networks [21,22,23]. Recently, SI has attracted a lot of interest from the feature selection community due to its simplicity and global search ability. A variety of swarm intelligence algorithms are available today, including ant colony optimization (ACO), grey wolf optimization (GWO), particle swarm optimization (PSO), the bat algorithm (BA), and many others [23].

However, relying solely on feature selection will not provide the best classification accuracy, because a small number of instances in the datasets represent bad or noisy data [24]. Therefore, to boost the efficiency of classification techniques, such noisy data should be disregarded [25]. Outlier rejection is the process of eliminating noisy data whose behavior is very unusual compared to the rest. Consequently, before applying the classification technique, outliers must be rejected or removed from the datasets. Several ML algorithms treat outliers as noise; since their presence reduces the system's ability to predict future events, they must be eliminated. In general, there are two types of outlier methods: the classic outlier approach and the spatial outlier approach [24, 25].

This paper introduces a new strategy for detecting breast cancer patients called the patients detection strategy (PDS). PDS consists of two phases: the data preprocessing phase (DP2) and the patient detection phase (PDP). The main objective of DP2 is to transform the mammogram images into a set of features and eliminate the noisy data. It therefore comprises two processes: outlier rejection and feature selection. Firstly, the gray level co-occurrence matrix (GLCM) is used to extract features from the mammogram images. Then, noisy records that exhibit extreme behavior in comparison with the others are excluded from the extracted features. In the feature selection process, the most efficacious and useful features are selected and passed to the next phase (i.e., PDP). Undoubtedly, a good feature selection method speeds up classification by allowing fewer features to be considered, which not only increases model efficiency but also improves model performance. The main contribution of this paper is a new hybrid feature selection method (NHFSM). NHFSM combines filter and wrapper feature selection methods and includes two modules: the quick selection module (QSM) and the feature selection module (FSM). In the first module (QSM), information gain (IG) is used to select the active features from the extracted features; in other words, IG serves as an initial selection stage. In the second module (FSM), the selected features are used to initialize the population of the bat algorithm (BA), and a hybrid approach finally selects the most informative features. PDP then makes use of the filtered data to perform precise classification quickly. The experimental results show that NHFSM is superior to its rivals in terms of accuracy, precision, recall, and error rate.

1.1 Research questions

As such, this work attempts to answer the following research questions (Q):

Q1: Which machine learning models based on feature selection methods exist to detect breast cancer patients?

Q2: To what extent is a good feature selection method important for breast cancer detection?

1.2 Research organization

The rest of the paper is organized as follows. Section 2 briefly discusses the standard PSO and the standard BA. Section 3 reviews previous efforts on feature selection methods. Section 4 introduces the proposed patients detection strategy. Section 5 introduces the proposed feature selection method. Section 6 illustrates the experimental results. Conclusions and future work are discussed in Sect. 7.

2 Background and basic concepts

In this section, an overview of particle swarm optimization (PSO) and the bat algorithm (BA) is presented.

2.1 Particle swarm optimization (PSO)

Particle swarm optimization (PSO) is a stochastic, population-based optimization technique influenced by the social behavior of flocks of birds and schools of fish. It was introduced by Kennedy and Eberhart [26]. PSO uses a group of individuals to explore promising areas of the search space, iteratively enhancing the candidate solutions. PSO starts with a randomly generated group of particles (candidate solutions) called the swarm S. Each particle in the swarm positions itself relative to the other particles based on all of its experiences [27]. Hence, each particle is a potential solution to the optimization problem and flies at a certain velocity in the multi-dimensional search space. Each particle tracks both its personal best position (Pbest) and the global best position (XG), i.e., the best position found among all particles. Thus, the flight state can be described by a velocity vector and a position vector. PSO has several advantages compared to other optimization algorithms: (1) it is less sensitive to the nature of the objective function than other heuristic methods, (2) it depends less on the set of initial points than other evolutionary methods, resulting in more robust convergence, and (3) it is simple to implement with few parameters to adjust [26, 27]. Figure 2 shows the conventional PSO flowchart.

Fig. 2 Conventional PSO flowchart

Let us assume there are N particles in the PSO population, each representing a possible solution associated with two vectors: the position vector Xi and the velocity vector VXi. The search space has M dimensions, so the position of the ith particle is Xi = (Xi1, Xi2, …, XiM) and its velocity is VXi = (VXi1, VXi2, …, VXiM). The best previous position of the ith particle, i.e., the one with the best fitness value, is Pbesti = Ppi = (Ppi1, Ppi2, …, PpiM). The global best position among all particles in the swarm is XG, also known as XGlobal; XG = (XG1, XG2, …, XGM). Each particle's velocity and position are dynamically updated based on its momentum and on both Pbest and XGlobal, so that it flies toward the optimal solution. The velocity and position of each particle are updated using the following equations [26, 27]:

$$VX_{i} \left( {t + 1} \right) = w*VX_{i} \left( t \right) + \left( {c_{1} r_{1} \left( {P_{{{\text{pi}}}} \left( t \right) - X_{i} \left( t \right)} \right)} \right) + \left( {c_{2} r_{2} \left( {X_{{\text{G}}} \left( t \right) - X_{i} \left( t \right)} \right)} \right)$$
(1)
$$X_{i} \left( {t + 1} \right) = X_{i} \left( t \right) + VX_{i} \left( {t + 1} \right)$$
(2)

where VXi(t + 1) denotes the velocity of the ith particle at iteration (t + 1), VXi(t) is its velocity at iteration t, and \(P_{{{\text{pi}}}} \left( t \right)\) is its current personal best position Pbesti. XG(t) denotes the global best position in the swarm S at the current iteration (XGlobal), and Xi(t) is the current position of the ith particle. w is the inertia weight, which controls the impact of the velocity history on the current velocity; w ∈ [0.9, 1.2]. c1 and c2 are the cognitive and social acceleration constants; c1, c2 ∈ [2, 4]. r1 and r2 are two uniformly distributed random numbers in the interval [0, 1]. \(X_{i} \left( {t + 1} \right)\) is the position of the ith particle at the next iteration. The conventional PSO procedure is illustrated in Algorithm 1, and a minimal sketch follows.

Algorithm 1 Conventional PSO
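As a concrete illustration of Eqs. (1) and (2), the following minimal Python sketch runs continuous PSO on the sphere function. The swarm size, iteration budget, search bounds, and minimization objective are illustrative assumptions rather than settings used in this paper.

```python
# Minimal continuous PSO implementing Eqs. (1)-(2); all parameter values
# and the sphere objective are illustrative assumptions (minimization).
import numpy as np

def pso(fitness, dim=5, n_particles=20, iters=100, w=0.9, c1=2.0, c2=2.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, (n_particles, dim))   # positions X_i
    V = np.zeros((n_particles, dim))             # velocities VX_i
    P = X.copy()                                 # personal bests P_pi
    p_fit = np.array([fitness(x) for x in X])
    g = P[np.argmin(p_fit)].copy()               # global best X_G
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))      # r1, r2 ~ U[0, 1]
        r2 = rng.random((n_particles, dim))
        V = w * V + c1 * r1 * (P - X) + c2 * r2 * (g - X)   # Eq. (1)
        X = X + V                                           # Eq. (2)
        fit = np.array([fitness(x) for x in X])
        better = fit < p_fit                     # update personal bests
        P[better], p_fit[better] = X[better], fit[better]
        g = P[np.argmin(p_fit)].copy()           # update global best
    return g, p_fit.min()

best, val = pso(lambda x: np.sum(x ** 2))        # sphere function
print(best, val)
```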

2.2 Bat algorithm (BA)

The bat algorithm (BA) is a bio-inspired algorithm created by Xin-She Yang in 2010, and it has since been shown to be quite effective. It mimics parts of the micro-bat's echolocation behavior in a simple way [28, 29]. The basic structure of BA is built on three major properties of the micro-bat: (1) its echolocation behavior, (2) the signal frequency, as the micro-bat transmits a signal with frequency f and variable wavelength λ, and (3) the loudness A0 of the emitted sound, which is used to locate prey [28,29,30]. Xin-She Yang's approximate rules for this method are as follows:

(1) Each bat uses echolocation to sense distance and can, in some way, determine the distance to its prey.

(2) To find prey, bats fly randomly with velocity vi at position xi, emitting sounds with a fixed frequency fmin, varying wavelength, and loudness A0.

(3) Bats can automatically adjust both the wavelength (or frequency) of their emitted pulses and the rate of pulse emission in response to the proximity of their target.

(4) Although the loudness can vary in many ways, it is assumed to decrease from a maximum A0 to a constant low value Amin.

For an m-dimensional search space, each bat is defined by its position, velocity, and frequency. For the ith bat, the velocity, position, and frequency for the next iteration are updated using the following equations [28]:

$$VX_{i} \left( {t + 1} \right) = VX_{i} \left( t \right) + \left( {X_{i} \left( t \right) - X_{{\text{G}}} \left( t \right)} \right)f_{i}$$
(3)
$$X_{i} \left( {t + 1} \right) = X_{i} \left( t \right) + VX_{i} \left( {t + 1} \right)$$
(4)

where t represents the current iteration, \(VX_{i} \left( {t + 1} \right)\) is the velocity of the ith bat at the next iteration, \(VX_{i} \left( t \right)\) is its velocity at the current iteration, and \(X_{i} \left( t \right)\) is its position at the current iteration. \(X_{{\text{G}}} \left( t \right)\) represents the current global best solution in the bat population, and \(X_{i} \left( {t + 1} \right)\) is the position of the ith bat at the next iteration (t + 1). \(f_{i}\) is the frequency of the ith bat, updated at each iteration as given in (5) [28]:

$$f_{i} = f_{\min } + \left( {f_{\max } - f_{\min } } \right)\beta$$
(5)

where β is a uniformly distributed random number, β ∈ [0, 1]. For the algorithm's local search part, once a solution is chosen from among the current best solutions, a new solution is generated locally for each bat using a random walk, as given by (6) [30]:

$$X_{i} \left( {t + 1} \right) = X_{i} \left( t \right) + \varepsilon \overline{A}_{t}$$
(6)

where ε is a random number, ε ∈ [−1, 1], and \(\overline{A}_{t}\) is the average loudness of all bats at the current iteration; the larger it is, the more the bats explore rather than exploit. The frequency value in BA determines the extent and range of the bats' movement. When bats get close to their prey, the loudness A decreases and the pulse emission rate r increases. The updated values of the loudness A and the pulse emission rate r are calculated using (7) and (8) [28]:

$$A_{i} \left( {t + 1} \right) = \alpha A_{i} \left( t \right)$$
(7)
$$r_{i} \left( {t + 1} \right) = r_{i} \left( 0 \right)\left( {1 - e^{ - \gamma t} } \right)$$
(8)

where \(A_{i} \left( {t + 1} \right)\) is the loudness of the ith bat at the next iteration and \(A_{i} \left( t \right)\) is its loudness at the current iteration. Similarly, \(r_{i} \left( {t + 1} \right)\) is the pulse emission rate at the next iteration and \(r_{i} \left( 0 \right)\) is the initial pulse emission rate. α and γ are constants. Figure 3 shows the flowchart of the conventional BA; the sequential steps are given in Algorithm 2, and a compact sketch follows.

Algorithm 2 Conventional BA

Fig. 3 Conventional BA flowchart
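The following compact Python sketch strings Eqs. (3) to (8) together for a continuous minimization problem. The bounds, population size, and the initial loudness and pulse-rate values are illustrative assumptions, and the acceptance rule follows the standard BA formulation.

```python
# Compact continuous bat algorithm following Eqs. (3)-(8); bounds,
# population size, and A0/r0 values are illustrative assumptions.
import numpy as np

def bat_algorithm(fitness, dim=5, n_bats=20, iters=100,
                  f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, (n_bats, dim))        # positions x_i
    V = np.zeros((n_bats, dim))                  # velocities v_i
    A = np.ones(n_bats)                          # loudness A_i, starting at A0 = 1
    r0 = 0.5 * np.ones(n_bats)                   # initial pulse rates r_i(0)
    r = np.zeros(n_bats)                         # r_i grows from 0 via Eq. (8)
    fit = np.array([fitness(x) for x in X])
    g = X[np.argmin(fit)].copy()                 # global best X_G
    for t in range(1, iters + 1):
        for i in range(n_bats):
            f_i = f_min + (f_max - f_min) * rng.random()         # Eq. (5)
            V[i] = V[i] + (X[i] - g) * f_i                       # Eq. (3)
            x_new = X[i] + V[i]                                  # Eq. (4)
            if rng.random() > r[i]:              # local random walk, Eq. (6)
                x_new = X[i] + rng.uniform(-1, 1, dim) * A.mean()
            f_new = fitness(x_new)
            if rng.random() < A[i] and f_new < fit[i]:           # accept solution
                X[i], fit[i] = x_new, f_new
                A[i] = alpha * A[i]                              # Eq. (7)
                r[i] = r0[i] * (1 - np.exp(-gamma * t))          # Eq. (8)
        g = X[np.argmin(fit)].copy()
    return g, fit.min()

print(bat_algorithm(lambda x: np.sum(x ** 2)))   # sphere function
```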

3 Literature review

In this section, previous efforts related to feature selection in breast cancer classification are reviewed. Several researchers have focused on finding feature selection methods that can pick the most informative features from the extracted features in order to achieve the most accurate breast cancer classification possible. In [31], the authors investigated the impact of feature selection techniques integrated with a classification algorithm for diagnosing breast cancer. They used particle swarm optimization (PSO) to select the most effective and informative features, and concluded that selecting the most important features improved the classifier's performance. In [32], Raman spectral feature selection using ant colony optimization (ACO) for breast cancer diagnosis was introduced. ACO was used to find the features most related to cancerous changes and thereby promote classification accuracy. The experimental results showed that choosing the best features with ACO improved the classification accuracy of the normal, benign, and cancerous groups by 14%, reaching 87.7%.

In [33], a new feature selection method was introduced to improve the diagnostic precision of computer-aided diagnosis systems. The proposed method, called the opposition-based enhanced grey wolf optimization (OEGWO), was utilized to solve the feature selection problem in breast density classification. OEGWO involves three steps: opposition-based population initialization, modification of the parameter controlling exploration and exploitation, and modification of the position-updating step. Firstly, forty-five texture features were extracted from the mammogram images; OEGWO was then used to select the most useful ones. Based on the experimental results, OEGWO is superior to other feature selection algorithms for identifying breast density.

In [34], a new feature selection method for breast mass classification was proposed, called the opposition-based Harris hawks optimization (OHHO) algorithm. Firstly, forty-five texture features and nine shape features were extracted from the mammogram images. Then, to deal with the feature selection problem, OHHO was converted into a binary algorithm using a sigmoid transfer function and applied to select the most important features. The results in [34] showed that OHHO outperforms the other competitors.

It has become clear that selecting the most effective and informative features is an important step in enhancing classification performance. As illustrated in [35], the authors concluded that their system's excellent performance was due to the selection of more appropriate features. Also, in [36], a feature selection method called minimal redundancy maximal relevance feature selection (MRMRFS) was used to increase classification accuracy by selecting the most appropriate features from the breast cancer datasets. The experimental results in [36] show that selecting better features with MRMRFS enhanced the SVM's performance, resulting in an accuracy of 99.71%.

A lot of research has focused on combining feature selection techniques to obtain further performance improvements. In [37], a new hybrid method using the grey wolf optimizer and rough sets (GWORS) was proposed for feature selection. GWORS is based on two main processes: feature extraction and feature selection. Firstly, texture, intensity, and shape-based features were extracted from mass-segmented mammogram images; GWORS was then applied to select the most effective and informative of these features. The results in [37] indicated that GWORS is superior to other methods in terms of accuracy, F-measure, and the receiver operating characteristic curve.

In [38], a novel intelligent breast cancer diagnosis approach was proposed, introducing a new feature selection method called information gain with a simulated annealing genetic algorithm wrapper (IGSAGAW). Firstly, the extracted features were ranked using the information gain (IG) algorithm. Then, according to the importance of the features, SAGAW was used to select the top m optimal features to feed the classifier. IGSAGAW not only reduces the complexity of the classification process but also maximizes classification accuracy and minimizes misclassification cost.

In [39], a new feature selection method for classifying patients into malignant or benign cases was presented. Firstly, the extracted features were ranked using a new feature importance index based on principal component analysis (PCA) and the Bhattacharyya distance (BD); the combination of the two highlights features with higher variance and discriminatory power. The method then iteratively classified patient records using three classification techniques: K-nearest neighbor (KNN), linear discriminant analysis (LDA), and a probabilistic neural network (PNN). At each iteration the index eliminated the least significant feature, and classification was carried out on the remaining features until only one feature was left. The results in [39] demonstrated the effectiveness of the proposed method.

As illustrated in [40], a novel hybrid feature selection algorithm (HFSA) based on ReliefF and an entropy-based genetic algorithm (EGA) was proposed for breast cancer diagnosis. HFSA combines the advantages of filter and wrapper methods to deal with high-dimensional, uncertain datasets. Firstly, the weight of each feature was calculated and evaluated using the ReliefF ranking method as a filter; then, to remove irrelevant features more effectively, EGA was used as a wrapper. The experimental results in [40] proved that the proposed method not only generates a small subset of informative and significant features but also provides significant classification accuracy on large datasets. Table 1 gives a brief comparison of the current feature selection methods.

Table 1 A brief comparison of the current feature selection methods

4 The proposed patients detection strategy (PDS)

Over the years, breast cancer has risen to prominence as one of the most prevalent diseases affecting females worldwide. Therefore, automatic detection and classification of breast cancer patients is essential, particularly when a quick and accurate decision is required [41]. Automatic diagnosis of breast cancer patients can reduce the mortality rate by decreasing the time radiologists spend on examination [4, 42, 43]. In this paper, a new strategy for detecting breast cancer patients, called the patients detection strategy (PDS), is introduced. As shown in Fig. 4, the proposed PDS is split into two phases: the data preprocessing phase (DP2) and the patient detection phase (PDP).

Fig. 4 Patients detection strategy (PDS)

4.1 Data preprocessing phase (DP2)

The main aim of DP2 is to clean the collected patient historical data. To achieve this goal, data mining techniques perform two main processes, outlier rejection and feature selection, which result in a meaningful data pattern. Firstly, the patient's mammogram images are converted into a set of features using the GLCM; the outlier items are then eliminated during the outlier rejection stage. Outliers (i.e., bad data) are filtered out, and only the most useful features are carried forward to the next phase. In the outlier rejection stage, noisy data with unusual behavior are rejected or removed. Generally, there are two main types of outlier rejection methods: the classic outlier approach and the spatial outlier approach [24, 25]. The classic outlier approach examines outliers based on a transaction dataset, which is made up of a collection of items; it can be divided into five classes: statistical-based, distance-based, deviation-based, density-based, and depth-based approaches [44]. The spatial outlier approach, on the other hand, analyzes outliers based on a spatial dataset, i.e., a collection of spatially referenced objects; it is divided into two classes: space-based and graph-based approaches. A sketch of this stage follows.
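The sketch below illustrates the two DP2 steps on synthetic patches: GLCM texture features are extracted with scikit-image (function spelling per version 0.19 and later), and a simple z-score rule stands in for the outlier-rejection stage, since the text does not pin the stage to one specific rejection rule here. The patch size, distance/angle settings, and threshold are illustrative assumptions.

```python
# Sketch of DP2: GLCM texture features per ROI, then a z-score rule as a
# stand-in outlier-rejection step; all settings are illustrative assumptions.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(roi):
    """roi: 2-D uint8 patch -> vector of GLCM texture features."""
    glcm = graycomatrix(roi, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    feats = [graycoprops(glcm, p)[0, 0]
             for p in ("homogeneity", "energy", "contrast", "correlation")]
    p = glcm[:, :, 0, 0]
    feats.append(-np.sum(p[p > 0] * np.log2(p[p > 0])))  # entropy, by hand
    return np.array(feats)

def reject_outliers(F, z_thresh=3.0):
    """Drop rows in which any feature lies beyond z_thresh std deviations."""
    z = np.abs((F - F.mean(axis=0)) / F.std(axis=0))
    return F[(z < z_thresh).all(axis=1)]

rois = [np.random.randint(0, 256, (64, 64), dtype=np.uint8) for _ in range(50)]
F = np.vstack([glcm_features(r) for r in rois])
print(F.shape, "->", reject_outliers(F).shape)
```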

On the other hand, removing irrelevant or insignificant features during the feature selection process is crucial to enhance the classifier's efficiency. By allowing fewer features to be considered, a good feature selection methodology undoubtedly increases the model's effectiveness while also accelerating the classification process. Generally, feature selection approaches can be classified into two classes: filter and wrapper approaches. In conclusion, performing outlier rejection and feature selection is critical to provide efficient and relevant features for the next phase.

4.2 Patients detection phase (PDP)

For women, breast cancer is a leading cause of death, and a precise diagnosis is one of the most important first steps in treating it. During PDP, the cleaned data are used to provide a rapid and accurate classification result. To perform the classification process well, a suitable classifier must be selected. Classification is a data mining (ML) technique used to predict group membership for data instances [15]. ML is a useful tool for predicting medical conditions and assisting radiologists in making informed medical decisions [15]. A wrong diagnosis of breast cancer cases allows the disease to spread day by day. Conversely, using only the most important and effective features helps the classifier make accurate and fast decisions, thereby enhancing diagnostic accuracy. The next section introduces the proposed feature selection method for selecting the most effective and informative features.

5 The proposed feature selection method

One of the main causes of overfitting is the presence of irrelevant features in the input dataset, particularly in the medical diagnosis of breast cancer patients [15, 45]. The main target of the proposed method is to identify the most informative features for classifying breast cancer patients. The diagnostic model's accuracy can be improved by excluding the least relevant features from the input. Feature selection should therefore be performed before learning the diagnostic model, making the model faster, more cost-effective, and better performing [46,47,48]. This section introduces a new hybrid feature selection method (NHFSM) that accurately selects the most informative features for classifying breast cancer patients. It is made up of two modules: (1) the quick selection module (QSM), which uses information gain (IG) as a filter method to preselect the most informative features quickly (sketched below), and (2) the feature selection module (FSM), which uses a hybrid bat algorithm and particle swarm optimization (HBAPSO) as a wrapper method.
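A minimal sketch of the QSM idea is given below: the information gain of each feature with respect to the class label is computed on a discretized copy of the feature, and the top-ranked fraction is retained. The bin count and the retained fraction are illustrative assumptions, as the text does not specify them at this point.

```python
# Sketch of QSM preselection: information gain per (discretized) feature;
# the bin count and keep ratio are illustrative assumptions.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels, bins=10):
    """IG(class; feature) = H(class) - H(class | binned feature)."""
    binned = np.digitize(feature, np.histogram_bin_edges(feature, bins))
    h_cond = sum((binned == b).mean() * entropy(labels[binned == b])
                 for b in np.unique(binned))
    return entropy(labels) - h_cond

def qsm(X, y, keep_ratio=0.6):
    ig = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
    k = max(1, int(keep_ratio * X.shape[1]))
    return np.argsort(ig)[::-1][:k]              # indices of retained features

X = np.random.rand(100, 12)
y = np.random.randint(0, 2, 100)
print("features kept by QSM:", qsm(X, y))
```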

The main aim of FSM is to pick the most informative features out of the preselected features to promote classification accuracy. To this end, FSM uses a combination of the bat algorithm (BA) and particle swarm optimization (PSO), called the hybrid bat algorithm and particle swarm optimization (HBAPSO). PSO is used in many applications for solving optimization problems, especially feature selection; however, it suffers from premature convergence and often requires considerable computational time [49, 50]. Unlike PSO, BA is good at controlling the exploration and exploitation of the search space, and it takes less time to compute [51, 52]. In this paper, PSO is improved using BA to select the most relevant subset of features and thereby improve the performance of the breast cancer classification model.

Generally, the use of hybrid methods to solve optimization problems is a recent and successful trend. This paper's primary objective is to develop a method that combines the advantages of multiple algorithms to achieve superior performance. HBAPSO integrates PSO's fast convergence with BA's low computational cost to find the best solution. Hence, PSO's velocity update in Eq. (1) is modified using the bat frequency of Eq. (5). Since BA was designed for continuous search spaces, it is converted into binary BA (BBA) to address discrete optimization problems such as feature selection. BA is transformed into BBA using a sigmoid transfer function, which maps the velocity from the continuous space into the discrete search space [52]. Consequently, the bat's velocity is calculated as in conventional BA, but the bats' current positions, best positions, and the global best position take only binary values (0 or 1). Figure 5 depicts the proposed method's sequential steps.

Fig. 5 The sequential steps of NHFSM

At the beginning, after the features have been extracted from the mammogram images, they are passed to QSM, which quickly eliminates the non-informative features using IG as a preselection stage. The initial population of BA is then generated randomly, and the same mechanism used in PSO is applied to it. Each bat is first evaluated using a naïve Bayes (NB) classifier; NB is chosen as the standard evaluation classifier because it has no parameters to adjust. The aim of evaluating each bat is to elect the most significant features characterizing breast cancer based on the resulting accuracy, called the fitness degree. The fitness degree ranks the bats by classification accuracy to find the best solution. The fitness degree of each bat, \({\text{Fit}}\left( {X_{i} } \right)\), is calculated by (9) [15]:

$${\text{Fit}}\left( {X_{i} } \right) = {\text{accuracy}}\left( {X_{i} } \right)$$
(9)

where \({\text{accuracy}}\left( {X_{i} } \right)\) denotes the classification accuracy obtained with the subset of features encoded by the ith bat. Based on the fitness value of every bat in P, the fitness degrees are sorted, the best positions are saved, and each bat's own best solution Xpersonal and the best solution among all bats XGlobal are updated using (10) and (11):

$$X_{{{\text{Personal}}}} \left( {X_{i} } \right) = X_{{{\text{pi}}}} = \left\{ {\begin{array}{*{20}c} {X_{i } } & {{\text{if\,}}( {\text{Fit}}\left( {X_{i} } \right) > {\text{Fit}}(X_{{{\text{pi}}}} ))} \\ {X_{{{\text{pi}}}} } & {{\text{otherwise}}} \\ \end{array} } \right.$$
(10)
$$X_{{{\text{Global}}}} = X_{{\text{G}}} = \left\{ {\begin{array}{*{20}c} {X_{{{\text{pi }} }} } & {{\text{if\,}}( {\text{Fit}}\left( {X_{{{\text{pi}}}} } \right) > {\text{Fit}}(X_{{\text{pi + 1}}} ))} \\ {X_{{\text{pi + 1}}} } & {{\text{otherwise}}} \\ \end{array} } \right.$$
(11)

where XPersonal(Xi) denotes the best solution of the ith bat and Xi its current position. Xpi represents the best position of the ith bat; Fit(Xi) is the fitness value of the ith bat at its current position, and Fit(Xpi) at its best position. XGlobal is the best particle in the whole population P; Fit(Xpi+1) is the fitness value of the (i + 1)th bat at its best position, and Xpi+1 is the personal best position of the (i + 1)th bat. In NHFSM, the best personal position at the current iteration, \(X_{{{\text{pi}}}}\), is assigned as \(P_{{{\text{pi}}}} \left( t \right)\), and the best position among all solutions is assigned as \(X_{{\text{G}}} \left( t \right)\). Furthermore, PSO's velocity is modified using the bat frequency \(f_{i}\), as presented in (12):

$$VX_{i} \left( {t + 1} \right) = w*VX_{i} \left( t \right) + \left( {c_{1} r_{1} \left( {P_{{{\text{pi}}}} \left( t \right) - X_{i} \left( t \right)} \right)} \right) + \left( {c_{2} r_{2} \left( {X_{{\text{G}}} \left( t \right) - X_{i} \left( t \right)} \right)} \right)f_{i}$$
(12)

Hybridizing BA and PSO performs well because PSO can quickly identify promising areas of the search space, while BA can quickly obtain the best solution within those areas. After the velocity of each bat in the population has been calculated using the enhanced PSO velocity update, that velocity defines the probability distribution from which the bat's new position is sampled. Hence, to obtain a new bat position in binary form, the sigmoid function is applied to the velocity of the search agent, as given by (13):

$$X_{i}^{j} \left( {t + 1} \right) = \left\{ {\begin{array}{*{20}c} 0 & {{\text{if}}\;{\text{rand}}\left( {0,1} \right) \ge {\text{sig}}\left( {VX_{i}^{j} } \right)} \\ 1 & { {\text{otherwise}}} \\ \end{array} } \right.$$
(13)

where Xij(t + 1) indicates the value of the jth bit of the ith bat at the next iteration (t + 1), with j = 1, 2, 3, …, m, and rand(0,1) is a random value in [0, 1]. Moreover, sig(VXij) is the sigmoid transfer function giving the probability that the jth bit takes the value 0 or 1; it is determined by (14):

$${\text{sig}}\left( {VX_{i}^{j} } \right) = \frac{1}{{1 + e^{{ - VX_{i}^{j} }} }}$$
(14)

Based on the new position \(X_{i}^{j} \left( {t + 1} \right)\) of each bat in P, each bat is evaluated using the fitness function defined by Eq. (9). After all the bats' velocities and positions have been updated, a new solution is generated around the best one using Eq. (6) whenever a generated random number is greater than the pulse emission rate. Moreover, if (rand < Ai and Fit(Xi) > Fit(XG)), the new solution is accepted and the loudness and emission rate are updated using Eq. (7) and Eq. (8). These computations are repeated until the maximum number of generations is reached. Finally, the algorithm outputs the population-wide best bat solution XGlobal and stops. All features denoted by 1 are the most significant features for breast cancer classification. After applying the NHFSM algorithm to the breast dataset, seven features are chosen: homogeneity (HOM), energy (ENG), entropy (ENT), contrast (CON), correlation (COR), cluster shade (CS), and cluster prominence (CP).

For the implementation of NHFSM, suppose there is an m-dimensional feature space, FS = {f1, f2, …, fm}. The input training data of g objects (patients) can be expressed as W = {Y1, Y2, …, Yg}, and the testing data of h objects as K = {R1, R2, …, Rh}. Each object Yi ∈ W and Rj ∈ K is expressed as an ordered set of m features: Yi(f1, f2, …, fm) = [f1i, f2i, …, fmi] and Rj(f1, f2, …, fm) = [f1j, f2j, …, fmj]. Hence, each object Yi and Rj is represented in an m-dimensional feature space. For the breast cancer classification problem, reducing the m dimensions by removing unrelated features from the breast dataset is vital to prevent overfitting and promote the performance of the classification model. Algorithm 3 illustrates the sequential steps of NHFSM, and a condensed sketch follows.

Algorithm 3 The sequential steps of NHFSM
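The following condensed Python sketch walks through one reading of the FSM loop, combining Eqs. (5) and (9) to (14): binary bat positions are scored by naïve Bayes accuracy, the velocity update of Eq. (12) is applied with the frequency factor scaling the social term as printed, and positions are re-binarized through the sigmoid rule of Eqs. (13) and (14). The loudness and pulse-rate bookkeeping is omitted for brevity, and the population size, iteration budget, and random data are illustrative assumptions.

```python
# Condensed sketch of the FSM/HBAPSO loop (Eqs. (5), (9)-(14)); loudness/
# pulse-rate bookkeeping omitted, parameters and data are illustrative.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def fitness(mask, X, y):                         # Eq. (9): NB accuracy
    if not mask.any():
        return 0.0
    return cross_val_score(GaussianNB(), X[:, mask.astype(bool)], y, cv=3).mean()

def hbapso(X, y, n_bats=10, iters=30, w=0.9, c1=2.0, c2=2.0,
           f_min=0.0, f_max=2.0, seed=0):
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    P = rng.integers(0, 2, (n_bats, m))          # binary bat positions
    V = np.zeros((n_bats, m))
    pbest = P.copy()                             # personal bests, Eq. (10)
    pfit = np.array([fitness(b, X, y) for b in P])
    g = pbest[np.argmax(pfit)].copy()            # global best, Eq. (11)
    for _ in range(iters):
        for i in range(n_bats):
            f_i = f_min + (f_max - f_min) * rng.random()           # Eq. (5)
            r1, r2 = rng.random(m), rng.random(m)
            V[i] = (w * V[i] + c1 * r1 * (pbest[i] - P[i])
                    + c2 * r2 * (g - P[i]) * f_i)                  # Eq. (12)
            sig = 1.0 / (1.0 + np.exp(-V[i]))                      # Eq. (14)
            P[i] = (rng.random(m) < sig).astype(int)               # Eq. (13)
            new_fit = fitness(P[i], X, y)
            if new_fit > pfit[i]:                                  # Eq. (10)
                pbest[i], pfit[i] = P[i].copy(), new_fit
        g = pbest[np.argmax(pfit)].copy()                          # Eq. (11)
    return g

X = np.random.rand(80, 12)
y = np.random.randint(0, 2, 80)
print("selected feature mask:", hbapso(X, y))
```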

6 Experimental results

In this section, the main contribution of the proposed patients detection strategy (PDS), namely the new hybrid feature selection method (NHFSM), is evaluated. NHFSM is composed of two modules: the quick selection module (QSM) and the feature selection module (FSM). In QSM, information gain (IG) is used to quickly eliminate the least significant features. In FSM, the proposed hybrid bat algorithm and particle swarm optimization (HBAPSO) selects the most informative features extracted from the mammogram images. Our implementation is based on the MIAS dataset [53, 54], a publicly available dataset (also hosted on Kaggle) used to produce the results presented in this paper. It contains 322 cases: 209 normal, 61 benign, and 52 malignant. The data are split into two parts, training and testing; the training set is used to fit the model, and the testing set is used to measure how well the proposed model works. Accordingly, 226 patients (70%) are used for training and 96 (30%) for testing, as shown in Table 2, and a sketch of this split follows the tables. Table 3 lists the applied parameters and the values used.

Table 2 MIAS database image selection for training and testing
Table 3 The applied parameters with the corresponding values
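The 70/30 split described above can be reproduced with a stratified split, as in the short sketch below; the feature matrix here is a placeholder for the seven GLCM features, and the use of scikit-learn's train_test_split is an illustrative assumption.

```python
# Sketch of the 226/96 MIAS split; features are placeholders for the
# seven GLCM descriptors, and the stratified split is an assumption.
import numpy as np
from sklearn.model_selection import train_test_split

y = np.array([0] * 209 + [1] * 61 + [2] * 52)    # 322 MIAS cases
X = np.random.rand(len(y), 7)                    # placeholder feature vectors

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=226, stratify=y, random_state=0)
print(len(y_tr), "training /", len(y_te), "testing")   # 226 / 96
```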

6.1 Evaluation metrics

To ensure the reliability of the proposed method, accuracy, error, precision, sensitivity, F-measure, micro-average, and macro-average are taken as the evaluation metrics. Table 4 shows the classification outcomes of the system [45, 55], and the evaluation metrics themselves are defined in Table 5 [45, 55]; a sketch computing them follows.

Table 4 The classification outcomes of the system
Table 5 Performance evaluation metrics
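For reference, the metrics of Table 5 can be computed as in the sketch below, using scikit-learn's standard definitions of the micro and macro averages on hypothetical three-class predictions; the random labels are placeholders, not the paper's results.

```python
# Sketch of the Table 5 metrics on hypothetical predictions; random
# labels are placeholders for actual classifier output.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 3, 96)                  # 3 classes, 96 test cases
y_pred = rng.integers(0, 3, 96)

acc = accuracy_score(y_true, y_pred)
print("accuracy        :", acc)
print("error           :", 1 - acc)
print("macro precision :", precision_score(y_true, y_pred, average="macro"))
print("macro recall    :", recall_score(y_true, y_pred, average="macro"))
print("micro precision :", precision_score(y_true, y_pred, average="micro"))
print("micro recall    :", recall_score(y_true, y_pred, average="micro"))
print("F-measure       :", f1_score(y_true, y_pred, average="macro"))
```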

6.2 Testing the proposed feature selection method

In this section, the proposed new hybrid feature selection method (NHFSM) is evaluated. To demonstrate its effectiveness, NHFSM is compared against several feature selection methods using NB as the standard classifier. The most recent feature selection methods used for evaluation are listed in Table 1: ACO [32], OEGWO [33], OHHO [34], MRMRFS [36], GWORS [37], and IGSAGAW [38]. The results are shown in Figs. 6, 7, 8, 9, 10, 11, 12, 13 and 14, where the proposed NHFSM shows the best performance. NHFSM improves accuracy, precision, recall, macro-average precision, macro-average recall, micro-average precision, micro-average recall, and F-measure, while reducing error, which demonstrates how well the proposed NHFSM works.

Fig. 6 Accuracy comparison: proposed method versus various feature selection methods

Fig. 7 Error comparison: proposed method versus various feature selection methods

Fig. 8 Precision comparison: proposed method versus various feature selection methods

Fig. 9 Sensitivity comparison: proposed method versus various feature selection methods

Fig. 10 Macro-average precision comparison: proposed method versus various feature selection methods

Fig. 11 Macro-average recall comparison: proposed method versus various feature selection methods

Fig. 12 Micro-average precision comparison: proposed method versus various feature selection methods

Fig. 13 Micro-average recall comparison: proposed method versus various feature selection methods

Fig. 14 F-measure comparison: proposed method versus various feature selection methods

A comparison of all models across the first group of evaluation metrics is shown in Figs. 6, 7, 8 and 9. Adding more patients to the training dataset enhances the effectiveness of all methods: the maximum precision, recall, and accuracy, and the minimum error, are obtained at the highest number of training patients. This is easy to understand: as training patients are added, more data are collected and better classification rules are developed, so, in light of the classifiers' better training, classification accuracy increases. With 226 training patients, the accuracy values for ACO, OEGWO, OHHO, MRMRFS, GWORS, IGSAGAW, and NHFSM are 0.81, 0.82, 0.88, 0.86, 0.91, 0.93, and 0.97, respectively, while the corresponding error values are 0.19, 0.18, 0.12, 0.14, 0.09, 0.07, and 0.03. NHFSM gives the highest precision value at 226 training patients, while ACO gives the lowest. Additionally, at 226 training patients, the highest recall value is 0.75, obtained by the proposed NHFSM, while the lowest is 0.61, obtained by ACO.

The results shown in Figs. 10, 11, 12 and 13 indicate that the proposed NHFSM yields the highest macro-average precision value of 0.756, while the lowest value, 0.634, is given by ACO when the number of training patients is 226. Moreover, at 226 training patients, the highest macro-average recall value is 0.7243, obtained by NHFSM, while the lowest, 0.6, is provided by ACO. At the same number of training patients, the best micro-average precision is 0.728, for NHFSM, and the worst is 0.632, for ACO; likewise, the micro-average recall of NHFSM is 0.7355, while the worst value, 0.61, belongs to ACO. According to Fig. 14, the worst F-measure value is 0.59 for ACO and the best is 0.716 for NHFSM at 226 training patients. Finally, according to Fig. 15, NHFSM is much faster than the other existing methods.

Fig. 15 Run time comparison: proposed method versus various feature selection methods

The slope of a linear regression through the data points, as described in [56], used by [57], explored by [58], and explained by [59], was applied to estimate the rate at which each method's run time grows with the number of testing patients for ACO, OEGWO, OHHO, MRMRFS, GWORS, IGSAGAW, and NHFSM. The analysis shows that the run time of ACO grows with the number of testing patients at a rate of 0.094642857, whereas OEGWO grows at the higher rate of 0.167857143. The run-time growth rates of OHHO, MRMRFS, and GWORS are 0.116071429, 0.133928571, and 0.169642857, respectively, while the run times of IGSAGAW and NHFSM grow at the lowest rate of 0.055357143. A sketch of this slope estimate follows.
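The slope estimate used above amounts to an ordinary least-squares line fit of run time against the number of testing patients, as in the sketch below; the data points are hypothetical, not the paper's measurements.

```python
# Sketch of the slope estimate: least-squares fit of run time vs. number
# of testing patients; the data points are hypothetical.
import numpy as np

n_test = np.array([20, 40, 60, 80, 96])          # hypothetical test-set sizes
runtime = np.array([2.1, 4.0, 5.8, 7.7, 9.2])    # hypothetical run times (s)

slope, intercept = np.polyfit(n_test, runtime, deg=1)
print(f"run time grows at about {slope:.4f} s per additional testing patient")
```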

The development of a computer-aided framework that can automatically and accurately diagnose breast lesions is one of the significant medical challenges for AI applications, and numerous studies have attempted to build a precise automatic system for it. Early research from 1972 to 1978 concentrated mostly on characterizing the texture of breast tissues and applying conventional machine learning techniques for classification without a good feature selection method; due to their low accuracy and sensitivity, the results of those approaches were not suitable for the precise classification of breast lesions. Work utilizing traditional ML approaches [60, 61] and DL-based methods [62,63,64] has produced results comparable to automatic systems based on the classification of a region of interest (ROI). Our proposed PDS, based on NHFSM, performs better on the MIAS dataset than the deep learning systems created in [65,66,67]. The classification performance of the proposed strategy is assessed against a number of state-of-the-art breast cancer detection systems in Table 6.

Table 6 Evaluation comparison of the proposed PDS against state-of-the-art breast cancer classification methods

7 Conclusions and future work

To answer the first research question, a literature analysis was conducted to investigate the available ML models for detecting breast cancer patients. As seen in the previous sections, there is a research need on this topic, since the performance of previous ML models was subpar. This research developed a novel strategy for detecting breast cancer patients, the patients detection strategy (PDS). Initially, features were extracted from the mammogram images using the GLCM and the outlier items were eliminated. Then, to answer the second research question of whether a good feature selection process is important for breast cancer detection, the most efficacious and important features were selected using the proposed new hybrid feature selection method (NHFSM). NHFSM combines the advantages of both filter and wrapper selection methods: first, feature redundancy is reduced and the relevant features are selected using IG as a preselection stage; then, these features are processed using the hybrid bat algorithm and particle swarm optimization (HBAPSO), and the optimized feature subset is selected according to the maximum fitness value. Finally, MATLAB was used to evaluate the strategy's effectiveness; it yields accurate results compared to the other competitors in terms of accuracy, precision, sensitivity/recall, F-measure, and error rate, providing about 0.97, 0.76, 0.75, 0.716, and 0.03, respectively. The reason is that the proposed NHFSM is a hybrid method that combines two efficient and accurate algorithms, PSO and BA.

In the future, breast cancer prediction may be further enhanced by finding an excellent classifier to complete the detection strategy. Additionally, work on metaheuristics can be expanded to improve the accuracy of the proposed feature selection approach as far as feasible (Tables 7, 8, 9, 10, 11, 12, 13, 14, 15, 16).

Table 7 Accuracy comparison: proposed method versus various feature selection methods
Table 8 Error comparison: proposed method versus various feature selection methods
Table 9 Precision comparison: proposed method versus various feature selection methods
Table 10 Sensitivity/recall comparison: proposed method versus various feature selection methods
Table 11 Macro-average precision comparison: proposed method versus various feature selection methods
Table 12 Macro-average recall comparison: proposed method versus various feature selection methods
Table 13 Micro-average precision comparison: proposed method versus various feature selection methods
Table 14 Micro-average recall comparison: proposed method versus various feature selection methods
Table 15 F-measure comparison: proposed method versus various feature selection methods
Table 16 Run time comparison: proposed method versus various feature selection methods