1 Introduction

Today, the use of machine learning methods to diagnose diseases has become widespread [1]. Breast cancer is one of the most common malignant cancers among women worldwide and accounts for 25.1% of all cancers [2]. Breast cancer spreads to other organs over time. Research has also shown that breast cancer is more common in women with an average age of 47 years than in women with an average age of 63 years [3, 4].

Cancerous tumors are divided into malignant and benign. Benign tumors are non-invasive, whereas malignant tumors are invasive and can spread to other parts of the body. Correct diagnosis of the tumor type is therefore essential for treatment. Recurrence of breast cancer can occur 1–20 years after treatment of the primary cancer, and cancer patients often face treatment complications. The recurrence of breast cancer can be predicted by examining factors such as the size of the primary tumor, the number of damaged lymph nodes, the area of the tumor, and similar factors. In recent years, machine learning models have been used in medicine to diagnose cancer and accurately classify benign and malignant tumors in a reasonable time [5, 6].

With the advancement of technology, different types of high-dimensional data are produced. The data produced in the field of medicine, including cancer data, have many dimensions and variables. When the dimensionality of the data is high, the classification results may contain more errors and data analysis becomes difficult. High-dimensional data also pose challenges in terms of the search space, time, and computational costs [7]. Machine learning has been successful in diagnosing diseases such as COVID-19 [1], cardiovascular disease, and diabetes mellitus and in analyzing their data. Using dimension reduction and feature selection (FS) methods makes disease diagnosis faster, easier, and less expensive, and makes the data easier to store and classify [8], because feature selection produces fewer features and reduces computational costs [9]. The goal of feature selection is to reduce the number of features in the dataset and select the most effective ones, so that the highest possible classification accuracy can be achieved with the fewest possible features. Machine learning methods are widely used in medical studies and in the automatic diagnosis of cancers such as breast cancer. Many successful detection and prediction methods have been reported, especially in studies using the Coimbra, Wisconsin Diagnostic Breast Cancer (WDBC), and Wisconsin Prognostic Breast Cancer (WPBC) datasets. These predictions are made using the dimensions and other features of tumors [10].

The innovation of this research is the combination of ten different machine learning algorithms, including ensemble learning methods, with the PSO feature selection algorithm to select the most effective features for breast cancer diagnosis, implemented on three well-known breast cancer datasets: Coimbra, WPBC, and WDBC. The PSO algorithm is used to select the features that are most effective for disease diagnosis and to reduce the size of the dataset; it is applied to the Coimbra, WDBC, and WPBC datasets to diagnose breast cancer using the most common machine learning methods, and the performance of each algorithm is analyzed. These machine learning methods are AdaBoost (ADB), Decision Tree (DT), k Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Logistic Regression (LGR), Linear Regression (LR), Naïve Bayes (NB), Artificial Neural Networks (ANN), Random Forest (RF), and Support Vector Machine (SVM). In addition, a comparative analysis was performed between the performance evaluation criteria of the machine learning methods on the original datasets and on the datasets consisting of the features selected by the PSO feature selection algorithm.

The remainder of this article is organized as follows. The second section examines related works, including studies on breast cancer and the use of machine learning algorithms and other techniques to diagnose breast cancer and compare their accuracy. The third section describes the Coimbra, WDBC, and WPBC datasets used to evaluate the proposed method, as well as the machine learning classification methods and the PSO algorithm. The fourth section describes the theory and calculation of the proposed method. The fifth section presents the results obtained in this research, and the sixth section is devoted to the discussion. The seventh section provides conclusions and suggestions for future research.

2 Related Works

Meta-heuristic algorithms are widely used in feature selection because they are highly efficient, easy to implement, and can manage large-scale data. Swarm Intelligence (SI) algorithms form a branch of meta-heuristic algorithms. These algorithms imitate the social behavior of animal groups such as insects (e.g., ants and bees), birds, and fish [11, 12]. Feature selection, in turn, is an important and challenging task in machine learning, and one of its goals is to maximize classification accuracy [12]. For instance, in [13], the authors used SVM methods to diagnose breast cancer and the kNN, NB, and DT algorithms to detect the type of cancer cells. In [14], a hybrid model based on concepts from neural networks and fuzzy systems was presented. This model could manipulate data collected in medical examinations and detect patterns in healthy individuals and individuals with breast cancer with an acceptable level of accuracy. These intelligent techniques have made it possible to create expert systems based on logical rules of the IF/THEN type. According to its results, the hybrid model has a good capacity to predict breast cancer and analyze the characteristics of this cancer. In [10], the DT method was used to diagnose breast cancer using parameters related to blood analysis. In this method, the importance of the attributes is determined by the Gini coefficient, and the reported accuracy is 90.5%. In [15], an analytical evaluation of machine learning methods on breast cancer datasets was performed. Some initial processing was performed on the input datasets using WEKA software, and its overall effect on prediction accuracy was determined. In that research, a filter feature selection method was used. The results show that correct feature selection can identify the best features and increase prediction speed and accuracy. RF had the best accuracy, 69% before using the filter method and 98% after using it. Similarly, LR came second with 96% accuracy after using the filter and 68% without it, followed by NB with 91% after using the filter method and 71% without it. The authors in [16] diagnosed breast cancer using four factors: resistin, glucose, age, and BMI. They used three machine learning methods, namely SVM, RF, and LGR, and evaluated the results with the Monte Carlo cross-validation method at the 95% confidence level. Their results indicate the superiority of the SVM method over the other two. In [17], four methods, namely Extreme Learning Machine (ELM), SVM, kNN, and ANN, were used to diagnose breast cancer using the Wisconsin dataset, which includes blood analysis data of patients and healthy individuals. The accuracy obtained was 80% by ELM, 79.4% by ANN, 77.5% by kNN, and 73.5% by SVM. In [18], the authors presented an SVM-based ensemble learning method that was applied to two breast cancer datasets: the Wisconsin dataset and a breast cancer dataset registered in the United States. This method showed a 33.34% increase in diagnostic accuracy compared to the best individual SVM method.

3 Materials and Methods

This section examines the datasets, machine learning methods, and the PSO algorithm used to increase the accuracy of breast cancer diagnosis. In this research, datasets available in the repository of the University of California, Irvine (UCI), USA, have been used. The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. The archive was created as an FTP archive in 1987 by David Aha and fellow graduate students at UC Irvine [19].

3.1 Description of Datasets

The datasets used in this study were Coimbra, WDBC, and WPBC. The number of samples and features of each dataset is shown in Table 1.

Table 1 List of used datasets

3.1.1 Coimbra Dataset

The Coimbra dataset contains 116 samples. It is related to a study performed at the Department of Obstetrics and Gynecology of Coimbra University Hospital between 2009 and 2013, during which its data were collected. Patients were diagnosed with breast cancer based on mammography results and the diagnosis of specialist physicians, and the data were obtained before treatment. The dataset has ten attributes: one binary attribute, which indicates the presence or absence of cancer, and nine continuous attributes. The attributes are anthropometric parameters and data obtained from blood analysis [20]. The attributes and statistical parameters of the Coimbra dataset are shown in Table 2.

Table 2 Features and statistical parameters of the Coimbra dataset

3.1.2 WDBC Dataset

Another dataset is WDBC, which has 569 samples. Each instance contains 32 attributes: the first attribute is a unique identification number, the second is the diagnosis label (357 benign/212 malignant), and the remaining 30 are features computed from a digitized image of a Fine Needle Aspirate (FNA) test. Ten real-valued attributes, shown in Table 3, were computed for each cell nucleus. For each image, the mean, standard error, and largest (worst) values of these ten features were computed, giving 30 features [8, 16, 21, 22]. The ten real-valued attributes and statistical parameters of the WDBC dataset are shown in Table 3.

Table 3 Features and statistical parameters of the WDBC dataset

This dataset was collected by Dr. Wolberg from patients since 1984, and there is no evidence of metastasis to other parts of the body in these patients [23].

3.1.3 WPBC Dataset

The WPBC dataset is another dataset used in this study. It contains 198 samples (151 non-recurrent/47 recurrent), each with 33 attributes. The first attribute is a unique identification number, the second is the prognosis status (non-recurrence or recurrence), and the third is the recurrence time. Thirty further features were obtained by the process explained in the previous section. The two remaining features are the diameter of the excised tumor in centimeters and the number of axillary lymph nodes evaluated as positive during surgery [21, 24]. The ten real-valued features and the other features, with statistical parameters of the WPBC dataset, are shown in Table 4.

Table 4 Attributes and statistical values of WPBC dataset

3.2 Classification Methods

Ten machine learning classification methods were used with the breast cancer datasets to diagnose breast cancer, its recurrence or non-recurrence, and whether a tumor is benign or malignant. Each algorithm is briefly described below.

3.2.1 AdaBoost

The ADB algorithm is an ensemble learning algorithm. In each round t = 1, …, T, a weak classifier is added and the sample weights are updated based on the importance of the samples: the weight of misclassified samples is increased and the weight of correctly classified samples is reduced. As a result, each new classifier focuses on the examples that are harder to learn [25].
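To make the reweighting concrete, the following is a minimal NumPy sketch of one round of discrete AdaBoost in its standard textbook form, assuming labels in {−1, +1}; it is an illustration only, not the exact implementation used in this study (scikit-learn's AdaBoostClassifier with default parameters was used, per Sect. 4.4).

```python
import numpy as np

def adaboost_reweight(w, y_true, y_pred):
    """One reweighting round of discrete AdaBoost (labels in {-1, +1})."""
    miss = y_true != y_pred                        # samples the weak learner misclassified
    err = np.sum(w[miss]) / np.sum(w)              # weighted error of the weak learner
    alpha = 0.5 * np.log((1 - err) / err)          # weight of this learner in the final vote
    w = w * np.exp(np.where(miss, alpha, -alpha))  # raise misclassified, lower correct weights
    return w / w.sum(), alpha                      # renormalize weights to a distribution
```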

3.2.2 Decision Tree

The DT algorithm can perform classification operations on data [26]. A decision tree is made up of nodes. The topmost node, called the "root", has no incoming edges; every other node has exactly one incoming edge. A node with outgoing edges is called an internal node or test node, and the remaining nodes are called decision nodes or leaves. In a decision tree, each internal node divides the sample space into two or more subspaces according to the values of the input attributes [27].

3.2.3 k Nearest Neighbors

The KNN algorithm is a classification algorithm in which the class of a sample is determined by a majority vote of its k nearest neighbors. K is a positive integer and is generally small. If k = 1, the sample is simply assigned to the class of its nearest neighbor. K is usually chosen to be odd so that tied votes are avoided [28].

3.2.4 Linear Discriminant Analysis

LDA involves finding the hyperplane that minimizes the variance within each class and maximizes the distance between the projected class means. As a result, the linear combination of attributes that best separates two or more classes of objects can be found. LDA is closely related to analysis of variance and regression analysis [29].

3.2.5 Logistic Regression

The LGR algorithm models the relationship between a categorical dependent variable Y and a set of attributes X. In LGR, there is one dependent variable and usually a set of independent variables, which may be categorical, quantitative, or a combination of the two. The qualitative dependent variable takes two values, zero or one [30].

3.2.6 Linear Regression

The LR algorithm measures the effect of an independent variable on a dependent variable and the correlation between them [31]. The common method for estimating the parameters is the least-squares method, in which the parameters are obtained by minimizing the sum of squared errors [32].

3.2.7 Naïve Bayes

The NB algorithm is a family of simple classifiers that apply Bayes' theorem with a strong (naïve) independence assumption. In this method, by determining the probabilities of the outcomes, the uncertainty about the model is quantified. This classifier is named after Thomas Bayes (1702–1761), who proposed the theorem [8, 33].

3.2.8 Artificial Neural Network

The ANN algorithm is a biologically inspired model. A neural network enables a computer to learn a problem it has not seen before, and the functioning of an ANN mimics that of the human brain to some extent [34]. ANNs are made up of small units called neurons, where the strength of the connection between neurons is determined by synaptic weights. Each neuron computes its output value from the weighted sum of its inputs [35].

3.2.9 Random Forest

The RF algorithm is an ensemble learning algorithm in which a set of DTs together forms a forest for better decision-making. Bagging is the technique used to generate the training data in this algorithm: none of the selected data is deleted from the input dataset, and the data can be reused to generate the next subset. In the bagging method, N new datasets are sampled from the original dataset and a randomized tree is built on each. The final model is created by averaging or voting across the trees. Each tree is constructed such that, at each split, a few variables are randomly selected and the best of them is chosen [26]. Samples not selected for training a tree in the bagging process form its out-of-bag subset and can be used to evaluate performance [36].

3.2.10 Support Vector Machine

The SVM algorithm is a classification method that selects the decision boundary with the maximum-margin hyperplane. The input space transformation is done implicitly using a kernel function. If the data cannot be separated by a line, a hyperplane is created in the transformed space to separate the data, such that this hyperplane has the largest possible margin to the samples of the classes [13].
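As a sketch of how these ten methods can be instantiated in scikit-learn with default parameters (which, per Sect. 4.4, is how they were configured in this study); note that LinearRegression produces continuous outputs that must be thresholded, e.g. at 0.5, to yield class labels:

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

classifiers = {
    "ADB": AdaBoostClassifier(),          # ensemble of weak learners, reweighted each round
    "DT":  DecisionTreeClassifier(),      # recursive partitioning of the sample space
    "KNN": KNeighborsClassifier(),        # majority vote of the k nearest neighbors
    "LDA": LinearDiscriminantAnalysis(),  # linear projection separating the classes
    "LGR": LogisticRegression(),          # binary outcome modeled with the logistic function
    "LR":  LinearRegression(),            # least-squares fit; output thresholded for a label
    "NB":  GaussianNB(),                  # Bayes' theorem with the naive independence assumption
    "NN":  MLPClassifier(),               # feed-forward artificial neural network
    "RF":  RandomForestClassifier(),      # bagged ensemble of randomized decision trees
    "SVM": SVC(),                         # maximum-margin hyperplane with a kernel function
}
```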

3.3 Particle Swarm Optimization Algorithm

The PSO algorithm is one of the meta-heuristic algorithms. It was first developed in 1995 by Kennedy and Eberhart [11] and was initially used to simulate the flocking of birds. After simplifying the initial algorithm, a kind of optimization behavior was observed. Optimization is the process of improving performance toward an optimal point or points [37, 38].

The PSO algorithm uses candidate solutions and a simple update formula to solve the optimization problem, exploring the search space of the problem to obtain the optimal solution [11]. The usual aim of the PSO algorithm is to find the maximum or minimum of a function defined on a multidimensional vector space: for example, find \(x^{*}\) such that \(f\left( {x^{*} } \right) \le f\left( x \right)\) for all d-dimensional real vectors x. The objective function \(f:R^{d} \to R\) is called the fitness function. PSO is a swarm intelligence meta-heuristic inspired by the group behavior of animals, for example bird flocks or fish schools. Similarly to genetic algorithms (GAs), it is a population-based method; that is, it represents the state of the algorithm by a population, which is iteratively modified until a termination criterion is satisfied. In PSO algorithms, the population \(P = \left\{ {p_{1} , \ldots ,p_{n} } \right\}\) of feasible solutions is called a swarm, and the feasible solutions \(p_{1} , \ldots ,p_{n}\) are called particles. The PSO algorithm views the set \(R^{d}\) of feasible solutions as a space in which the particles move. For solving practical problems, the number of particles is usually chosen between 10 and 50.

3.3.1 Characteristics of Particle i at Iteration t

\(X_{i}^{\left( t \right)}\): The position (a d-dimensional vector) of the ith particle at iteration t.

\(Pbest_{i}^{\left( t \right)}\): The best solution achieved so far by particle i (personal best).

\(Gbest^{\left( t \right)}\): The best solution that has been achieved so far by the entire swarm (global best).

\(V_{i}^{\left( t \right)}\): The velocity of particle i.

At the beginning of the algorithm, the particle positions are randomly initialized, and the velocities are set to 0, or to small random values.

3.3.2 Parameters of the Algorithm

\(W^{(t)}\): Inertia weight; a damping factor, usually decreasing from around 0.9 to around 0.4 during the computation.

\(c_{1} , \, c_{2}\): The acceleration coefficients; usually between 0 and 2.

3.3.3 Update of the Speed and the Positions of the Particles

The velocity of particle i at iteration t + 1 is obtained by updating the velocity of that particle at the previous iteration according to Eq. (1).

$$V_{i}^{{\left( {t + 1} \right)}} = w^{(t)} V_{i}^{\left( t \right)} + c_{1} u_{1} \left( {Pbest_{i}^{\left( t \right)} - X_{i}^{\left( t \right)} } \right) + c_{2} u_{2} \left( {Gbest^{\left( t \right)} - X_{i}^{\left( t \right)} } \right).$$
(1)

The symbols \(u_{1}\) and \(u_{2}\) represent random variables with the \(U\left( {0,1} \right)\) distribution. The first part of the velocity formula is called the "inertia" term, the second the "cognitive (personal) component", and the third the "social (neighborhood) component". The position of particle i at iteration t + 1 is updated according to Eq. (2).

$$X_{i}^{{\left( {t + 1} \right)}} = X_{i}^{\left( t \right)} + V_{i}^{{\left( {t + 1} \right)}} .$$
(2)

At each iteration, every particle searches both around the best point it has found so far and around the best point found by the entire swarm. In other words, in addition to the positions and velocities of the particles, \(Pbest_{i}^{\left( t \right)}\) and \(Gbest^{\left( t \right)}\) are updated during the iterations.
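A minimal NumPy sketch of one swarm iteration of Eqs. (1) and (2) might look as follows (a generic form of the canonical update, not the exact code used in this study):

```python
import numpy as np

def pso_step(X, V, pbest, gbest, w, c1, c2):
    """One iteration of Eqs. (1) and (2) for n particles in d dimensions.

    X, V, pbest are (n, d) arrays; gbest is a (d,) array.
    """
    n, d = X.shape
    u1 = np.random.rand(n, d)  # U(0,1) factors for the cognitive (personal) component
    u2 = np.random.rand(n, d)  # U(0,1) factors for the social (neighborhood) component
    V = w * V + c1 * u1 * (pbest - X) + c2 * u2 * (gbest - X)  # Eq. (1)
    X = X + V                                                  # Eq. (2)
    return X, V
```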

3.3.4 Stopping Rule

The algorithm is terminated after a given number of iterations, or once the fitness values of the particles, or the particles themselves, are close enough in some sense. Figure 1 shows the flowchart of the PSO algorithm.

Fig. 1 Flowchart of PSO algorithm

3.3.5 Advantages of PSO Algorithm

In the PSO algorithm, a number of particles are randomly generated, and they search for better solutions by moving within the domain of the problem; in this respect PSO resembles the genetic algorithm [39, 40]. The algorithm has very few parameters. The fitness function may be non-differentiable, since only its values are used. The method can be applied to optimization problems of large dimension. The PSO algorithm is insensitive to the scaling of design variables and provides a very efficient global search.

4 Proposed PSO Feature Selection Algorithm

The aim of this research is to improve the accuracy of breast cancer diagnosis based on the proposed PSO feature selection algorithm and machine learning classification methods. Figure 3 shows the flow chart of the proposed PSO feature selection algorithm. In the data collection stage, three breast cancer datasets, the Coimbra, WDBC, and WPBC datasets, were used; the process is carried out for one dataset at a time. In the next step, after data pre-processing, the PSO feature selection algorithm selects the features most effective for breast cancer diagnosis and produces a new dataset that has fewer features, and therefore lower dimensionality, than the original dataset. Then, breast cancer diagnosis is performed with the machine learning algorithms described in Sect. 3.2 on this new dataset, which consists only of the features selected by the PSO feature selection algorithm, classifying the data into two categories: sick or healthy. The same classification is also performed with the machine learning algorithms on the original datasets, without the PSO feature selection algorithm. Finally, the performance of the machine learning methods with and without the PSO feature selection algorithm is obtained and compared. The steps of the proposed PSO feature selection algorithm are described in detail below.

4.1 Data Pre-processing

In the data pre-processing phase, the dataset is transformed into understandable data. The tasks performed in the data preprocessing stage include:

  1. Detection and replacement of missing data
  2. Detection and replacement of outlier data
  3. Data normalization
  4. Data splitting.

The datasets were evaluated for missing values and outliers. Missing and outlier values were replaced with the mean value of the corresponding variable.

Importing data without normalization reduces the accuracy and performance of machine learning models. For this reason, all input data were normalized between 0 and 1 using Eq. (3) [41].

$${X}_{N}=\frac{{X}_{i}-{\text{min}}(x)}{{\text{max}}\left(x\right)-{\text{min}}(x)}.$$
(3)

In this equation, min(x), max(x), and \(X_N\) are the minimum, maximum, and normalized values of the input data, respectively.
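As an illustration, a pandas sketch of the imputation and normalization steps described above; the paper does not state its outlier criterion, so the 3-sigma rule used below is an assumption:

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Mean-impute missing/outlier values, then min-max normalize (Eq. (3))."""
    df = df.copy()
    for col in df.columns:
        x = df[col]
        z = (x - x.mean()) / x.std()      # standard scores of the column
        x = x.mask(z.abs() > 3)           # assumed outlier rule: beyond 3 sigma -> missing
        x = x.fillna(x.mean())            # replace missing/outlier values with the column mean
        df[col] = (x - x.min()) / (x.max() - x.min())  # Eq. (3): scale to [0, 1]
    return df
```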

In the data splitting phase, the data were divided into two parts, training and test, in the proportion of 80% for training and 20% for testing. To increase the generalizability of the model and prevent overfitting during training, the cross-validation (CV) technique was applied. In this technique, the training data are divided into parts; in each iteration of the cross-validation process, one part is used for validation and the remaining parts for training. The type of cross-validation performed was the K-fold method, in which the training dataset is randomly divided into k folds of equal size; at each iteration, k − 1 of these folds are used as the training set and one as the validation set [42]. Figure 2 shows a tenfold cross-validation. In this study, a K-fold CV with four folds was implemented.

Fig. 2 A tenfold cross validation [43]
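With scikit-learn, the 80/20 split followed by four-fold cross-validation on the training part can be sketched as follows (synthetic data stands in for a preprocessed dataset; SVC is used only as an example classifier):

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((116, 9))            # placeholder for a preprocessed Coimbra-sized dataset
y = rng.integers(0, 2, size=116)    # placeholder binary labels (sick/healthy)

# 80% training / 20% test split, then 4-fold CV on the training part.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
fold_accuracy = cross_val_score(SVC(), X_train, y_train, cv=4, scoring="accuracy")
print(fold_accuracy.mean())
```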

Then, with the machine learning methods described in Sect. 3.2, the datasets were evaluated once without applying the PSO feature selection algorithm and once with it, using the more effective features selected by the PSO algorithm to diagnose breast cancer. Afterwards, the results of the two approaches were compared (Fig. 3).

Fig. 3 Flow chart of the proposed method

4.2 PSO Feature Selection Algorithm

The purpose of this research is to increase the accuracy of breast cancer diagnosis. The objective function \(F(x)\) of the PSO feature selection is shown in Eq. (4). In this equation, the vector x contains the inputs of the objective function, namely the effective features selected by the PSO feature selection algorithm, and the output of the objective function is the breast cancer diagnosis accuracy, a weighted sum of the accuracies on the training and validation datasets. The accuracy formula is described in the next section. Alpha and beta are the coefficients of the breast cancer diagnosis accuracy on the training and validation datasets; their values were set to 0.7 and 0.3, respectively. In each iteration of the PSO feature selection algorithm, different features are selected by the PSO algorithm, and its fitness value, the breast cancer diagnosis accuracy, is obtained from the objective function. Finally, after the iterations terminate, the features selected by the PSO feature selection algorithm that led to the maximum accuracy are taken as the effective features for breast cancer diagnosis.

$$\begin{gathered} x = \left[ {x_{1} ,x_{2} ,x_{3} , \ldots ,x_{d} } \right], \hfill \\ {\text{Accuracy}} = \left( {{\text{TP}} + {\text{TN}}} \right)/\left( {{\text{TP}} + {\text{FN}} + {\text{FP}} + {\text{TN}}} \right), \hfill \\ F\left( x \right) = \alpha \left( {{\text{Accuracy}}_{{{\text{TR}}}} } \right) + \beta \left( {{\text{Accuracy}}_{{{\text{val}}}} } \right). \hfill \\ \end{gathered}$$
(4)
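A sketch of Eq. (4) as a fitness function: a particle proposes a subset of the d features, and the masked dataset is scored by a classifier. The paper does not spell out how continuous particle positions are binarized into feature masks; thresholding at 0.5 is a common choice and is assumed here.

```python
import numpy as np
from sklearn.metrics import accuracy_score

ALPHA, BETA = 0.7, 0.3  # weights of training and validation accuracy (Sect. 4.2)

def fitness(position, clf, X_tr, y_tr, X_val, y_val):
    """Eq. (4): fitness of the feature subset encoded by a particle's position."""
    mask = position > 0.5                 # assumed binarization of the continuous position
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 0.0                        # an empty feature subset cannot classify
    clf.fit(X_tr[:, cols], y_tr)
    acc_tr = accuracy_score(y_tr, clf.predict(X_tr[:, cols]))
    acc_val = accuracy_score(y_val, clf.predict(X_val[:, cols]))
    return ALPHA * acc_tr + BETA * acc_val
```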

4.3 Performance Evaluation Parameters

To compare the performance of the different machine learning methods and of the different features selected by the PSO feature selection algorithm, the performance evaluation parameters accuracy, sensitivity, specificity, F1-score, positive predictive value (precision), negative predictive value, and area under the curve (AUC) were considered [44]. Figure 4 and Eqs. (5)–(10) show the confusion matrix for two-class classification and the evaluation criteria derived from it, respectively. True positives (TP) are cases correctly classified as patients, false positives (FP) are cases incorrectly classified as patients, true negatives (TN) are cases correctly classified as healthy, and false negatives (FN) are cases incorrectly classified as healthy. From the confusion matrix, accuracy, sensitivity, specificity, and F1-score are evaluated using the following equations.

Fig. 4 Confusion matrix for classification of two classes

$${\text{Accuracy}}=\frac{{\text{TP}}+{\text{TN}}}{{\text{TP}}+{\text{FN}}+{\text{FP}}+{\text{TN}}},$$
(5)
$${\text{Sensitivity}}\ \left({\text{TPR}}\right)=\frac{{\text{TP}}}{{\text{TP}}+{\text{FN}}},$$
(6)
$${\text{Specificity}}\ \left({\text{TNR}}\right)=\frac{{\text{TN}}}{{\text{TN}}+{\text{FP}}},$$
(7)
$$F1\_{\text{score}}=\frac{2{\text{TP}}}{2{\text{TP}}+{\text{FP}}+{\text{FN}}},$$
(8)
$$\mathrm{Positive\ predictive\ value}\ \left({\text{precision}}\right)=\frac{{\text{TP}}}{{\text{TP}}+{\text{FP}}},$$
(9)
$$\mathrm{Negative\ predictive\ value}=\frac{{\text{TN}}}{{\text{FN}}+{\text{TN}}}.$$
(10)
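These criteria can be computed directly from the confusion matrix; a sketch using scikit-learn, whose confusion_matrix for binary labels returns [[TN, FP], [FN, TP]]:

```python
from sklearn.metrics import confusion_matrix

def evaluation_metrics(y_true, y_pred):
    """Eqs. (5)-(10) computed from the two-class confusion matrix of Fig. 4."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "accuracy":    (tp + tn) / (tp + fn + fp + tn),  # Eq. (5)
        "sensitivity": tp / (tp + fn),                   # Eq. (6), TPR
        "specificity": tn / (tn + fp),                   # Eq. (7), TNR
        "f1_score":    2 * tp / (2 * tp + fp + fn),      # Eq. (8)
        "precision":   tp / (tp + fp),                   # Eq. (9), PPV
        "npv":         tn / (fn + tn),                   # Eq. (10), NPV
    }
```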

4.4 Sensitivity Analysis of Parameters of PSO Algorithm and Machine Learning Methods

In this research, the default values available in the Sklearn library of the Python programming language were used for the parameters of the machine learning algorithms. To choose appropriate values of the PSO algorithm parameters, a sensitivity analysis based on accuracy, sensitivity, and specificity was performed using one of the machine learning methods. Tables 5, 6, 7 and 8 describe the sensitivity of the PSO algorithm to variations in its parameters.

Table 5 C1 parameter
Table 6 C2 parameter
Table 7 W parameter
Table 8 Population size parameter

According to the results in the above tables, the values of the parameters of the PSO feature selection algorithm were chosen as described in Table 9.

Table 9 Parameters of PSO feature selection algorithm

The entire breast cancer diagnosis process and the PSO feature selection algorithm, together with the machine learning methods, were implemented in the Python 3 programming language on a PC with an Intel(R) Core(TM) i5-540 2.53 GHz CPU and 8.00 GB of RAM.

5 Results

This section presents the breast cancer diagnosis results obtained from the classification evaluation with and without PSO feature selection for the three datasets Coimbra, WDBC and WPBC.

5.1 Classification Evaluation for Coimbra Dataset

Without PSO feature selection, ten machine learning models were constructed with all features of the Coimbra dataset; with PSO feature selection, the ten models were constructed with the features selected by the PSO algorithm. The results of these two phases are shown in Figs. 5 and 6, respectively. It is clear from Figs. 5 and 6 that PSO feature selection can improve the diagnosis of breast cancer by increasing accuracy, sensitivity, and specificity.

Fig. 5 Performance of ML classification models without PSO feature selection (Coimbra dataset)

Fig. 6 Performance of ML classification models with PSO feature selection (Coimbra dataset)

In Fig. 7, the AUC evaluation metric is compared with and without PSO feature selection. According to the results, PSO feature selection improved the AUC value.

Fig. 7 AUC with and without PSO feature selection (Coimbra dataset)

Breast cancer diagnosis results without and with PSO feature selection are shown in Tables 10 and 11, respectively. Using the proposed PSO feature selection algorithm, the accuracy of the ADB, DT, and LDA models improved by 3% and the accuracy of the RF, LR, and LGR models improved by 2%. The accuracy of the KNN, NB, NN, and SVM models improved by 19%, 8%, 6%, and 4%, respectively, compared to the case without PSO feature selection.

Table 10 Breast cancer diagnosis results without PSO feature selection (Coimbra dataset)
Table 11 Breast cancer diagnosis results with PSO feature selection (Coimbra dataset)

According to Fig. 8, the experimental results show that the prediction model based on the SVM classifier and the PSO feature selection algorithm achieves a higher accuracy percentage in predicting breast cancer than the other algorithms.

Fig. 8 Accuracy (%) with and without PSO feature selection (Coimbra Dataset)

Figure 9 shows the number of features selected with and without PSO feature selection for each ML model on the Coimbra dataset. This dataset consists of nine features. Using the PSO feature selection algorithm, the number of features of the ADB and SVM models was reduced to four, the number of features of the DT, KNN, LDA, LR, and NN models was reduced to six, and the number of features of the LGR, NB, and RF models was reduced to five.

Fig. 9 Comparison of the number of features with and without PSO feature selection (Coimbra dataset)

In addition, classification evaluation metrics are given for the ten ML models in Fig. 10.

Fig. 10 Performance comparison for machine learning methods with and without PSO feature selection (Coimbra dataset)

5.2 Classification Evaluation for WDBC Dataset

As for the previous dataset, ten machine learning models were constructed without and with PSO feature selection on the WDBC dataset. The results of these two phases are shown in Figs. 11 and 12, respectively.

Fig. 11 Performance of ML classification methods without PSO feature selection (WDBC dataset)

Fig. 12 Performance of ML classification models with PSO feature selection (WDBC dataset)

Figure 13 compares the AUC with and without PSO feature selection. This comparison shows that the AUC value improved with PSO feature selection.

Fig. 13 AUC with and without PSO feature selection (WDBC dataset)

The diagnostic results for benign or malignant breast cancer with all features (without PSO feature selection) and with PSO feature selection are given in Tables 12 and 13, respectively. Using the PSO feature selection algorithm, the accuracy of the ADB, DT, KNN, LR, NN, and RF models improved by 1%, and the accuracy of the NB and SVM models improved by 3% and 2%, respectively. The accuracy of the LDA and LGR models was the same with and without the PSO feature selection algorithm, but for the LDA model the sensitivity improved by 1% with PSO feature selection. The classification results for the LGR model were identical with and without PSO feature selection.

Table 12 Breast cancer classification results without PSO feature selection (WDBC dataset)
Table 13 Breast cancer classification results with PSO feature selection (WDBC dataset)

According to Fig. 14, the ADB classifier with the PSO feature selection algorithm achieves a higher accuracy percentage than the other ML models.

Fig. 14 Accuracy (%) with and without PSO feature selection (WDBC dataset)

Figure 15 shows the number of features selected with and without PSO feature selection for each ML model on the WDBC dataset. This dataset consists of 30 features. Using the PSO feature selection algorithm, the number of features of the ADB and LR models was reduced to 21 and that of the DT and LDA models to 14. The number of features of the KNN and SVM models was reduced to 16, and the number of features of the LGR, NB, NN, and RF models was reduced to 9, 15, 17, and 19, respectively.

Fig. 15 Comparison of the number of features with and without PSO feature selection (WDBC dataset)

In addition, classification evaluation metrics are given for the ten ML models in Fig. 16.

Fig. 16 Performance comparison for machine learning methods with and without PSO feature selection (WDBC dataset)

5.3 Classification Evaluation for WPBC Dataset

As for the previous two datasets, ten machine learning models were constructed without and with PSO feature selection. The results of these two phases are shown in Figs. 17 and 18, respectively.

Fig. 17 Performance of ML classification methods without PSO feature selection (WPBC dataset)

Fig. 18 Performance of ML classification models with PSO feature selection (WPBC dataset)

In Fig. 19, the AUC with and without PSO feature selection is compared. The AUC improved when PSO feature selection was used.

Fig. 19 AUC with and without PSO feature selection (WPBC dataset)

The diagnosis results for breast cancer recurrence or non-recurrence with all features (without PSO feature selection) and with PSO feature selection are given in Tables 14 and 15, respectively.

Table 14 Breast cancer classification results without PSO feature selection (WPBC dataset)
Table 15 Breast cancer classification results with PSO feature selection (WPBC dataset)

Using the proposed PSO feature selection algorithm, the accuracy of the DT, LGR, NN, RF, and SVM models improved by 2%. The accuracy of the ADB and NB models improved by 3%, the accuracy of the LDA and LR models by 1%, and the accuracy of the KNN model by 4%.

According to Fig. 20, the SVM classifier with the PSO feature selection algorithm achieves a higher accuracy percentage for recurrence diagnosis than the other ML models.

Fig. 20 Accuracy (%) with and without PSO feature selection (WPBC dataset)

Figure 21 shows the number of features selected with and without PSO feature selection for each ML model on the WPBC dataset. This dataset consists of 33 features. Using the PSO feature selection algorithm, the number of features of the ADB, DT, NB, and RF models was reduced to 20, and the number of features of the LGR and NN models was reduced to 18. The number of features of the KNN, LDA, LR, and SVM models was reduced to 19, 21, 23, and 17, respectively.

Fig. 21 Comparison of the number of features with and without PSO feature selection (WPBC dataset)

Classification evaluation metrics are given for the ten machine learning models in Fig. 22.

Fig. 22 Performance comparison for machine learning models with and without PSO feature selection (WPBC dataset)

6 Discussion

In this study, performance evaluation metrics including accuracy, sensitivity, specificity, precision, and AUC have been used.

According to the results, all classifiers achieve a better accuracy level with PSO feature selection than without it. For some classifiers, the accuracy obtained with and without PSO feature selection is close or equal, but with PSO feature selection the results were obtained with a smaller number of features than the full feature sets of the datasets (Tables 16, 17).

Table 16 Performance of ADB, DT, KNN, LDA, LGR classifiers
Table 17 Performance of LR, NB, NN, RF, SVM classifiers

The classifiers also achieve a higher level of precision with PSO feature selection than without it (Figs. 23, 24 and 25).

Fig. 23 Precision (%) with and without PSO feature selection (Coimbra dataset)

Fig. 24 Precision (%) with and without PSO feature selection (WDBC dataset)

Fig. 25 Precision (%) with and without PSO feature selection (WPBC dataset)

With PSO feature selection, results better than or equal to those obtained with all features are achieved. Figure 26 compares the accuracy with PSO feature selection across the three datasets. According to this figure, the best accuracy for breast cancer diagnosis on the Coimbra dataset was obtained by the SVM classifier with 91%, followed by ADB with 88%. On the WDBC dataset, the best result was obtained by the RF classifier with an accuracy of 100%, followed by the ADB classifier with 99%. On the WPBC dataset, the best result was obtained by the SVM classifier with an accuracy of 96%, followed by the RF classifier with 95%. Therefore, feature selection reduces the size of the data and improves the results.

Fig. 26 A comparison of accuracy between the WPBC, WDBC, and Coimbra datasets with PSO feature selection

6.1 Comparing the Results of Classifiers in Each Dataset

In this section, a comparative analysis of all evaluation metrics for each classifier on each dataset, with and without PSO feature selection, is presented.

When PSO feature selection is used, the accuracy of all classifiers on the three datasets improves. For some classifiers the accuracy with PSO feature selection may equal that without it, but in these cases the accuracy is obtained with significantly fewer features than the full feature set, which is a major advantage of PSO feature selection.

On the Coimbra dataset with 9 features, the PSO feature selection algorithm improved the accuracy of breast cancer diagnosis by 2, 3, 10, 9, 8, 2, 6, 2, 4% for the ADB, DT, KNN, LDA, LGR, LR, NB, NN, RF, and SVM classifiers, selecting 4, 6, 6, 6, 5, 6, 5, 6, 5, and 4 features, respectively (Fig. 27).

Fig. 27 Performance of classification models with and without PSO feature selection (Coimbra dataset)

On the WDBC dataset with 30 features, the PSO feature selection algorithm improved the accuracy of benign/malignant breast cancer diagnosis by 1, 1, 1, 0, 0, 1, 1, 1, 2% for the ADB, DT, KNN, LDA, LGR, LR, NB, NN, RF, and SVM classifiers, selecting 21, 14, 16, 14, 9, 21, 15, 17, 19, and 16 features, respectively (Fig. 28).

Fig. 28 Performance of classification models with and without PSO feature selection (WDBC dataset)

On the WPBC dataset with 33 features, the PSO feature selection algorithm improved the accuracy of breast cancer recurrence diagnosis by 3, 2, 4, 1, 2, 1, 3, 2, 3, 2% for the ADB, DT, KNN, LDA, LGR, LR, NB, NN, RF, and SVM classifiers, selecting 20, 20, 19, 21, 18, 23, 20, 18, 20, and 17 features, respectively (Fig. 29).

Fig. 29 Performance of classification models with and without PSO feature selection (WPBC dataset)

The other evaluation metrics also showed that the PSO feature selection algorithm improves their results.

6.2 Comparison of the Results with Related Works

Table 18 shows the results of related works along with the parameters and models used in them. This research achieved a higher level of accuracy than these similar works.

Table 18 A comparison of related works with this research

7 Conclusion and Future Work

Breast cancer is one of the leading causes of death among women in the world. A large number of features are involved in correctly diagnosing this disease. For this reason, reducing the number of these features and finding the features that are effective for breast cancer diagnosis can help patients and speed up the diagnosis process. This article focused on improving the accuracy of ten machine learning classification methods by integrating the Particle Swarm Optimization (PSO) feature selection algorithm into them. With the PSO feature selection (FS) algorithm, the machine learning classification methods are improved and the number of features is reduced. The proposed method was evaluated on three common breast cancer datasets, the Coimbra, WDBC, and WPBC datasets. The results showed that on the Coimbra dataset with 9 features, the PSO feature selection algorithm improved the accuracy of breast cancer diagnosis by 2, 3, 10, 9, 8, 2, 6, 2, 4% for the AdaBoost (ADB), Decision Tree (DT), k Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Logistic Regression (LGR), Linear Regression (LR), Naïve Bayes (NB), Artificial Neural Networks (NN), Random Forest (RF), and Support Vector Machine (SVM) classifiers, selecting 4, 6, 6, 6, 5, 6, 5, 6, 5, and 4 features, respectively. On this dataset, the best accuracy of 91% was obtained by the SVM classifier integrated with the PSO feature selection algorithm. On the WDBC dataset with 30 features, the PSO feature selection algorithm improved the accuracy of benign/malignant diagnosis by 1, 1, 1, 0, 0, 1, 1, 1, 2% for the ADB, DT, KNN, LDA, LGR, LR, NB, NN, RF, and SVM classifiers, selecting 21, 14, 16, 14, 9, 21, 15, 17, 19, and 16 features, respectively. On this dataset, the best accuracy of 100% was obtained by the RF classifier integrated with the PSO feature selection algorithm. On the WPBC dataset with 33 features, the PSO feature selection algorithm improved the accuracy of recurrence diagnosis by 3, 2, 4, 1, 2, 1, 3, 2, 3, 2% for the ADB, DT, KNN, LDA, LGR, LR, NB, NN, RF, and SVM classifiers, selecting 20, 20, 19, 21, 18, 23, 20, 18, 20, and 17 features, respectively. On this dataset, the best accuracy of 96% was obtained by the SVM classifier integrated with the PSO feature selection algorithm. The results of this research indicate that the PSO feature selection algorithm can identify the features most effective for breast cancer diagnosis, so the disease can be diagnosed with higher accuracy in a shorter time, helping to save patients' lives. For future work, it is recommended to use other machine learning methods, such as other ensemble techniques, and other meta-heuristic feature selection algorithms, and to compare their results with those of the PSO feature selection algorithm.