Background

Breast cancer is the most common cancer and the leading cause of cancer death among females [1]. Early detection and diagnosis are key to controlling the disease and improving the survival rate, and among all available methods, pathological diagnosis is regarded as the gold standard. Traditional diagnosis relies heavily on clinicians’ personal experience, so the results can be subjective to some extent. In recent years, computational diagnostic tools and artificial intelligence techniques have provided automated procedures for objective judgments by exploiting quantitative measures and machine learning [2,3,4,5,6,7,8,9,10,11]. Similarly, methods based on artificial intelligence have been proposed for the diagnosis of breast cancer. Maglogiannis et al. [12] applied support vector machines (SVMs) to breast cancer diagnosis on both the Wisconsin Diagnostic Breast Cancer and Wisconsin Prognostic Breast Cancer datasets. Kaya et al. [13] proposed a novel approach based on rough sets and the extreme learning machine for distinguishing benign from malignant breast tumors. Akay et al. [14] proposed a novel SVM combined with feature selection for breast cancer diagnosis; their experimental results indicate that the method performs well in terms of accuracy, sensitivity and specificity. Given recent advances in digitized histology, it is now possible to combine histological tissue patterns with artificial intelligence-aided image analysis to facilitate disease classification [15]. In general, accurate pathological diagnosis of breast cancer depends on features extracted from histopathology images, and many studies have addressed breast cancer diagnosis based on such features.

Kuse et al. [16] extracted texture features from cells to train an SVM classifier that distinguishes lymphocytes from non-lymphocytes. Dundar et al. [17] proposed to segment cell regions by clustering the pixel data, to identify individual cells with a watershed-based segmentation algorithm, and then to apply a multiple-instance learning (MIL) approach to identify the stage of the breast lesion. Sparks et al. [18] presented a content-based image retrieval (CBIR) system that leveraged a novel set of explicit shape features which accurately described the similarity between the morphology of objects of interest. Basavanhally et al. [19] presented a novel framework that classifies entire images based on quantitative features extracted from fields of view (FOVs) of varying sizes. In each FOV, cancer nuclei were automatically detected and used to construct graphs (Voronoi diagram, Delaunay triangulation, minimum spanning tree). Features describing the spatial arrangement of the nuclei were extracted and used to train a boosted classifier that predicts the image class for each FOV size.

A common limitation of the aforementioned works is that they are usually based on low-level, pixel-derived image features and discard high-level ones, which means that they may not capture prior medical knowledge. Therefore, in this paper, we propose to diagnose breast cancer using high-level features defined on the basis of prior medical knowledge, with the help of two very experienced pathologists. Because these features encode the experience of clinicians, who are in general highly capable of differentiating benign breast tumors from breast cancer, they also offer better comprehensibility. We extracted a set of high-level features, including 13 key features, which form the basis for the classification and grading of breast pathology. Based on these features, the pathological data of 470 cases were analyzed by two pathology experts. Then, we propose a novel SVM-based learning framework for distinguishing malignant cases from benign ones. The two key parameters of the classic SVM are the penalty factor and the kernel width, which are traditionally tuned by grid search or gradient descent. However, these methods easily fall into local optima. Recently, bio-inspired metaheuristic search algorithms, such as genetic algorithms (GA) [20,21,22,23], particle swarm optimization (PSO) [24,25,26,27], the fruit fly optimization algorithm (FOA) [28] and moth-flame optimization (MFO) [29], have made it easier to find the global optimum. As a new member of the swarm-intelligence family, FOA [30] is inspired by the foraging behavior of real fruit flies. FOA has notable merits, such as a simple computational process, easy implementation, and only a few parameters to tune. Owing to these properties, FOA has become a useful tool for many real-world problems [10, 28, 31,32,33].

Compared with gradient descent and grid search, FOA, like other swarm intelligence methods [34, 35], is a global optimization technique that can more easily find the global optimum or a near-optimal solution. However, the traditional FOA can still fall into local optima on complex optimization problems, and its convergence rate is not ideal. Therefore, this paper introduces the Levy flight (LF) strategy to update the positions of the fruit flies, further improving the convergence speed while reducing the probability of FOA falling into local optima. The LF strategy has been widely used to enhance many metaheuristic algorithms [36,37,38,39,40,41]. The principle of the LF strategy helps maintain the diversity of the population during optimization [42,43,44] and improves the convergence rate. In this study, the improved FOA, termed LFOA, is utilized to optimize the two key SVM parameters, the penalty factor and the kernel width, yielding the optimal model (LFOA-SVM). This model is then applied to breast cancer diagnosis on the high-level feature dataset. As far as we know, this paper is the first to solve the parameter optimization problem of SVM with LFOA. In the experiments, a 10-fold cross-validation method was used to make a detailed comparison between LFOA-SVM, FOA-SVM (based on the original fruit fly optimization algorithm), GA-SVM (based on genetic algorithms), PSO-SVM (based on particle swarm optimization), random forest (RF), back propagation neural network (BPNN) and SVM. The experimental results demonstrate that the proposed LFOA-SVM is superior to the other methods in terms of classification accuracy, Matthews correlation coefficient (MCC), sensitivity and specificity.

The rest of this paper is organized as follows. The Preliminaries section introduces background information used in the study. The Methods section presents the detailed implementation of the proposed method. The Results and discussion section delivers the experimental design, results and discussion. Finally, the Conclusions section summarizes the conclusions and recommendations for future work.

Preliminaries

Support vector machine

The support vector machine (SVM) [45] is a supervised learning model with associated learning algorithms for analyzing data in classification and regression analysis. Given a set of training instances, each marked as belonging to one of two classes, the SVM training algorithm builds a model that assigns a new instance to one of the two classes, making it a non-probabilistic binary linear classifier.

The SVM model represents instances as points in space, mapped so that instances of the separate categories are divided by a gap that is as wide and distinct as possible. New instances are then mapped into the same space and assigned to a category based on which side of the gap they fall. In addition to linear classification, SVMs can also perform nonlinear classification efficiently using the so-called kernel trick, implicitly mapping their inputs into a high-dimensional feature space.

More formally, a support vector machine constructs a hyperplane in a high- or infinite-dimensional space, which can be used for classification, regression or other tasks. Intuitively, the larger the distance from the hyperplane to the nearest training data points, the better, because a larger margin lowers the generalization error of the classifier.
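As an illustrative sketch only, the following MATLAB snippet trains and applies an RBF-kernel SVM using LIBSVM's MATLAB interface (svmtrain/svmpredict), the SVM implementation reported later in this paper. The variables trainLabels/trainFeatures and testLabels/testFeatures are placeholders; '-c' is the penalty factor C and '-g' the kernel width γ, the two hyper-parameters optimized in this work.

```matlab
% Minimal sketch, assuming LIBSVM's MATLAB interface is on the path.
% '-s 0 -t 2' selects C-SVC with an RBF kernel; C and gamma are illustrative.
model           = svmtrain(trainLabels, trainFeatures, '-s 0 -t 2 -c 1 -g 0.5');
predictedLabels = svmpredict(testLabels, testFeatures, model);
```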

Fruit fly optimization algorithm

The fruit fly optimization algorithm (FOA) [30] is a meta-heuristic inspired by the foraging behavior of fruit flies, which rely on vision and smell to locate food. FOA explores the solution space by mimicking the flight of fruit flies. First, the fruit fly population (candidate solutions) is randomly generated in the solution space; then each fruit fly updates its position according to the fly's flight mode, and the population continuously improves its fitness (solution quality) during the iterative process.

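The following is a minimal MATLAB sketch of the basic FOA as described above, written for a generic one-dimensional maximization problem; fitnessFcn is a placeholder for the problem-specific objective, and details may differ from the original algorithm listing.

```matlab
% A minimal sketch of the basic FOA (smell- and vision-based search).
popSize = 20; maxIter = 100;
X_axis = 10*rand; Y_axis = 10*rand;          % random initial swarm location
bestSmell = -inf;
for t = 1:maxIter
    % Osphresis (smell-based) search: flies move randomly around the swarm
    X = X_axis + 2*rand(popSize,1) - 1;
    Y = Y_axis + 2*rand(popSize,1) - 1;
    D = sqrt(X.^2 + Y.^2);                   % distance to the origin
    S = 1./D;                                % smell concentration judgment value
    Smell = arrayfun(fitnessFcn, S);         % evaluate each candidate solution
    % Vision-based search: the swarm flies towards the best fly found so far
    [curBest, idx] = max(Smell);
    if curBest > bestSmell
        bestSmell = curBest;
        X_axis = X(idx); Y_axis = Y(idx);
    end
end
```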

Levy flight

The Levy flight (LF) mechanism is often used to improve meta-heuristics because its characteristics resemble the movement patterns of many animals in nature, a phenomenon described by Levy statistics [46]. The LF is essentially a stochastic non-Gaussian random walk whose step lengths are drawn from a Levy stable distribution, which can be represented as follows:

$$ Levy(s)\sim {\left|s\right|}^{-1-\beta },0<\beta \le 2 $$
(1)

where β is the Levy index that controls the stability of the distribution and s is the step length.
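One common way to draw step lengths that follow the heavy-tailed distribution in Eq. (1) is Mantegna's algorithm; the sketch below is an illustrative assumption, since the paper does not state which generator it uses.

```matlab
% Hedged sketch: Mantegna's algorithm for a Levy-distributed step length.
function s = levy_step(beta)
    sigma = (gamma(1+beta)*sin(pi*beta/2) / ...
             (gamma((1+beta)/2)*beta*2^((beta-1)/2)))^(1/beta);
    u = randn * sigma;        % numerator ~ N(0, sigma^2)
    v = randn;                % denominator ~ N(0, 1)
    s = u / abs(v)^(1/beta);  % heavy-tailed step length
end
```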

Methods

Levy flight enhanced FOA (LFOA)

Levy flight is characterized by many short steps interspersed with occasional long jumps in random directions. This feature can effectively prevent the whole population from falling into a local optimum, thus enhancing the global exploration ability of the algorithm. In this paper, we introduce the LF strategy into FOA to explore the search space more efficiently. The new position is updated according to the following rule:

$$ {X}_i^{levy}={X}_i+{X}_i\oplus levy(s) $$
(2)

where \( {X}_i^{levy} \) is the new position of the ith search agent Xi after updating.
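A minimal MATLAB sketch of the update rule in Eq. (2) is given below, assuming the levy_step helper above and reading the circled operator as entry-wise multiplication by a Levy step (an assumption, since the operator is not defined in the text); β = 1.5 is an illustrative choice of the Levy index.

```matlab
% Sketch of Eq. (2): X_i is the current position (row vector) of one fly.
beta   = 1.5;                                   % illustrative Levy index
step   = arrayfun(@(~) levy_step(beta), X_i);   % one Levy step per dimension
X_levy = X_i + X_i .* step;                     % Eq. (2)
```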

Proposed LFOA-SVM model

This study proposes a novel evolutionary SVM that employs the LFOA strategy, and the resultant LFOA-SVM model can adaptively determine the two key hyper-parameters of SVM. The general framework of the proposed method is shown in Fig. 1. The proposed model consists primarily of two procedures: the inner parameter optimization and the outer classification performance evaluation. During the inner parameter optimization procedure, the SVM parameters are dynamically adjusted by the LFOA technique via 5-fold cross-validation (CV) analysis. Then, the obtained optimal parameters are fed to the SVM prediction model to perform the classification task for breast cancer diagnosis in the outer loop using 10-fold CV analysis. The classification accuracy obtained in the inner CV is used as the fitness function:

$$ fitness=\left({\sum}_{i=1}^{K}{ACC}_i\right)/K $$
(3)

where ACCi is the classification accuracy achieved by the SVM classifier on the ith fold of the inner 5-fold CV and K is the number of folds (K = 5).
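The snippet below is a hedged sketch of how this fitness could be evaluated, assuming LIBSVM's MATLAB interface: with the '-v 5' option, svmtrain itself returns the average accuracy over the five inner CV folds, i.e. the value of Eq. (3). The function name svmFitness and the labels/features variables (the training data of the current outer fold) are placeholders.

```matlab
% Hedged sketch of the fitness evaluation in Eq. (3) using LIBSVM.
function f = svmFitness(params, labels, features)
    C = params(1); g = params(2);
    opts = sprintf('-s 0 -t 2 -c %g -g %g -v 5 -q', C, g);
    f = svmtrain(labels, features, opts);   % mean 5-fold CV accuracy (%)
end
```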

Fig. 1
figure 1

Flowchart of LFOA-SVM

The main steps conducted by LFOA-SVM are described in detail as follows (a minimal MATLAB sketch of the whole procedure is given after the list):

  • Step 1: Initialize the input parameters of LFOA, including the population size, the maximum number of iterations, the upper and lower bounds of the variables, and the dimension of the problem.

  • Step 2: Randomly generate the position of the fruit fly swarm based on the upper and lower bounds of the variables.

  • Step 3: Generate the initial population for LFOA based on the position of the fruit fly swarm.

  • Step 4: Evaluate the fitness of all fruit flies in the population using the SVM, taking each fly's position as the parameter pair.

  • Step 5: Take the position of the best fruit fly as the position of the fruit fly swarm (global optimum).

  • Step 6: Update the position of each fruit fly in the swarm with the Levy flight mechanism according to Eq. (2) and evaluate its fitness.

  • Step 7: Update the global optimum if the fitness of the best individual in the fruit fly population is better than that of the global optimum.

  • Step 8: Update the iteration counter, t = t + 1. If t does not exceed the maximum number of iterations, go to Step 6; otherwise, continue to Step 9.

  • Step 9: Return the global optimum as the optimal SVM parameter pair (C, γ).
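The following MATLAB sketch illustrates Steps 1-9, assuming the levy_step and svmFitness helpers sketched above; it is an illustration of the procedure under the interpretation that the flies are re-dispersed around the swarm location (the current global best) by a Levy perturbation, not a reproduction of the authors' exact implementation. labels/features denote the training data of the current outer CV fold.

```matlab
% Hedged sketch of the LFOA search for the SVM parameter pair (C, gamma).
popSize = 8; maxIter = 250; beta = 1.5; dim = 2;            % Step 1
lb = [2^-5, 2^-15]; ub = [2^15, 2];                         % bounds for (C, gamma)
swarmPos = lb + rand(1,dim).*(ub - lb);                     % Step 2
pop = repmat(swarmPos, popSize, 1) + ...
      rand(popSize,dim).*repmat(ub - lb, popSize, 1)*0.1;   % Step 3
pop = max(min(pop, repmat(ub,popSize,1)), repmat(lb,popSize,1));
fit = zeros(popSize,1);
for i = 1:popSize
    fit(i) = svmFitness(pop(i,:), labels, features);        % Step 4
end
[gbestFit, idx] = max(fit);
gbest = pop(idx,:);                                         % Step 5
for t = 1:maxIter                                           % Steps 6-8
    for i = 1:popSize
        step     = arrayfun(@(~) levy_step(beta), gbest);   % Levy perturbation
        pop(i,:) = gbest + gbest.*step;                     % cf. Eq. (2)
        pop(i,:) = max(min(pop(i,:), ub), lb);              % keep inside bounds
        fit(i)   = svmFitness(pop(i,:), labels, features);
    end
    [bestFit, idx] = max(fit);
    if bestFit > gbestFit                                   % Step 7
        gbestFit = bestFit; gbest = pop(idx,:);
    end
end
bestC = gbest(1); bestGamma = gbest(2);                     % Step 9
```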

Results and discussion

Data description

The data were collected from Wenzhou People’s Hospital between 2004 and 2015. A total of 470 subjects were selected as the research objects, comprising 232 benign cases and 238 malignant cases. Based on prior medical knowledge of the classification and grading of breast pathology, we designed a set of feature descriptors with the help of two well-experienced pathologists from Wenzhou People’s Hospital of China. A total of 14 key features were included and quantified in this study. Table 1 gives a brief description and the quantization of these features.

Table 1 The brief descriptions and quantization of features used in this study

Experimental setup

The LFOA-SVM, FOA-SVM, PSO-SVM, GA-SVM, RF, BPNN and ELM classification models were implemented on the MATLAB platform. For SVM, the LIBSVM implementation originally developed by Chang and Lin [47] was utilized. For RF, the code package from https://code.google.com/archive/p/randomforest-matlab/ was adopted. The LFOA, FOA, GA and PSO were implemented from scratch. The computational analysis was conducted on a Windows Server 2008 operating system with an Intel Xeon E5-2650 v3 CPU (2.30 GHz) and 16 GB of RAM.

In order to conduct an accurate comparison, the same number of generations and the same population size were used for FOA, PSO, and GA. According to the preliminary experiments, when the number of generations and the swarm size are set to 250 and 8, respectively, the involved methods produce satisfactory classification performance. For the metaheuristic methods, the same search ranges for the parameters, \( C\in \left[{2}^{-5},{2}^{15}\right] \) and \( \gamma \in \left[{2}^{-15},2\right] \), were used. The parameter settings for the relevant algorithms are shown in Table 2.

Table 2 The parameter settings for the relevant methods

The k-fold CV [48] was used to evaluate the classification performance of the model; a nested, stratified 10-fold CV was adopted for the purposes of this study [49]. To evaluate the proposed method, commonly used evaluation criteria, namely classification accuracy (ACC), sensitivity, specificity and the Matthews correlation coefficient (MCC), were analyzed.
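The sketch below illustrates one way the outer 10-fold evaluation could be organized, assuming cvpartition from the Statistics and Machine Learning Toolbox and LIBSVM's svmtrain/svmpredict; lfoaTune is a hypothetical wrapper around the LFOA search loop sketched earlier, run only on the training portion of each outer fold.

```matlab
% Hedged sketch of the nested evaluation: inner 5-fold tuning, outer 10-fold test.
cv  = cvpartition(labels, 'KFold', 10);      % stratified 10-fold partition
acc = zeros(cv.NumTestSets, 1);
for k = 1:cv.NumTestSets
    trIdx = training(cv, k); teIdx = test(cv, k);
    [bestC, bestGamma] = lfoaTune(labels(trIdx), features(trIdx,:));  % inner loop
    model = svmtrain(labels(trIdx), features(trIdx,:), ...
                     sprintf('-s 0 -t 2 -c %g -g %g', bestC, bestGamma));
    pred  = svmpredict(labels(teIdx), features(teIdx,:), model);
    acc(k) = mean(pred == labels(teIdx));
end
fprintf('Mean 10-fold CV accuracy: %.4f\n', mean(acc));
```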

Benchmark function verification

To verify the performance of the proposed LFOA, we use a common set of 23 benchmark functions, including unimodal, multimodal, and fixed-dimension multimodal functions. The formulas and brief descriptions of these functions are given in Tables 3, 4 and 5.

Table 3 Unimodal benchmark functions
Table 4 Multimodal benchmark functions
Table 5 Fixed-dimension multimodal benchmark functions

Moreover, the performance of LFOA is also compared with the original FOA, MFO, BA, DA, FPA, PSO, and SCA. The parameter settings of these algorithms follow the previous papers, and the specific values are listed in Table 2. In order to obtain more reliable experimental results, 30 independent runs are performed on each test function, and the average value is taken as the final result of each algorithm. The number of iterations and the population size are set to 500 and 30, respectively. The obtained results are reported in Table 6 and Fig. 2. The average (Avg.), standard deviation (Std.) and rankings of the different algorithms in solving the f1-f23 test functions are displayed in Table 6.

Table 6 Results of testing benchmark functions
Fig. 2 Convergence curves of LFOA and other algorithms for f1, f2, f3 and f4

As shown in Table 6, on the seven unimodal functions, it can be clearly seen that, except for f7, the results achieved by the improved LFOA on f1-f6 are better than those of the original FOA and the other six algorithms; on f7, the original FOA performs well on this 30-dimensional problem. On the six multimodal functions, LFOA surpasses the other competitors on f9-f13. For f8, although LFOA could not find a much better solution, it is still very competitive compared with the original FOA. On the ten fixed-dimension multimodal functions, LFOA attained the exact optimal solution on f15. On the other nine functions (f14 and f16-f23), although LFOA is not always better than the other methods, its optimization performance is still improved compared with the original FOA. Moreover, based on the rankings, LFOA is the best overall technique, followed by FOA, FPA, BA, SCA, MFO, DA and PSO, respectively.

The convergence trends of LFOA and other methods for different test functions (f1, f2, f3, f4, f10, f11, f12 and f13) are depicted in Figs. 2 and 3.

Fig. 3 Convergence curves of LFOA and other algorithms for f10, f11, f12 and f13

On f1, it can be clearly seen that LFOA takes the lead in the initial stage and escapes the local optimum, in contrast to the other seven algorithms. On f2, the improved LFOA shows fast convergence behavior and finally achieves the best solution. On f3, LFOA has the fastest initial convergence speed, and the convergence behavior on f4 is similar to that on f1. On f10 and f11, the proposed LFOA shows a faster convergence rate in the early stages, whereas the other algorithms become trapped in local optima due to weaker search capability. On f12 and f13, both the original FOA and the improved LFOA converge very fast in the early stage, but unlike LFOA, FOA fails to escape from the local optimum in the later stage. From Figs. 2 and 3, we can conclude that the proposed algorithm not only has prominent advantages over the other algorithms but also converges very quickly on most problems.

In summary, Table 6 and Figs. 2 and 3 show that the improved LFOA has outstanding search ability and faster convergence than the other counterparts.

Results on the breast cancer diagnosis

In this section, the performance of the proposed model in the diagnosis of breast cancer has been thoroughly tested and analyzed. Table 7 shows the detailed results obtained by the LFOA-SVM model in the experiment. On average, the model achieves a classification accuracy of 93.83%, sensitivity of 91.22%, specificity of 96.53% and MCC of 0.8799.

Table 7 Classification performance of LFOA-SVM

The proposed model and six other machine learning models, namely FOA-SVM, GA-SVM, PSO-SVM, RF, BPNN and ELM, were tested simultaneously on the breast cancer dataset, and the results are shown in Fig. 4. The figure reveals that the LFOA-SVM model is better than the FOA-SVM model on all four evaluation metrics because, compared with FOA-SVM, LFOA-SVM achieves not only a higher ACC but also a much smaller standard deviation. On the ACC metric, the LFOA-SVM model obtained the best results. The results of FOA-SVM and PSO-SVM are very close behind LFOA-SVM, followed by RF, GA-SVM and ELM; the BPNN model has the worst result. On the sensitivity metric, the PSO-SVM model obtains the best results, LFOA-SVM takes second place, followed by RF, BPNN, FOA-SVM and GA-SVM, and the result obtained by ELM is the worst. On the specificity metric, the LFOA-SVM model obtained the best results, with ELM in second place. The results of FOA-SVM and PSO-SVM are very close behind ELM, followed by GA-SVM and RF, whose results are very similar; the result obtained by BPNN is the worst. On the MCC metric, the LFOA-SVM model again obtains the best results. PSO-SVM is in the next place, followed by FOA-SVM, RF, GA-SVM and ELM, and the result obtained by BPNN is the worst.

Fig. 4 Classification performance obtained by the seven methods in terms of ACC, sensitivity, specificity and MCC

For comparison purposes, we have also recorded the detailed confusion matrices for LFOA-SVM and FOA-SVM. As shown in Table 8, LFOA-SVM correctly identifies 216 malignant tumors and 225 benign tumors, while misjudging 22 malignant tumors as benign and 7 benign tumors as malignant. FOA-SVM correctly identifies 215 malignant tumors and 220 benign tumors, while misjudging 23 malignant tumors as benign and 12 benign tumors as malignant. The results indicate that LFOA is superior to FOA in the recognition of both malignant and benign tumors.

Table 8 Confusion matrix obtained by the proposed LFOA-SVM and FOA-SVM

In order to comprehensively evaluate the performance of the models, the convergence curves of the metaheuristic-based models during the training process are also compared and analyzed. The convergence curves of the four models are presented in Fig. 5. As shown, the LFOA-SVM model not only has a very fast convergence speed but also achieves the highest classification accuracy, whereas the FOA-SVM model converges slowly. The main reason is that the LF mechanism improves the global search ability of FOA. Inspecting the curves in Fig. 5, the FOA-SVM model needs more iterations to converge, and the solution it obtains is not better than that of the LFOA-SVM model. The GA-SVM model converges prematurely after only a few iterations, which reveals that GA has weak global search capability: it takes a long time to jump out of local optima, and the final result is not satisfactory.

Fig. 5 Relationship between the iteration and training accuracy of LFOA-SVM, FOA-SVM, PSO-SVM, and GA-SVM

Discussions

In this study, a new support vector machine model (LFOA-SVM), based on an LF strategy-enhanced FOA, is proposed to diagnose breast cancer. The main novelty lies in the fact that the improved FOA (LFOA) is proposed for the first time and applied to predicting breast cancer from the perspective of high-level features. Compared with the original FOA and other optimizers, LFOA achieves better solutions with faster convergence. LFOA helps the SVM obtain more suitable parameters and thus achieve higher prediction performance for breast cancer diagnosis. The experimental results have demonstrated that the LFOA-SVM model achieves better performance than the other competitive counterparts.

The main contributions of this study are as follows:

  a) First, in order to fully explore the potential of the SVM classifier, we introduce a Levy flight strategy-enhanced FOA to adaptively determine the two key parameters of SVM, helping the classifier achieve its maximum classification performance more efficiently.

  b) Second, the resulting model, LFOA-SVM, is applied for the first time as a computer-aided decision-making tool for diagnosing breast cancer from high-level features.

  c) Third, the proposed LFOA-SVM method achieves superior results and is more stable and robust than the other SVM models.

Conclusions

This paper has developed an effective LFOA-SVM method that can accurately diagnose breast cancer in clinical practice and provide doctors with meaningful clinical decision support. The proposed method achieved a classification accuracy of 93.83%, sensitivity of 91.22%, specificity of 96.53% and MCC of 0.8799 for breast cancer diagnosis based on the high-level features.

Improving the LFOA method by introducing mechanisms such as a mutation strategy or opposition-based learning is a future research direction. In addition, we plan to apply the method to other related disease diagnosis problems.