Introduction

Lung cancer has the highest mortality rate among cancers worldwide. The latest statistics show that among Chinese males, lung cancer ranks first in both incidence and mortality, while among Chinese females, it ranks second in incidence and first in mortality (Cao et al. 2021). With approximately 781,000 new cases every year, lung cancer also ranks first in China by number of cases and is the most common cause of cancer death for both men and women in China. Lung cancer is likewise the leading cause of cancer mortality in the USA, accounting for 22% of all male cancer deaths and 22% of all female cancer deaths (Siegel et al. 2021). Early surgery is the most effective treatment for lung cancer, but most patients are diagnosed in the advanced stages of the disease. In 2015, the European Society of Radiology and the European Respiratory Society published the latest white paper on lung cancer screening in the European Respiratory Journal (ERJ), aiming to guide clinical lung cancer screening and facilitate the early detection and treatment of lung cancer (Kauczor et al. 2015). Pulmonary nodule detection and benign or malignant recognition, which can improve patients' chances of survival, play a vital role in the early diagnosis of lung cancer. However, the five-year survival rate of lung cancer patients is only 10%–15% (Stewart et al. 2019). If health management is strengthened to achieve early screening, detection and treatment of lung cancer, the cure rate can increase to 65%, which would effectively improve the survival rate of lung cancer patients and prevent them from missing the best opportunity for treatment (Stewart et al. 2019; Liu et al. 2017).

Early detection of lung cancer can be achieved effectively by computed tomography (CT) or positron emission tomography (PET). Pulmonary nodules, which are the early manifestations of lung cancer, are round or quasi-round dense shadows with a diameter of less than 30 mm on CT images. The Lung CAD system is a comprehensive application combining medical image processing, computer vision and machine learning technology that quickly and accurately detects and identifies lung nodules from CT or PET images to provide an efficient screening scheme. At the same time, the Lung CAD system can reduce the visual fatigue of radiologists caused by excessive reading, provide auxiliary second opinions, and help radiologists improve the detection efficiency and the accuracy of benign or malignant pulmonary nodule recognition. Standard Lung CAD systems usually include image preprocessing, pulmonary parenchyma segmentation, segmentation of candidate nodule regions of interest (ROIs) or volumes of interest (VOIs), the calculation and selection of ROI or VOI features, and benign or malignant pulmonary nodule recognition.

In recent years, machine learning (ML) has been widely applied to Lung CAD systems, and the support vector machine (SVM) algorithm in particular has achieved considerable success (Orozco et al. 2015; Manikandan and Bharathi 2016; Zhou et al. 2016; Jose et al. 2017; Li et al. 2018; Jia et al. 2020; Tong et al. 2021). Designing suitable kernel functions for a given problem is a common challenge for both SVM and kernel learning (KL) methods. Compared with single kernel learning (SKL), multikernel learning (MKL) can improve both the accuracy and the stability of classification. Bucak et al. (2014) comprehensively described MKL methods and noted the latest progress in this field. Wang et al. (2021) carefully explored the connection between MKL methods and deep learning methods and provided valuable references for future research directions. The MKL-SVM algorithm, which combines SVM with MKL, can effectively balance learning ability and generalization ability. The MKL-SVM algorithm was used to identify pulmonary nodules in the studies by Li et al. (2013, 2018). More specifically, Li et al. (2013) proposed a hybrid kernel SVM algorithm, in which ROI features are first calculated and designed for the recognition of pulmonary nodules, so that multiple recognition indicators, such as the accuracy (ACC), sensitivity (SEN), receiver operating characteristic (ROC) curve and area under the ROC curve (AUC), are improved simultaneously. Furthermore, by introducing the swarm intelligence idea, Li et al. (2018) proposed the MKL-SVM-PSO algorithm based on particle swarm optimization (PSO); the fitness value and optimization process under different inertia weights were discussed, and relatively ideal nonlinear dynamic inertia weights were obtained. The experimental results show that the algorithm can converge to the optimal fitness value quickly, effectively and smoothly, which verifies its effectiveness.

Another high-profile technology in machine learning is deep learning; with its rise, many researchers in the field of medical image processing have introduced deep learning into their research (Tajbakhsh and Suzuki 2016; Hongtao et al. 2018; Monkam et al. 2019; Puttagunta and Ravi 2021). Zhao et al. (2018) developed a hybrid convolutional neural network (CNN) combining LeNet and AlexNet, explored a CNN parameter optimization method, and selected 1018 cases from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) as experimental data; the final accuracy reached 82.2%, which is helpful for the diagnosis of nodules. To improve the accuracy of nodule recognition, Lin et al. (2020) proposed a two-dimensional CNN model that uses the Taguchi method for parameter optimization and verified its effectiveness on the LIDC-IDRI and SPIE-AAPM data sets; the results showed that the proposed method can improve recognition accuracy. In recent years, there has also been interest in combining deep learning methods with traditional machine learning methods. Zhang et al. (2019) built an ensemble learning model with multiple deep CNNs to classify lung nodules and selected 743 medical images from the LIDC-IDRI for experiments: first, deep CNNs with different architectures are used to predict the nodules; then, voting, SVM, decision trees and other methods are used to fuse the different prediction results; finally, the classification of nodules is achieved. Bansal et al. (2020) extracted deep features and handcrafted features using a ResNet network and morphological techniques for the lung nodule classification problem, and XGBoost was selected for classification after feature combination, with an experimental accuracy of 88.30%, which is better than the other techniques mentioned. Polat and Danaei Mehr (2019) proposed a new 3D-CNN model based on AlexNet and GoogleNet, combined with SVM as the classifier, to achieve lung nodule classification; experiments showed that combining deep learning methods with SVM can improve model performance, with a final accuracy of 91.8%.

Deep learning has played an important role in medical image processing in recent years. Because of its excellent learning ability, strong fitting ability and high accuracy, it is an important tool in medical research. However, the training stage of a deep learning model depends on a large amount of medical image data, and it is still not easy to obtain a high-quality data set with accurate annotations (Francisco and Azuaje 2019). At the same time, an increase in training data leads to longer training times and more complex processing, and building deep learning models places high demands on hardware. At present, experiments are in most cases conducted through transfer learning, a deep mathematical theoretical foundation is lacking, and the interpretability is low. Nevertheless, deep learning can still achieve excellent experimental results and is widely used. The traditional SVM algorithm is a typical small-sample learning method with a solid statistical theoretical foundation and strong interpretability (Abe 2010). Its computational complexity depends on the number of support vectors rather than the dimensionality of the sample space, which avoids the curse of dimensionality and provides good generalization ability. In addition, SVM has low requirements for the experimental environment and is easy to implement. However, SVM is not ideal for multiclass problems, does not scale easily to large training sets, and is sensitive to the selection of kernel function parameters. Nevertheless, when the data set is insufficient and sophisticated experimental equipment is lacking, SVM remains a feasible method.

Therefore, for the problem of lung nodule recognition, this paper selects SVM for nodule classification and improves the SVM kernel function and the parameter optimization method to achieve better experimental results.

The MKL-SVM algorithm has achieved good results in Lung CAD recognition, but two problems still need to be solved:

  1.

    In the process of medical image recognition, the positive and negative class samples are often unbalanced. If the same penalty coefficient is used for samples of different classes, the classification hyperplane of the SVM will be skewed toward the minority class, which causes overfitting to the majority class and makes minority-class samples more likely to be misclassified than majority-class samples (Tao et al. 2019). In the Lung CAD system, to prevent the missed detection of nodules, a large number of false positives are usually retained in the pulmonary nodule ROI segmentation step before recognition, so the number of false positives among the extracted candidate nodule ROIs is much higher than the number of true positives. Due to the dominant influence of the more numerous false positive nodules, the recognition model tends to pay less attention to the true positive nodules during learning, or even to ignore them, which easily leads to the missed detection of true positive nodules. Therefore, solving the class imbalance problem is important for improving the performance of the system.

  2.

    The evaluation indicators for pulmonary nodule recognition are diverse because of the different evaluation perspectives, and there is no unified standard; therefore, it is difficult to take multiple indicators into account when evaluating pulmonary nodule recognition. In some studies, the accuracy, that is, the overall recognition effect, was taken as the evaluation standard of the Lung CAD system (Sui et al. 2015), but the nodule detection rate was ignored. In other studies, the sensitivity, which represents the degree of nodule detection, was taken as the evaluation standard (Wang et al. 2015; Javaid et al. 2016); however, when the pulmonary nodule ROIs are unbalanced, too many false nodules are identified as nodules and too many misjudgments occur, causing psychological pressure and additional examination burdens for patients. The ROC curve and AUC were taken as the evaluation indexes by some researchers (Aoyama et al. 2003; Gao et al. 2015; Gonçalves et al. 2017; da Nóbrega et al. 2018). Other researchers have also explored this issue (Orozco et al. 2015; Demir and Yılmaz Çamurcu 2015; de Carvalho Filho et al. 2017; Li et al. 2019; Ye et al. 2020), using ACC, SEN and specificity (SPE) to evaluate the experimental results on the test set; however, only the test results were discussed, and the objective function of the training model was not evaluated. Sui et al. (2015) discussed this problem and gave attention to multiple indicators; for the evaluation of the SVM, an \(F_{measure}\) function was adopted to jointly evaluate the classification results for positive samples, where the closer the function is to 1, the better the classification result. Although many classification algorithms consider multiple evaluation indicators, there is still much potential for research to optimize and improve the MKL-SVM algorithm.

The main contributions of this article are as follows:

  1.

    For the pulmonary nodule recognition problem, a cost-sensitive multikernel learning support vector machine (CS-MKL-SVM) algorithm is proposed to address the imbalance between positive and negative samples, which is likely to cause too many missed detections. Different penalty coefficients are used for true positive nodules and false positive nodules to increase the penalty for misclassifying true positive nodules, so that the model can better learn the features of true positive nodules and improve the recognition results.

  2.

    Since the CS-MKL-SVM algorithm has many parameters, a CS-MKL-SVM algorithm based on PSO is proposed in this paper. A grid search can be used to find the optimal parameter set under cross-validation; however, because of the numerous parameters and the resulting deeply nested loops, the amount of computation and the training time of the CS-MKL-SVM algorithm become excessive. In addition, if the grid search step size is not dense enough, the global optimal solution is difficult to find. A swarm intelligence algorithm, in contrast, can find the optimal parameter set quickly without traversing all the parameter points. Hence, in this paper, PSO is adopted for parameter optimization of the CS-MKL-SVM algorithm.

  3.

    To better balance the overall accuracy and the nodule detection rate and to prevent the missed detection of nodules, this paper proposes an F-new function as the PSO evaluation standard for parameter optimization in the CS-MKL-SVM algorithm. In this algorithm, F-new under cross-validation is used as the fitness function value to replace the traditional single-indicator evaluation system based on ACC or SEN alone. The theoretical analysis of the F-new function proves that it can take both ACC and SEN into account, and the experimental results further show the feasibility of this index. The algorithm in this paper can ensure the accuracy of overall nodule recognition while improving the nodule detection rate.

The rest of the article is organized as follows. Section 1 introduces the CS-MKL-SVM algorithm for unbalanced data sets and related work on cost-sensitive SVM. In Section 2, the proposed framework for pulmonary nodule recognition in Lung CAD is specified. Section 3 presents the experimental results. Finally, Section 4 concludes the work.

1 CS-MKL-SVM Algorithm for Unbalanced Data Sets

1.1 Cost-Sensitive MKL-SVM Algorithm

The standard SVM uses the same penalty parameter, namely the regularization coefficient C, for positive and negative samples. The value of the penalty parameter determines the compromise between maximizing the class margin and minimizing the training error. However, when the positive and negative sample sets are seriously unbalanced, the SVM classification boundary is skewed toward the side containing fewer samples, which affects the classification accuracy. For example, if the same penalty coefficient C is used when the number of positive samples is small, the sum of the errors (the loss) corresponding to the positive samples will be less than that corresponding to the negative samples; in effect, a larger penalty is applied to the negative samples, so the separating hyperplane moves toward the side of the positive samples. Based on the MKL-SVM algorithm, a cost-sensitive MKL-SVM (CS-MKL-SVM) algorithm is proposed to address the imbalance between positive and negative samples. Different penalty coefficients \(C_{+}\) and \(C_{-}\) are introduced for the positive and negative samples to flexibly adjust the misclassification costs of false positives and false negatives.
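
As a quick, hedged illustration of this idea (a sketch, not the paper's MATLAB/LIBSVM implementation), the following Python snippet compares a standard SVM with a cost-sensitive SVM on an imbalanced toy data set; scikit-learn's class_weight argument scales the penalty C per class and thus plays the role of \(C_{+}\) and \(C_{-}\). All data and parameter values are illustrative assumptions.

```python
# Minimal sketch (not the paper's MATLAB/LIBSVM implementation): per-class
# penalties via scikit-learn's class_weight, which scales C for each class and
# thus plays the role of C+ and C-. Data and parameter values are illustrative.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Imbalanced toy data: few positives (true nodules, y = +1), many negatives (y = -1)
X = np.vstack([rng.normal(1.0, 1.0, size=(20, 2)),
               rng.normal(-1.0, 1.0, size=(140, 2))])
y = np.array([+1] * 20 + [-1] * 140)

# Equal penalties: the decision boundary tends to sacrifice the minority class
svm_equal = SVC(kernel="rbf", C=1.0).fit(X, y)
# Cost-sensitive penalties: misclassifying a positive sample costs more (C+ > C-)
svm_cs = SVC(kernel="rbf", C=1.0, class_weight={+1: 10.0, -1: 1.0}).fit(X, y)

for name, model in [("equal C", svm_equal), ("C+ > C-", svm_cs)]:
    sen = (model.predict(X)[y == +1] == +1).mean()  # sensitivity on the minority class
    print(f"{name}: training sensitivity = {sen:.2f}")
```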

The training set is expressed as:

$$\begin{aligned} T = \left\{ (x_{i},y_{i}) \right\} \in (X \times Y)^l \end{aligned}$$
(1)

where the parameter l is the number of training patterns, \(x_{i}\) is the input vector of the SVM, and \(y_{i}\in \left\{ -1,+1\right\} \) is the category label. In this paper, \(x_{i}\in R^n \) is the n-dimensional feature vector extracted from the ROI of the \(i^{th}\) sample. The value \(y_{i}=1\) corresponds to the positive samples, which are true positive nodules, i.e., malignant tumor lesions. The value \(y_{i}=-1\) denotes the negative samples, which are false nodules (false positive candidates).

The mathematical model of the original CS-MKL-SVM algorithm is described as follows:

$$\begin{aligned} \left\{ \begin{array}{l} \min \;\mathrm{{ }}\frac{1}{2}{\left\| w \right\| ^2} + {C_ + }\sum \limits _{{y_i} = + 1} {{\xi _i}} + {C_ - }\sum \limits _{{y_i} = - 1} {{\xi _i}} \\ s.t.\;\mathrm{{ }}{y_i}\left( {w \cdot \varPhi \left( {{x_i}} \right) + b} \right) \ge 1 - {\xi _i}\\ \;\;\;\;\;\;\;\;{\xi _i} \ge 0\\ \;\;\;\;\;\;\;\;i = 1,2, \ldots ,l \end{array} \right. \end{aligned}$$
(2)

The parameter \(\xi _ i\) is the slack variable, b is the constant bias, and \(\varPhi (x_{i})\) maps the input sample into a high-dimensional feature space. The original CS-MKL-SVM problem is transformed into its Lagrangian dual problem by introducing Lagrange multipliers:

$$\begin{aligned} \left\{ \begin{array}{l} \max \;\mathrm{{ }}\sum \limits _{i = 1}^l {{\alpha _i}} - \frac{1}{2}\sum \limits _{i = 1}^l {\sum \limits _{j = 1}^l {{\alpha _i}} } {\alpha _j}{y_i}{y_j}K\left( {{x_i},{x_j}} \right) \\ s.t.\;\;\;\mathrm{{ }}\sum \limits _{i = 1}^l {{\alpha _i}} {y_i} = 0\\ \;\;\;\;\;\;\;\;\mathrm{{0}} \le {\alpha _i} \le {C_ + },{y_i} = + 1\\ \;\;\;\;\;\;\;\;\mathrm{{0}} \le {\alpha _i} \le {C_ - },{y_i} = - 1 \end{array} \right. \end{aligned}$$
(3)

where

$$\begin{aligned} K(x_{i},x_{j})= \varPhi (x_{i}) \cdot \varPhi (x_{j}) \end{aligned}$$
(4)

The discriminant function is as follows:

$$\begin{aligned} f\left( x \right) = {\mathop {\mathrm { sgn}}} \left( {\sum \limits _{i = 1}^l {{\alpha _i}{y_i}K\left( {{x_i},x} \right) + b} } \right) \end{aligned}$$
(5)

where \(K(x_{i},x_{j})\) is the kernel function, which involves only the inner product of the input sample vectors and whose value is a real number. The kernel function is symmetric and converts two n-dimensional real feature vectors into a real number, i.e., \(K:R^n \times R^n \rightarrow R\). As long as the Mercer condition is satisfied, the kernel function \(K(x_{i},x_{j})\) corresponds to a dot product in some feature space. One of the keys to improving the performance of SVM is to design suitable kernel functions for a given problem. The common kernel functions include the polynomial kernel function and the radial basis function (RBF), expressed as \(K_{poly}\) and \(K_{rbf}\):

$$\begin{aligned} K_{poly}(x,y)= & {} (x^ty+1)^d \end{aligned}$$
(6)
$$\begin{aligned} K_{rbf}(x,y)= & {} exp(-\Vert x-y \Vert ^2/2g^2) \end{aligned}$$
(7)

where the parameter d represents the polynomial kernel degree and g represents the RBF kernel width. The parameters d and g are hyperparameters that need to be given in advance. The RBF kernel has a strong learning ability, and the polynomial kernel has a strong generalization ability; thus, their combination can take both abilities into account. It has been proven that the convex combination of the RBF kernel and the polynomial kernel in equation (8) is still a kernel function (Li et al. 2018).

$$\begin{aligned} \left\{ \begin{array}{l} {K_{mix}}\left( {{x_i},{x_j}} \right) = m{K_{poly}}\left( {{x_i},{x_j}} \right) + \left( {1 - m} \right) {K_{rbf}}\left( {{x_i},{x_j}} \right) \\ 0< m < 1 \end{array} \right. \end{aligned}$$
(8)

where m is the weight coefficient of the polynomial kernel function; the proportion of each basic kernel function in the multikernel function can be adjusted freely through m. The kernel \(K_{mix}\) in equation (8) is taken as the multikernel function K in the CS-MKL-SVM algorithm.
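
A minimal sketch of equation (8) used as a precomputed Gram matrix for a class-weighted SVM is given below; it is a stand-in for the CS-MKL-SVM model rather than the paper's MATLAB/LIBSVM implementation, and the variable names X_train, y_train and X_test, as well as all parameter values, are assumptions for illustration.

```python
# Sketch of the convex mixed kernel in Eq. (8): K_mix = m*K_poly + (1-m)*K_rbf,
# used as a precomputed Gram matrix with a class-weighted SVM as a stand-in for
# CS-MKL-SVM. Parameter values (d, g, m, C+, C-) are illustrative only.
from sklearn.svm import SVC
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel

def k_mix(X_a, X_b, d=3, g=1.0, m=0.5):
    """Convex combination of the polynomial and RBF kernels, cf. Eqs. (6)-(8)."""
    K_poly = polynomial_kernel(X_a, X_b, degree=d, gamma=1.0, coef0=1.0)  # (x^T y + 1)^d
    K_rbf = rbf_kernel(X_a, X_b, gamma=1.0 / (2.0 * g ** 2))              # exp(-||x-y||^2 / (2 g^2))
    return m * K_poly + (1.0 - m) * K_rbf

# Assumed arrays: X_train, y_train (labels in {+1, -1}) and X_test hold the ROI features.
# model = SVC(kernel="precomputed", C=1.0, class_weight={+1: 8.0, -1: 1.0})
# model.fit(k_mix(X_train, X_train), y_train)
# y_pred = model.predict(k_mix(X_test, X_train))
```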

The CS-MKL-SVM algorithm has numerous parameters, including the regularization coefficients \(C_+\) and \(C_-\) of the positive and negative samples, the polynomial kernel degree d, the RBF kernel width g, and the multikernel weight m. A grid search can be used to find the optimal parameter set in the sense of cross-validation (CV). However, because of the many parameters and the deeply nested loops, grid searching requires a large number of calculations and a long running time. Moreover, the global optimal solution cannot be found if the grid search step size is not dense enough. In contrast, a heuristic algorithm can find the optimal parameter set quickly without traversing all parameter points.

1.2 CS-MKL-SVM Optimization Algorithm based on PSO

Particle swarm optimization (PSO), which searches by tracking the optimal particle in the solution space, is a heuristic algorithm based on swarm intelligence.

Suppose that in a D-dimensional search space, the swarm \(X=(X_1,X_2,\ldots ,X_n)\) is composed of n particles, and \(X_i=(x_{i1},x_{i2},\ldots ,x_{iD})^T\) represents the position of the \(i^{th}\) particle in the D-dimensional parameter search space. The fitness value corresponding to each particle \(X_i\) can be calculated according to the objective function. The velocity of the \(i^{th}\) particle is \(V_i=(V_{i1},V_{i2},\ldots ,V_{iD})^T\), its individual extremum is \(P_i=(P_{i1},P_{i2},\ldots ,P_{iD})^T\), and the group extremum is \(P_g=(P_{g1},P_{g2},\ldots ,P_{gD})^T\). In each iteration, the particle updates its velocity and position through the individual extremum and the group extremum, and the update expressions are as follows:

$$\begin{aligned} V_{id}^{k+1}= & {} \omega V_{id}^{k}+c_1r_1(P_{id}^{k}-X_{id}^{k})+c_2r_2(P_{gd}^{k}-X_{id}^{k}) \end{aligned}$$
(9)
$$\begin{aligned} X_{id}^{k+1}= & {} X_{id}^{k}+V_{id}^{k+1} \end{aligned}$$
(10)

where k is the current number of iterations, \(\omega \) is the inertia weight, \(d=1,2,\ldots , D\), \(V_{id}\) is the velocity of the \(i^{th}\) particle in dimension d, \(c_1\) and \(c_2\) are nonnegative constant acceleration factors, and \(r_1\) and \(r_2\) are random numbers distributed in the interval [0, 1]. To prevent the blind search of particles, the velocity and position are usually limited to \([-V_{max},V_{max}]\) and \([-X_{max},X_{max}]\), respectively.
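
A compact sketch of one PSO generation under equations (9) and (10) is shown below; the fitness callback is a placeholder for any score to be maximized (in this paper, F-new under cross-validation), and the clipping bounds and array shapes are illustrative assumptions rather than the paper's implementation.

```python
# Sketch of one PSO generation following Eqs. (9)-(10). X, V: (n, D) position and
# velocity arrays; P, P_fit: personal best positions and fitness values; g_best:
# group best position. In this paper D = 4, for (g, m, C+, C-).
import numpy as np

def pso_step(X, V, P, P_fit, g_best, w, c1, c2, v_max, x_min, x_max, fitness):
    n, D = X.shape
    r1, r2 = np.random.rand(n, D), np.random.rand(n, D)
    # Eq. (9): inertia term + cognitive (individual) term + social (group) term
    V = np.clip(w * V + c1 * r1 * (P - X) + c2 * r2 * (g_best - X), -v_max, v_max)
    # Eq. (10): move the particles and keep them inside the search bounds
    X = np.clip(X + V, x_min, x_max)
    # Re-evaluate the fitness and update the individual and group extrema
    fit = np.array([fitness(x) for x in X])
    improved = fit > P_fit
    P[improved], P_fit[improved] = X[improved], fit[improved]
    g_best = P[np.argmax(P_fit)].copy()
    return X, V, P, P_fit, g_best
```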

1.3 The New Target Function based on the F-score

In the classic PSO algorithm, a single evaluation metric in the sense of CV is usually taken as the final goal and used as the fitness function value, but such a metric cannot evaluate the recognition results well for special problems. Therefore, the concept of the harmonic mean from statistics is introduced. In statistics, the harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the statistical variables (Parkash and Thukral 2010), as shown in equation (11):

$$\begin{aligned} H = \frac{n}{{\frac{1}{{{x_1}}} + \frac{1}{{{x_2}}} + \cdots + \frac{1}{{{x_n}}}}} \end{aligned}$$
(11)

where H is the harmonic mean, n is the total number of variables, and \(x_{i}\) represents the \(i^{th}\) variable, \(i=1,2,\ldots , n\).

In many machine learning systems, Precision and Recall are the metrics of greatest concern. Precision is the fraction of the samples judged positive by the model that are truly positive; Recall is the fraction of all positive samples that the model judges positive, that is, the detection rate of positive samples. The two metrics are shown in equation (12) and equation (13), respectively.

$$\begin{aligned} Precision= & {} \frac{{TP}}{{TP + FP}} \end{aligned}$$
(12)
$$\begin{aligned} Recall= & {} \frac{{TP}}{{TP + FN}} \end{aligned}$$
(13)

where true positive (TP) denotes the number of positive samples predicted as positive, true negative (TN) the number of negative samples predicted as negative, false positive (FP) the number of negative samples predicted as positive, and false negative (FN) the number of positive samples predicted as negative.

Fig. 1 F-new function simulation diagram

In some specific problems, Precision and Recall are both important and need to be considered comprehensively. As an important metric to weigh Precision and Recall, the \(F_{measure}\) function represents the weighted harmonic mean of Precision and Recall (Sui et al. 2015), as shown in equation (14):

$$\begin{aligned} \begin{aligned} {F_{measure}}&= \frac{1}{{\alpha \frac{1}{{Precision}} + (1 - \alpha )\frac{1}{{Recall}}}}\\&= \frac{{({\beta ^2} + 1)Precision*Recall}}{{{\beta ^2}Precision + Recall}} \end{aligned} \end{aligned}$$
(14)

where \({\beta ^2} = \frac{1 - \alpha }{\alpha }\), \(\alpha \in [0,1]\), \({\beta ^2} \in [0,\infty ]\). When \(\beta > 1\), Recall has a greater impact; when \(\beta < 1\), Precision has a greater impact; when \(\beta = 1\), the function reduces to the \(F_1\)-score, which means that Precision and Recall have the same importance, as shown in equation (15):

$$\begin{aligned} F_{1}\text {-}score = \frac{2*Precision*Recall}{Precision + Recall} \end{aligned}$$
(15)

The standard \(F_{1}\)-score function was originally proposed mainly for information retrieval problems (Rijsbergen 1974; Guns et al. 2012). Information retrieval is more concerned with “what percentage of the retrieved information is of interest to the users (Precision)” and “what percentage of the content of interest to the users is retrieved (Recall)”; therefore, Precision and Recall are suitable performance metrics for such requirements.

However, in the Lung CAD system, ACC represents the overall probability of correct recognition. The Recall indicator is also called the sensitivity (SEN) in the Lung CAD system and represents the detection probability of true positive nodules: the higher the SEN value, the fewer the false negative nodules and the lower the possibility of missed detection. Precision represents the ability to recognize non-nodules: the higher the Precision value, the fewer the false positive nodules and the lower the probability that non-nodules are incorrectly diagnosed as nodules. For Lung CAD, on the one hand, the overall recognition effect is very important; on the other hand, to prevent missed detections, the recognition rate of malignant lesions, that is, the SEN indicator or the detection rate of malignant tumors, cannot be ignored. The definitions of ACC and SEN are shown in equation (16) and equation (17), respectively.

$$\begin{aligned} ACC= & {} \frac{{TP + TN}}{{TP + TN + FP + FN}} \end{aligned}$$
(16)
$$\begin{aligned} SEN(Recall)= & {} \frac{{TP}}{{TP + FN}} \end{aligned}$$
(17)

In the above formulas, TP is the number of true positive nodules detected, FP is the number of false positive nodules, FN is the number of false negative (missed) nodules, and TN is the number of true negatives, i.e., false nodules correctly recognized as non-nodules.

For the recognition of lung nodules, more attention is paid to “the accuracy of overall nodule recognition (ACC)” and “the true detection rate of lung nodules (SEN).” The missed detection of nodules may cause the patient to miss the optimal time for treatment. For a prediction model, the higher SEN is, the smaller FN and the missed nodule detection rate are, and thus the more trustworthy the detection results. Therefore, SEN is as important as ACC. In this paper, the F-new function shown in equation (18) is used as the evaluation criterion for Lung CAD: F-new in the sense of cross-validation is taken as the fitness function of the CS-MKL-SVM-PSO algorithm, and the resulting parameter optimization criterion replaces the traditional single-ACC evaluation system, taking into account the overall recognition accuracy while reducing the missed detection of nodules.

$$\begin{aligned} F - \mathrm{{new}} = \frac{\mathrm{{2}}}{{\frac{\mathrm{{1}}}{{ACC}} + \frac{1}{{SEN}}}} = \frac{{2*ACC*SEN}}{{ACC + SEN}} \end{aligned}$$
(18)

The F-new function is the harmonic mean of ACC and SEN.
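
A small sketch of computing ACC, SEN and F-new from a confusion matrix, following equations (16)-(18), is shown below; the toy labels are illustrative, not the paper's data.

```python
# Sketch of the F-new fitness score in Eq. (18): the harmonic mean of ACC and SEN.
import numpy as np

def f_new(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == -1) & (y_pred == -1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == -1))
    acc = (tp + tn) / (tp + tn + fp + fn)            # Eq. (16)
    sen = tp / (tp + fn) if (tp + fn) > 0 else 0.0   # Eq. (17)
    if acc + sen == 0:
        return 0.0
    return 2 * acc * sen / (acc + sen)               # Eq. (18)

# Toy example: 10 true nodules with 2 missed; 30 false nodules with 3 misclassified
y_true = [1] * 10 + [-1] * 30
y_pred = [1] * 8 + [-1] * 2 + [1] * 3 + [-1] * 27
print(round(f_new(y_true, y_pred), 4))  # harmonic mean of ACC = 0.875 and SEN = 0.8
```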

Conclusion 1

The value ranges of ACC and SEN are both [0,1]. When ACC and SEN are both equal to 1, the F-new function attains its maximum value, which is 1.

Proof

It has been proven that inequality (19) holds (Liao and Wu 2015).

$$\begin{aligned} \frac{{{x_1} + {x_2}}}{2} \ge \sqrt{{x_1}{x_2}} \ge \frac{\mathrm{{2}}}{{\frac{1}{{{x_1}}} + \frac{1}{{{x_2}}}}} \end{aligned}$$
(19)

In the above formula, the equal sign holds if and only if \({x_1} = {x_2}\). When \({x_1} \in [0,1]\) and \({x_2} \in [0,1]\), the arithmetic mean \(\frac{{{x_1} + {x_2}}}{2}\) is increasing in both variables and attains its maximum value of 1 at \({x_1} = {x_2} = 1\). Since the harmonic mean never exceeds the arithmetic mean and equals it only when \({x_1} = {x_2}\), the harmonic mean \(\frac{\mathrm{{2}}}{{\frac{1}{{{x_1}}} + \frac{1}{{{x_2}}}}}\) also attains its maximum value of 1 at \({x_1} = {x_2} = 1\).

Using ACC and SEN as inputs, a simulation of the harmonic mean F-new is conducted; the resulting three-dimensional surface and contour lines are shown in Fig. 1. \( \square \)

Fig. 2 Flowchart of the CS-MKL-SVM-PSO algorithm for pulmonary nodule recognition in Lung CAD

As shown in the three-dimensional view in Fig. 1(a), with ACC and SEN as inputs, the F-new function increases monotonically with both the ACC and SEN indicators. This can also be proven theoretically as follows:

As defined above,

$$\begin{aligned} {F_{new}}(ACC,SEN) = \frac{2}{{\frac{1}{{ACC}} + \frac{1}{{SEN}}}} = \frac{{2ACC \cdot SEN}}{{ACC + SEN}} \end{aligned}$$
(20)

where \(ACC \in [0,1]\) and \(SEN \in [0,1]\). The partial derivatives with respect to ACC and SEN are calculated as shown in equation (21) and equation (22), respectively:

$$\begin{aligned} \frac{{\partial {F_{new}}(ACC,SEN)}}{{\partial ACC}}= & {} \frac{{2SE{N^2}}}{{{{(ACC + SEN)}^2}}} \ge 0 \end{aligned}$$
(21)
$$\begin{aligned} \frac{{\partial {F_{new}}(ACC,SEN)}}{{\partial SEN}}= & {} \frac{{2AC{C^2}}}{{{{(ACC + SEN)}^2}}} \ge 0 \end{aligned}$$
(22)

From the derivative, the first-order partial derivative of \({F_{new}}(ACC,SEN)\) with respect to ACC is greater than or equal to 0, and \({F_{new}}(ACC,SEN)\) increases with the increase in ACC. Similarly, \({F_{new}}(ACC,SEN)\) increases with the increase in SEN. Therefore, when \(ACC \in [0,1]\), \(SEN \in [0,1]\), \({F_{new}}(ACC,SEN)\) is a monotonically increasing function.

When both ACC and SEN are close to 1, the F-new function approaches its maximum, showing that both ACC and SEN are well considered. As seen from the contour lines in Fig. 1(b), along the direction from (0,0) to (1,1), the closer a point is to the upper right corner, the higher the score of the F-new function. The F-new function gives higher scores to models in which ACC and SEN are close to each other; combined with the previous conclusion that F-new increases monotonically with both SEN and ACC, which are bounded, the F-new score is higher when ACC and SEN are both larger and closer together. The extreme value F-new = 1 is obtained when \(ACC = SEN = 1\), that is, at the upper right vertex. Conversely, the larger the gap between ACC and SEN, the lower the F-new score, in which case the model largely loses its practical value.
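
The surface and contour views in Fig. 1 can be reproduced with a few lines of plotting code; the grid resolution and color map below are arbitrary choices, not the paper's plotting script.

```python
# Reproduce the F-new surface and contour views of Fig. 1 (illustrative plot code).
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3-D projection)

acc, sen = np.meshgrid(np.linspace(0.01, 1, 200), np.linspace(0.01, 1, 200))
f_new = 2 * acc * sen / (acc + sen)          # Eq. (18)

fig = plt.figure(figsize=(9, 4))
ax3d = fig.add_subplot(1, 2, 1, projection="3d")
ax3d.plot_surface(acc, sen, f_new, cmap="viridis")
ax3d.set_xlabel("ACC"); ax3d.set_ylabel("SEN"); ax3d.set_zlabel("F-new")

ax2d = fig.add_subplot(1, 2, 2)
contours = ax2d.contour(acc, sen, f_new, levels=10)
ax2d.clabel(contours)
ax2d.set_xlabel("ACC"); ax2d.set_ylabel("SEN")
plt.tight_layout()
plt.show()
```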

2 CS-MKL-SVM-PSO Algorithm for Pulmonary Nodule Recognition in Lung CAD

2.1 The Process of the CS-MKL-SVM-PSO Algorithm for Pulmonary Nodule Recognition in Lung CAD

The overall process of the Lung CAD system is shown in Fig. 2. The Lung CAD system is divided into three parts. The first part is the preparation for benign and malignant pulmonary nodule recognition, including the preprocessing of CT images, segmentation of pulmonary parenchyma, segmentation and extraction of ROIs, and feature calculation from the ROIs. The second part is to obtain the recognition algorithm model by quickly optimizing the parameters on the training set with the CS-MKL-SVM-PSO algorithm. The last part is to obtain the final recognition result on the test set. Each of the three parts is indicated by a dotted box.

2.2 Preparation for Pulmonary Nodule Recognition

Pulmonary nodules differ in shape, structure and gray-level features in CT images, which makes them easy to misdiagnose and falsely detect. Pulmonary nodule recognition is the final step and the core module of the Lung CAD system, and the selected recognition algorithm directly affects the detection results.

During image preprocessing, binarization is performed on the original CT image, and the largest eight connected regions are reconstructed to remove the background and obtain the pulmonary parenchyma region. Because nodules grow in the pulmonary parenchyma, segmentation of the pulmonary parenchyma is mainly used to separate the left and right lungs and then extract the lung edges. After the segmentation of the lung parenchyma is completed, further segmentation is needed to extract the relevant ROIs. In this paper, the ROIs were segmented by image enhancement techniques. To highlight the ROIs of the candidate pulmonary nodules, the gray levels of the ROI images were kept the same as those of the original image after image reconstruction. Figure 3 illustrates the complete process of segmenting and extracting pulmonary nodule ROIs from an original CT image through pulmonary parenchyma segmentation. Due to space limitations, only the pulmonary parenchyma on the side containing pulmonary nodules is shown; in the actual experiment, all segments of the pulmonary parenchyma are traversed.
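
A rough Python sketch of the binarization and connected-region step is shown below; the Hounsfield threshold, the 8-connectivity interpretation and the number of retained regions are assumptions for illustration rather than the paper's exact settings.

```python
# Sketch of the preprocessing described above: binarize a CT slice, label the
# connected regions (8-connectivity), and keep the largest regions as a rough
# parenchyma mask. A real pipeline would also remove air touching the border.
import numpy as np
from skimage import measure

def lung_mask(ct_slice, threshold=-400, keep=2):
    """ct_slice: 2-D array in Hounsfield units; returns a boolean parenchyma mask."""
    binary = ct_slice < threshold                    # air and lung voxels are dark
    labels = measure.label(binary, connectivity=2)   # connectivity=2 -> 8-connected in 2-D
    regions = sorted(measure.regionprops(labels), key=lambda r: r.area, reverse=True)
    mask = np.zeros_like(binary, dtype=bool)
    for r in regions[:keep]:                         # keep the largest regions
        mask |= labels == r.label
    return mask
```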

Fig. 3 A complete lung nodule ROI extraction process from an original CT image

In each case, several pulmonary nodule ROIs can be segmented, but only one corresponds to a true positive, namely the pulmonary nodule (malignant lesion); the rest are false positives. This is because CT images are tomographic images, which often contain points where blood vessels intersect or terminate, and these structures have round shapes similar to those of nodules. It can be seen from the extracted ROIs that the number of false positives is larger than the actual number of lesions, resulting in a serious imbalance in the data set, which affects the subsequent recognition results. To address this issue, a cost-sensitive multikernel support vector machine (CS-MKL-SVM) algorithm is proposed in this paper.

The feature selection scheme was described in a previous article (Li et al. 2013). Seven morphological features, two grayscale features and four texture features of each ROI are selected, for a total of 13 feature parameters. The seven morphological features are area, diameter, circumference, rectangularity, flattening, roundness, and slenderness. The grayscale features are the gray mean and gray variance. The four texture features are energy, contrast, entropy and the inverse difference moment.
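
As a hedged sketch (the paper's exact feature definitions are not reproduced here), the snippet below shows how features of the listed kinds can be computed with common formulas; homogeneity is used as a stand-in for the inverse difference moment, and only a subset of the 13 features is covered.

```python
# Sketch of computing ROI features of the kinds listed above. The formulas are
# common textbook variants and only cover part of the 13 features; the paper's
# exact definitions may differ.
import numpy as np
from skimage.measure import label, regionprops
from skimage.feature import graycomatrix, graycoprops

def roi_features(roi_gray, roi_mask):
    """roi_gray: 2-D uint8 ROI image; roi_mask: boolean mask of the candidate nodule."""
    props = regionprops(label(roi_mask))[0]
    area, perimeter = props.area, props.perimeter
    minr, minc, maxr, maxc = props.bbox
    roundness = 4.0 * np.pi * area / (perimeter ** 2 + 1e-12)
    rectangularity = area / float((maxr - minr) * (maxc - minc))
    # Grayscale statistics inside the candidate region
    vals = roi_gray[roi_mask]
    gray_mean, gray_var = vals.mean(), vals.var()
    # GLCM texture: energy, contrast, homogeneity (~inverse difference moment), entropy
    glcm = graycomatrix(roi_gray, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    energy = graycoprops(glcm, "energy")[0, 0]
    contrast = graycoprops(glcm, "contrast")[0, 0]
    idm = graycoprops(glcm, "homogeneity")[0, 0]
    p = glcm[:, :, 0, 0]
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return np.array([area, perimeter, roundness, rectangularity,
                     gray_mean, gray_var, energy, contrast, idm, entropy])
```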

Table 1 Statistics of 20 experiments in the training stage
Table 2 Experimental results of the proposed algorithm in the testing stage

2.3 Model Parameter Optimization

After the above preparation for pulmonary nodule recognition is completed, the training phase of the model begins. Because the CS-MKL-SVM algorithm has many parameters and its optimization is time-consuming, PSO is introduced to realize rapid parameter optimization, yielding the CS-MKL-SVM-PSO algorithm. The F-new function under fivefold cross-validation is taken as the fitness function of the CS-MKL-SVM-PSO algorithm. First, the particle swarm and velocities are initialized: the maximum number of iterations maxgen is 200, the swarm size is 20, and the number of cross-validation folds is 5. The expression \({X_i} = {\left( {{x_{i1}},{x_{i2}},{x_{i3}},{x_{i4}}} \right) ^T}\) represents the solution of the \(i^{th}\) \(\left( {i = 1,2, \ldots ,20} \right) \) particle, meaning that D=4. The parameters \({x_{i1}},{x_{i2}},{x_{i3}}\) and \({x_{i4}}\) to be searched correspond to the RBF kernel width g, the multikernel weight m and the regularization coefficients \(C_+\) and \(C_-\) of the positive and negative class samples, respectively. In equations (9) and (10), \(\omega \) is the inertia weight, k is the current number of iterations, \({V_{id}}\) is the velocity of the particle, and the acceleration factors are set to \(c_1\)=1.5 and \(c_2\)=1.7. To prevent particles from searching blindly, their positions and velocities are restricted to \(\left[ { - {X_{\max }},{X_{\max }}} \right] \) and \(\left[ { - {V_{\max }},{V_{\max }}} \right] \), respectively. The position and velocity ranges of each particle are set as follows: the value range of parameter g is \([2^{-7}, 2^7]\), and the velocity range is \([-2^9*0.6, 2^9*0.6]\); the value range of m is [0, 1], and the velocity range is \([-0.6,0.6]\); the value range of parameter \(C_+\) is \([2^{-9},2^9]\), and the velocity range is \([-2^9*0.6,2^9*0.6]\); the value range of parameter \(C_-\) is \(\left[ {{2^{-9}},{2^{11}}} \right] \), and the velocity range is \(\left[ { - {2^{11}}*0.6,{2^{11}}*0.6} \right] \). Then, the F-new value is calculated, and the individual extremum and group extremum are updated according to the F-new index of the new population particles. When the maximum number of iterations is reached, the update stops, and the optimal parameter set is output. Finally, the trained model is evaluated on the test set to obtain the final recognition result.
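
A sketch of the per-particle fitness evaluation that the PSO loop would call is shown below, reusing the k_mix and f_new helpers sketched earlier; the wiring (scikit-learn, StratifiedKFold, class_weight as \(C_+\)/\(C_-\)) is an assumption for illustration, not the paper's MATLAB/LIBSVM implementation.

```python
# Sketch of the per-particle fitness: F-new under 5-fold cross-validation for a
# particle position x = (g, m, C_pos, C_neg). Reuses k_mix() and f_new() from the
# earlier sketches; X and y are numpy arrays of ROI features and {+1, -1} labels.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def particle_fitness(x, X, y, d=3, n_splits=5):
    g, m, c_pos, c_neg = x
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for tr, va in cv.split(X, y):
        K_tr = k_mix(X[tr], X[tr], d=d, g=g, m=m)
        K_va = k_mix(X[va], X[tr], d=d, g=g, m=m)
        # class_weight scales C per class, playing the role of C+ and C-
        model = SVC(kernel="precomputed", C=1.0,
                    class_weight={+1: c_pos, -1: c_neg}).fit(K_tr, y[tr])
        scores.append(f_new(y[va], model.predict(K_va)))
    return float(np.mean(scores))   # maximized by the PSO loop of Eqs. (9)-(10)
```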

3 Pulmonary Nodules Recognition Experiment in Lung CAD

3.1 Experimental Data and Platform

Experimental data were obtained from large specialized hospitals in Jilin Province, China. Twenty cases were selected from the case database, with a total of approximately 700 CT tomographic images (sections) and an average of approximately 35 images per case. Each case was accompanied by a doctor's diagnostic conclusion and a pathological analysis report. Each CT slice was \(512 \times 512\) pixels in size and 5.0 mm in thickness. After the preparation steps for pulmonary nodule recognition were applied to the approximately 700 CT images of the 20 patients, a total of 270 pulmonary nodule ROIs were extracted, including 80 nodules (malignant lesions) and 190 false positives. A true positive nodule here refers to a malignant tumor according to the pathological analysis report and diagnosis. After the feature selection, the data samples were randomly divided into two groups: 170 training samples (53 nodules and 117 false nodules) and 100 test samples (27 nodules and 73 false nodules).

MATLAB was used as the experimental platform, and the 13-dimensional feature data calculated above were normalized with the mapminmax function. The simulation experiments were conducted using the freely available LIBSVM toolbox (Chang and Lin 2011). In the model training stage, fivefold cross-validation was used to evaluate the F-new function in equation (18), which was selected as the parameter optimization criterion of the CS-MKL-SVM-PSO algorithm.
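
MATLAB's mapminmax scales each feature to [-1, 1] by default; a rough Python stand-in, fitted on the training features only and then applied to the test features, might look as follows (a sketch, not the paper's code).

```python
# Rough stand-in for MATLAB's mapminmax: scale each of the 13 features to [-1, 1]
# using ranges estimated on the training set only.
import numpy as np

def fit_minmax(X_train):
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)        # guard against constant features
    return lo, span

def apply_minmax(X, lo, span):
    return 2.0 * (X - lo) / span - 1.0

# lo, span = fit_minmax(X_train); X_train_n = apply_minmax(X_train, lo, span)
# X_test_n = apply_minmax(X_test, lo, span)
```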

3.2 Experimental Results Analysis

3.2.1 Experimental results analysis during training stage

Table 1 lists the results obtained during the training stage by the multikernel functions with degrees d=2 and d=3. The results of the proposed algorithm are also compared with those of two algorithms: algorithm A (Li et al. 2018) is the MKL-SVM-PSO algorithm with an inertia weight of 1, and algorithm B (Li et al. 2013) is the MKL-SVM algorithm based on grid search. To ensure experimental reliability, each experiment was repeated 20 times. The fitness values of the proposed algorithm, algorithm A and algorithm B and their statistics are listed on the left side of Table 1, and the statistics of the training time are listed on the right side.

Table 3 Experimental results of different kernel functions in the testing stage

Note that in algorithm A and algorithm B, the fitness (objective) function searches for the maximum overall recognition ACC on the training set, whereas in this paper the CS-MKL-SVM-PSO algorithm adopts the proposed F-new function, the harmonic mean of ACC and SEN, as its fitness function. Hence, the fitness value of the proposed algorithm is smaller than those of algorithm A and algorithm B: for the same value of ACC, according to equation (18), F-new does not exceed ACC as long as SEN does not exceed ACC, and SEN = 1 means that there is no missed detection.

The above training results show that the algorithm finds the best parameter group when the polynomial kernel has order d=3. The maximum value of F-new in the training stage is 0.9076, the minimum value is 0.8744, the mean value is 0.8993, and the median value is 0.9048, all of which are higher than those in the case of d=2. The training time of the proposed algorithm is of the same order of magnitude as that of algorithm A, whereas algorithm B obtains the optimal parameter group through a finer grid division, and hence its training time is almost 20 times that of the algorithm in this paper.

3.2.2 Experimental results analysis of the multiple kernel function and single kernel functions

Table 2 shows the experimental results obtained by the algorithm in this paper on the test set, along with the statistics of the F-new function in the test stage. \(F_{max}\) and \(F_{min}\) represent the maximum and minimum values of the F-new function over 20 experiments, \(F_{mean}\) and \(F_{median}\) represent the statistical mean and median of the F-new function, and MFACC and MFSEN represent the ACC and SEN values corresponding to the maximum F-new value, respectively. The experimental results show that the test results are best when d=3; at this point, \(F_{max}\) is 0.9357, \(F_{mean}\) is 0.9112, \(F_{median}\) is 0.9196, MFACC=91%, and MFSEN=96.3%. In both the training stage and the testing stage, the learning ability and generalization ability are better with the multikernel function at d=3, which is therefore the optimal setting of the algorithm in this paper.

Fig. 4 ROC curves of different kernel functions

Table 4 AUC values of the four kernel functions

Furthermore, with the PSO algorithm as the optimizer and the proposed F-new function as the fitness function, experiments are conducted with the single polynomial kernel, the single RBF kernel and the multiple kernel function of the algorithm in this paper. Table 3 lists the statistical mean values of the test results of the different kernel functions during the testing stage, and the ROC curves are shown in Fig. 4. Table 4 shows the AUC values of the different kernel functions; the larger the AUC value, the better the classifier.

In Table 3, MA represents the maximum ACC obtained in 20 experiments on the test set, and MASEN represents the SEN index corresponding to MA. The mean and median values of ACC over the 20 experiments are expressed by \(MA_{mean}\) and \(MA_{median}\), respectively. Similarly, MS represents the maximum SEN obtained on the test set, and MSACC represents the ACC index corresponding to MS. The mean and median values of SEN over the 20 experiments are expressed by \(MS_{mean}\) and \(MS_{median}\), respectively.

As shown in Table 3, compared with the single kernel functions, the \(F_{mean}\) of the proposed multiple kernel function algorithm reaches 0.9112, which is higher than those of the polynomial kernel function and the RBF kernel function. With the single polynomial kernel, the result is better at d=3, with an \(MA_{mean}\) of 91%, but the SEN value is lower, with an \(MS_{mean}\) of 88.89%. With the RBF kernel, the SEN is higher, with an \(MS_{mean}\) of 93.33%, but the ACC value is lower, with an \(MA_{mean}\) of 87.80%. With the algorithm in this paper, compared with the polynomial kernel, although \(MA_{mean}\) is reduced by 2.05%, \(MS_{mean}\) is increased by 4.63%, which reduces the missed detection of nodules; compared with the RBF kernel, both ACC and SEN are improved. As shown in Fig. 4, the ROC curve plots the true positive rate against the false positive rate: the closer the curve is to the upper left corner, i.e., the (0,1) point, the higher the AUC value and the better the classifier performance. The ROC curve of the algorithm in this paper is closest to the (0,1) point, and its AUC is 0.9777, indicating better classification performance. Therefore, compared with a single kernel function, the MKL method can improve the classification performance of the model.

3.2.3 Experimental results analysis of introducing cost-sensitive and F-new function

Further comparisons are made between the experimental results of algorithm A and algorithm B and the experimental results of the algorithm in this paper on the test set. Table 5 lists the statistical values of the 20 test results obtained by the algorithm in this paper and the related algorithms on the test set.

Table 5 Experimental statistics of introducing cost-sensitive and F-new function in the testing stage

Algorithm A is the MKL-SVM-PSO algorithm, which uses a swarm intelligence algorithm for parameter optimization and can greatly reduce the training time of the model; however, the PSO algorithm tends to lose particle diversity in later iterations, so the global optimal solution may not be obtained. Algorithm B is the MKL-SVM algorithm, which uses a grid search for the parameter search; although the final solution is globally optimal, the training stage takes too long. As given in Table 5, the MA of the grid search in algorithm B reaches 92%, MASEN reaches 92.59%, and both \(MA_{mean}\) and \(MA_{median}\) are 92%, which fully shows the global optimality and robustness of the grid search algorithm. Both the algorithm in this paper and algorithm A use the PSO algorithm, which is a local search algorithm; therefore, the optimal parameter groups obtained in the 20 searches differ, resulting in different test results.

As shown in Table 5, although the accuracy and sensitivity of the proposed algorithm are not all the best among the compared methods, the main purpose of this paper is to make both ACC and SEN achieve good results. According to the test results of the F-new function in Table 2, the maximum F-new value is \(F_{max}\) = 0.9357, and when this maximum is achieved, ACC and SEN are 91% and 96.3%, respectively. The grid search algorithm obtains the global optimal solution with a maximum ACC of 92% and a corresponding SEN of 92.59%; according to the F-new calculation in equation (18), this gives an F-new value of 0.9229, which is lower than the value of 0.9357 obtained by the algorithm in this paper. Through the combined use of cost sensitivity and the F-new function, the model achieves more than 90% on both the ACC and SEN indexes on the unbalanced data set. From the perspective of the F-new function, the harmonic mean of ACC and SEN, the CS-MKL-SVM-PSO algorithm proposed in this paper achieves the best performance.

In the testing-stage experiments, from the perspective of ACC, \(MA_{mean}\) is reduced compared with algorithm A and algorithm B, but the obtained MA is the same as that of algorithm B at 92%, indicating that the algorithm in this paper retains the ability to reach a high ACC; the overall recognition accuracy is at the same level as that of the original algorithms. In addition, the SEN index, i.e., the nodule detection rate, is considered in this paper. Over the 20 experiments on the test set, the algorithm in this paper obtains a maximum sensitivity MS of 96.3%, and when the sensitivity is at its maximum, the accuracy MSACC is 91%. Compared with algorithm A, the MS index of the algorithm in this paper is 7.41% higher while the MSACC index is unchanged; compared with algorithm B, the MS index is 3.71% higher while the MSACC index is 1% lower. Therefore, the algorithm in this paper can improve the SEN index and reduce the missed detection of nodules to a certain extent.

In summary, when the data set is seriously unbalanced, the cost-sensitive algorithm proposed in this paper improves the detection of nodules by introducing different penalty coefficients for the positive and negative samples and by using the F-new function as the fitness function. In terms of the time cost of training the model, the training time of the algorithm in this paper is slightly longer than that of the MKL-SVM-PSO algorithm (Li et al. 2018) but only about 1/20 of that of the grid search algorithm (Li et al. 2013), which makes an online implementation easier. Compared with using a single ACC or SEN as the objective function, using the F-new function can improve the detection rate of nodules. Therefore, the CS-MKL-SVM-PSO algorithm proposed in this paper has better recognition performance and can take into account the overall recognition accuracy of nodules while reducing their missed detection.

When the inertia weight is held constant at \(\omega \)=1, the fitness curve of the algorithm in this paper is as shown in Fig. 5.

Fig. 5 Fitness curve of the CS-MKL-SVM-PSO algorithm

As shown in Fig. 5, the fitness curve of the algorithm oscillates noticeably, and the convergence speed is slow. The algorithm performance could be further improved through a dynamic adjustment of parameters such as the inertia weight.

4 Conclusion

In this paper, first, a CS-MKL-SVM-PSO algorithm is proposed. Second, a new score function is proposed with which the optimal MKL-SVM parameter set can be found quickly and accurately. Finally, the proposed CS-MKL-SVM-PSO algorithm, with the score function as its objective function, is applied to pulmonary nodule recognition. The main innovations of this work are as follows:

  1.

    The cost-sensitive algorithm and the MKL-SVM-PSO algorithm are combined to form the new CS-MKL-SVM-PSO algorithm. The cost-sensitive mechanism addresses the imbalance between positive and negative samples, while the PSO algorithm addresses the speed of parameter optimization and accelerates the establishment of the training model.

  2.

    A new F-new function, constructed as the harmonic mean of ACC and SEN, is proposed as the fitness function so that both ACC and SEN are taken into account and the missed detection of nodules is ultimately prevented.

  3.

    With a constant inertia weight, the experimental results of the CS-MKL-SVM-PSO algorithm, the MKL-SVM-PSO algorithm and the grid-search MKL-SVM algorithm are compared and analyzed.

The experimental results show that the proposed CS-MKL-SVM-PSO algorithm can effectively prevent the missed detection of nodules when the data set is unbalanced. Both ACC and SEN can be considered simultaneously with the proposed F-new function.

Although the above improvements yield better experimental performance, several problems and challenges remain to be solved. Future research will focus on the following:

  1.

    The PSO algorithm is a kind of local search algorithm that can obtain a good solution quickly, but its disadvantage is that as the number of iterations increases, the diversity of the population decreases, which easily causes premature convergence and trapping in local optima; therefore, some indexes do not reach their optima. The grid search algorithm can find the global optimal solution, but as the number of parameters increases, the training period becomes longer. To find the optimal parameter set while reducing the training time, other global optimization algorithms and hybrid optimization algorithms will be explored to improve the model performance.

  2.

    The kernel function directly determines the structure of the feature space, and the design of a new kernel function suitable for specific problems remains to be discussed.

  3.

    Against the background of the rapid development of deep learning, how to combine traditional machine learning methods with deep learning so as to better promote the development of Lung CAD and serve the medical field remains an open question.