Introduction

The increase in the use of computed tomography (CT) has also increased the detection rate of incidental pulmonary nodules and related early stage lung cancers [1, 2]. To standardize the approach to these nodules, risk factors are determined according to the patients and the radiological characteristics of the nodules, and malignancy assessment and follow-up and treatment strategies are determined in light of different guidelines [3,4,5,6].

In addition, pulmonary nodule malignancy risk calculation models have been developed using the clinical features of patients and radiological features of nodules by thorax CT and PET–CT [7,8,9]. These models have been validated in the populations where the studies were conducted; in other societies, changes in efficacy may be observed depending on demographic characteristics or endemic diseases [10].

In this study, the clinical and radiological findings of patients who were operated on for pulmonary nodules were examined, and the effectiveness of the Brock University [7], Mayo Clinic [8], and Herder [9] malignancy risk calculation models, which were developed with data from Western societies, was evaluated against final pathology results. The aim was to compare these models and evaluate their effectiveness and applicability for the Turkish population.

Materials and methods

Local ethics committee approval was obtained for this study (No. 2012-KAEK-15/2195).

Patient data

A total of 478 patients who were operated on for pulmonary nodules in our clinic between 2014 and 2019 were evaluated based on their digital information and archive files. The following data were obtained: age, gender, smoking history, family history of lung cancer, history of extrapulmonary malignancy, history of granulomatous disease, diameter of the nodule (largest diameter), attenuation character, side, lobe in which the nodule was located, presence of spiculation, total number of nodules detected, presence of pulmonary emphysema, fluorodeoxyglucose (FDG) uptake in positron emission tomography (PET) of the nodule, and definitive pathology data. Cases in which all data could not be reached, patients who had a history of lung or extrapulmonary malignancy within the last 5 years, patients who had endobronchial nodule components, and patients whose nodules were a result of metastasis were excluded. The remaining 351 patients were included in the study.

Scoring systems

Pulmonary nodule malignancy scores were calculated using patient data in the equations of the Brock [7], Mayo [8], and Herder [9] models (Table 1). Although PET–CT is a semi-quantitative examination, FDG uptake in the nodules included in the study was scored on the four-step scale suggested by the Herder model [11], in line with nuclear medicine recommendations: “no uptake,” “low uptake” (SUVmax < 2.5), “moderate uptake” (SUVmax 2.6–10), or “high uptake” (SUVmax > 10).

Table 1 Formulas of malignancy scoring models used for pulmonary nodules
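
As an illustration of how such a score is computed, the sketch below evaluates the Mayo Clinic model as a logistic function and assigns the four-step FDG category described above. The coefficients are those commonly reported for the Mayo model [8] and are an assumption here; Table 1 remains the authoritative source. The function names, the encoding of "no uptake" as a missing SUVmax, and the handling of SUVmax values between 2.5 and 2.6 are likewise assumptions made only for this example.

```python
import math


def mayo_probability(age, smoker, prior_cancer, diameter_mm, spiculated, upper_lobe):
    """Malignancy probability (0-1) from a logistic model.

    Coefficients are those commonly reported for the Mayo Clinic model [8];
    treat them as an assumption and verify against Table 1 before any use.
    Binary predictors are passed as 0 or 1.
    """
    x = (-6.8272
         + 0.0391 * age
         + 0.7917 * smoker
         + 1.3388 * prior_cancer   # extrathoracic cancer at least 5 years earlier
         + 0.1274 * diameter_mm
         + 1.0407 * spiculated
         + 0.7838 * upper_lobe)
    return 1.0 / (1.0 + math.exp(-x))


def fdg_category(suv_max):
    """Four-step FDG uptake category using the SUVmax ranges given in the text."""
    if suv_max is None:      # assumption: None encodes "no measurable uptake"
        return "no uptake"
    if suv_max < 2.5:
        return "low uptake"
    if suv_max <= 10:        # assumption: values from 2.5 to 10 count as moderate
        return "moderate uptake"
    return "high uptake"


# Hypothetical patient: 60 years old, smoker, 15 mm spiculated upper-lobe nodule.
p = mayo_probability(60, 1, 0, 15, 1, 1)
print(f"Mayo probability: {p:.1%}, FDG category: {fdg_category(4.2)}")
```

The Herder model then combines the Mayo probability with the FDG category in a second logistic step; that step is not reproduced here.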

Statistical analysis

All analyses were performed with IBM SPSS Statistics 21 (IBM Corp., Armonk, NY, USA). The conformity of the quantitative data to normal distribution was checked with histograms and Q–Q plots. Quantitative variables were summarized as mean ± standard deviation and median (minimum–maximum), while qualitative variables were given as frequency (percentage). Quantitative variables satisfying the assumption of normal distribution were analyzed with independent-samples t tests. Quantitative variables that did not satisfy the assumption of normal distribution were analyzed with the Mann–Whitney U test, and qualitative variables were analyzed with the Chi-square test. The success of the models in distinguishing between malignant and benign nodules was evaluated by receiver operating characteristic (ROC) curve analysis. The performance measures of sensitivity, specificity, correct classification rate, positive predictive value, and negative predictive value were calculated for cutoff points. Values of p < 0.05 were considered statistically significant.
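
The performance measures named above can be sketched in a few lines. The example below is not the SPSS procedure used in the study; it is a minimal illustration, on hypothetical scores, of the AUC computed as the proportion of correctly ranked malignant/benign pairs and of the five measures at a given cutoff. The function names and the rule "score at or above the cutoff counts as malignant" are assumptions.

```python
def auc_mann_whitney(malignant_scores, benign_scores):
    """AUC as the fraction of malignant/benign pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_scores) * len(benign_scores))


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a measure is undefined.
    return numerator / denominator if denominator else float("nan")


def performance_at_cutoff(scores, is_malignant, cutoff):
    """Classify score >= cutoff as malignant and return the five measures."""
    pairs = list(zip(scores, is_malignant))
    tp = sum(1 for s, y in pairs if s >= cutoff and y)
    fp = sum(1 for s, y in pairs if s >= cutoff and not y)
    fn = sum(1 for s, y in pairs if s < cutoff and y)
    tn = sum(1 for s, y in pairs if s < cutoff and not y)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "correct_classification": _ratio(tp + tn, len(pairs)),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Hypothetical risk scores (not study data).
malignant = [0.9, 0.6, 0.4]
benign = [0.5, 0.2, 0.1]
print(auc_mann_whitney(malignant, benign))
print(performance_at_cutoff(malignant + benign, [1, 1, 1, 0, 0, 0], cutoff=0.45))
```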

Results

A total of 351 patients (236 men and 115 women) were included in the study. The mean age of these patients was 57.84 ± 10.87 (14–79) years, and 226 had malignant and 125 had benign nodules. Postoperative histopathological diagnoses are shown in Table 2.

Table 2 Postoperative histopathological diagnosis

When 11 patients with a benign diagnosis (8.80%) and 18 patients with a malignant diagnosis (7.96%) with history of granulomatous disease were compared, no statistically significant difference was found (p = 0.944). In addition, there was no statistically significant relationship between history of granulomatous disease and number of detected nodules (p = 0.944) (Table 2).

When the quantitative data of individuals and nodules were evaluated according to groups, significant relationships were found between malignancy and age (p < 0.001) and nodule diameter (p < 0.001) (Table 3). When qualitative data were analyzed, gender (p < 0.009), nodule diameter (p < 0.05), spiculation (p < 0.001), emphysema (p < 0.05), and FDG uptake of the nodule (p < 0.001) were found to be significantly correlated with malignancy (Table 4).

Table 3 General characteristics of individuals and nodules by groups (quantitative data)
Table 4 General characteristics of individuals and nodules by groups (qualitative data)

When the scores for the pulmonary nodules obtained from the malignancy risk calculation models were evaluated, significant results were observed for all three models (p < 0.001) (Table 5). The score distributions of benign/malignant nodules according to the models are shown in Figs. 1, 2, and 3.

Table 5 Score data obtained from the models for benign and malignant nodules
Fig. 1 Score distribution of benign and malignant nodules according to the Brock model

Fig. 2 Score distribution of benign and malignant nodules according to the Mayo model

Fig. 3 Score distribution of benign and malignant nodules according to the Herder model

When the risk scores for subcentimetric nodules were evaluated, none of the calculation models showed a statistically significant difference between benign and malignant nodules (p = 0.654 for the Brock model, p = 0.898 for the Mayo model, and p = 0.92 for the Herder model).

For nodules larger than 1 cm, the malignancy risk scores of the models were quite different compared to those for subcentimetric nodules. When nodules of 11–20 mm were examined, the risk scores of the models for benign and malignant nodules were compared and it was observed that the risk scores of malignant nodules were statistically significantly higher with all three models (p < 0.001 for all three models). For nodules of 21–30 mm, the malignancy probabilities obtained with the models were found to be statistically significantly higher for malignant nodules (p = 0.020 for the Brock model, p = 0.010 for the Mayo model, and p < 0.001 for the Herder model).

The efficacy of the models compared in this study was also evaluated according to the attenuation of the nodules on CT, and it was determined that the models had different efficacy depending on whether the nodules were ground-glass, semi-solid, or solid.

The malignancy probability scores obtained for ground-glass nodules were compared between malignant and benign nodules, and it was observed that the risk scores of malignant nodules were statistically significantly higher with the Brock model while there was no statistically significant difference between the risk scores of benign and malignant nodules with the Mayo and Herder models (p = 0.020 for the Brock model, p = 0.523 for the Mayo model, and p = 0.499 for the Herder model).

The scores observed from the models for semi-solid nodules were compared between benign and malignant nodules, and the probability of malignancy for malignant nodules was found to be statistically significantly higher with all three models (p < 0.001 for all three models).

Considering the efficacy of the models for solid nodules as the final attenuation character, the risk scores of the models for benign and malignant nodules were compared and it was seen that the malignancy risk values of malignant nodules were statistically significantly higher than those of benign nodules by all models (p < 0.001 for all three models).

To evaluate the effectiveness of the models in the differentiation of malignant and benign nodules, ROC curves were created and the area under the curve (AUC) was measured. The AUC value was 0.700 (95% CI 0.644–0.756) for the Brock model, 0.717 (95% CI 0.661–0.772) for the Mayo model, and 0.786 (95% CI 0.736–0.837) for the Herder model (Fig. 4).
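
The source does not state how the confidence intervals were derived. One standard approximation is the Hanley–McNeil standard error, sketched below for the reported AUC values with the study's group sizes (226 malignant, 125 benign). This is an assumption-based illustration and may differ slightly from the intervals produced by SPSS; the function name is hypothetical.

```python
import math


def hanley_mcneil_ci(auc, n_malignant, n_benign, z=1.959964):
    """Approximate 95% CI for an AUC using the Hanley-McNeil standard error."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (auc * (1.0 - auc)
                + (n_malignant - 1) * (q1 - auc ** 2)
                + (n_benign - 1) * (q2 - auc ** 2)) / (n_malignant * n_benign)
    se = math.sqrt(variance)
    return auc - z * se, auc + z * se


# AUC values reported in the text; group sizes from the Results section.
for name, auc in [("Brock", 0.700), ("Mayo", 0.717), ("Herder", 0.786)]:
    low, high = hanley_mcneil_ci(auc, n_malignant=226, n_benign=125)
    print(f"{name}: AUC {auc:.3f}, approximate 95% CI {low:.3f}-{high:.3f}")
```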

Fig. 4 ROC curves for the Brock, Mayo and Herder models

While evaluating the performance of the models for subcentimetric nodules, the AUC values were found to be 0.536 for the Brock model, 0.490 for the Mayo model, and 0.508 for the Herder model.

When the ROC curves calculated to evaluate the performance of the malignancy risk estimation models for nodules larger than 1 cm in size were evaluated, the AUC value for nodules of 11–20 mm was found to be 0.724 for the Brock model, 0.755 for the Mayo model, and 0.841 for the Herder model. For nodules of 21–30 mm, the AUC values were 0.657 for the Brock model, 0.674 for the Mayo model, and 0.766 for the Herder model.

When the areas under the ROC curves obtained from the risk scores of the models for ground-glass nodules were examined, the AUC values were 0.761 for the Brock model, 0.572 for the Mayo model, and 0.576 for the Herder model. The Brock model was superior to the Mayo and Herder models in the differentiation of benign and malignant ground-glass nodules.

When the AUC values of the models were examined for semi-solid nodules, they were found to be 0.741 for the Brock model, 0.740 for the Mayo model, and 0.797 for the Herder model.

While evaluating the performance of the models for solid nodules, it was seen that the AUC value was 0.719 for the Brock model, 0.758 for the Mayo model, and 0.891 for the Herder model.

In addition to the 5% and 10% threshold values suggested in the guidelines [3, 5, 6], ideal threshold values were determined for the nodules in the study population to optimize the benign/malignant distinction: 19.5% for the Brock model, 23.1% for the Mayo model, and 56% for the Herder model (Table 6).
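
The text does not specify the criterion used to select the ideal thresholds. One common choice is the cutoff that maximizes Youden's index (sensitivity + specificity − 1); the sketch below applies it to hypothetical scores. The function name, the classification rule (score at or above the cutoff counts as malignant), and the data are assumptions, and the study's own thresholds may have been obtained differently.

```python
def youden_optimal_cutoff(scores, is_malignant):
    """Return (cutoff, J) maximizing Youden's J over the observed scores."""
    n_pos = sum(1 for y in is_malignant if y)
    n_neg = len(is_malignant) - n_pos
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, is_malignant) if s >= cutoff and y)
        tn = sum(1 for s, y in zip(scores, is_malignant) if s < cutoff and not y)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:   # strict comparison keeps the lowest cutoff on ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical risk scores (not study data): four malignant, four benign.
scores = [0.9, 0.7, 0.6, 0.3, 0.5, 0.4, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
cutoff, j = youden_optimal_cutoff(scores, labels)
print(f"optimal cutoff {cutoff}, Youden's J {j:.2f}")
```

Raising the cutoff in this way trades sensitivity for specificity, which is the pattern reported for the new thresholds in this study.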

Table 6 Performance measures for malignant/benign distinctions of the models

The performances of the models were also evaluated according to nodule size, and it was seen that no model achieved a statistically significant difference for nodules of 0–10 mm, while all models were found to be effective for nodules of 11–20 mm and 21–30 mm (Table 7). The performance measures of the models according to nodule size are shown in Table 8.

Table 7 Nodule size scores from the models
Table 8 Performance measures of models in the distinction of malignant and benign according to nodule size

The performances of the models were evaluated separately for ground-glass, semi-solid, and solid nodules according to the attenuation of the nodules. Statistical significance was observed for ground-glass nodules only with the Brock model (p = 0.02), while statistical significance was found for all models in the evaluation of semi-solid and solid nodules (Table 9). The performance measures of the models according to nodule attenuation are shown in Table 10.

Table 9 Nodule attenuation scores from the models
Table 10 Performance measures of models in the distinction of malignant and benign according to nodule attenuation

Discussion

When evaluated independently of other variables, it was seen that all scoring systems gave significant results in the differentiation of benign and malignant nodules. In addition, no significant results were observed for any scoring system with nodule sizes of < 1 cm, while all scoring systems were successful in the differentiation of benign and malignant nodules of > 1 cm.

According to nodule attenuation, only the Brock model gave significant results for ground-glass nodules, while all scoring systems provided significant results for solid and semi-solid nodules.

In our study, there were significant relationships between age, nodule diameter, gender, spiculation, emphysema, and FDG uptake of the nodule and malignancy, which are among the parameters considered in these models, but no significant relationship was found between the other variables and malignancy.

When the ideal threshold values and different threshold values determined for each scoring system for our cases were evaluated, the obtained significance levels did not change. It was also observed that the history of granulomatous disease did not cause a significant change in the number of nodules.

When benign and malignant cases were compared according to the scores obtained from the pulmonary nodule malignancy prediction models evaluated in this study, the risk scores of malignant cases were found to be statistically significantly higher with all three models. However, the mean malignancy risk score of benign nodules was 16.99% for the Brock model, 22.89% for the Mayo model, and 31.17% for the Herder model. The threshold values of malignancy probability used for the benign/malignant distinction are 5% in the ACCP and Fleischner guidelines and 10% in the BTS guidelines [3, 5, 6]. Therefore, most of the nodules in our study had risk scores above the specified threshold values. In our opinion, the reason for this is that almost all of the nodules in the cases included in this study were operated on in our clinic due to moderate or high suspicion of malignancy. To optimally distinguish between benign and malignant nodules in patients with high mean risk scores by all three models and with many clinical and radiological risk factors, it was necessary to determine new threshold values for the probability of malignancy for the models evaluated in this study, rather than rely on those specified in the guidelines. As a result of the statistical analyses, optimal threshold values were found to be 19.5% for the Brock model, 23.1% for the Mayo model, and 56% for the Herder model. Although the new threshold values slightly decreased the overall sensitivity of the models in distinguishing benign and malignant nodules, they had positive effects on other parameters, especially specificity and positive predictive values.

The AUCs of the ROC curves were measured to evaluate the performance of the calculation models. The Herder model performed significantly better than the Brock and Mayo models, which had very close AUC values. This suggests that FDG uptake in PET–CT may play an important role in the evaluation of pulmonary nodules. During the development of the Herder model, the performance of FDG uptake alone in the differentiation of benign and malignant nodules was examined, and no statistically significant difference was found compared with the performance of the Mayo model as used in this study. However, after the FDG uptake of the nodule was integrated into the Mayo model, the final version of the Herder model was statistically significantly superior to both the Mayo model and isolated FDG uptake [9]. Thus, the FDG uptake level of a nodule in PET–CT is not by itself a sufficient parameter for evaluating the probability of malignancy of that nodule. However, when PET–CT findings are evaluated together with other clinical and radiological features of the patient, they become a valuable tool in determining the probability of malignancy.

In addition, in our study, it was observed that the AUC values for all three models were lower than the AUC values reported in the original publications on the models’ development and validation (0.96 for the Brock model, 0.79 for the Mayo model, and 0.92 for the Herder model) [7,8,9]. This may be because, in addition to many other factors, almost all of the patients included in this study from our clinic were being followed due to a relatively high risk of malignancy. Therefore, the scores of malignant and benign nodules were generally closer to each other than they were in the populations studied in the original development of the models.

Compared to the AUC values obtained when all nodules were included, a significant loss of performance in differentiating benign and malignant nodules was detected for all three models for subcentimetric nodules. There could be several reasons for this. First, the numbers of benign and malignant subcentimetric nodules included in this study were very close to each other (27 benign, 25 malignant). In the populations in which the models were developed, the malignancy rate was below 5% in both cohorts for the Brock model, 23% for the Mayo model, and 57% for the Herder model [7,8,9]. The performance degradation of the models may be due to this difference. Furthermore, all of the nodules evaluated during the development of the Mayo model were detected by chest X-ray [8]. Since subcentimetric nodules are more difficult to detect by chest X-ray than large nodules, the characteristic features of the detected subcentimetric nodules may have differed from those in our study. The same reasoning applies to the Herder model, since its parameters other than PET–CT findings are calculated as in the Mayo model. In addition, since none of the subcentimetric nodules in our study had moderate or high uptake of FDG, the guiding effect of PET–CT was limited, and the effectiveness of the Herder model may have therefore decreased. Consistent with the inferences that can be drawn from the results of this study, the BTS guidelines do not recommend the use of the Herder model for nodules smaller than 8 mm [5]. Since all of the nodules included in the original study for the Brock model were detected by CT, the rate of subcentimetric nodules was higher than in the populations used for the other models [7]. Although a statistically significant effect was not observed, we think that the higher AUC value obtained for the Brock model compared to the other models was related to this.

However, in our clinic, many patients with malignant subcentimetric nodules that, according to the guidelines, should have been followed up conservatively or even removed from follow-up were operated on thanks to the individual experience and initiative of the experienced radiologists and clinicians in our hospital, and these patients received curative treatment at the earliest possible stage. Clinicians or radiologists with considerable experience may sometimes exercise clinical judgment that differs from a calculation model or guideline, and this can be as effective as risk prediction models because it takes more variables and prior experience into account [12]. In addition, these nodules are difficult to detect intraoperatively as well as in follow-up, and preoperative marking methods can be used [13].

For nodules larger than 1 cm, results were all statistically significant for all three models, both for solid and semi-solid nodules. In our opinion, with the elimination of the disadvantages of subcentimetric nodules, the malignancy risk estimation models achieved significant success in distinguishing between benign and malignant nodules. In addition, since nodules larger than 1 cm do not pose the difficulties for diagnostic factors that are seen with subcentimetric nodules, the AUC values were significantly higher.

In our opinion, the AUC values obtained for nodules of 11–20 mm were higher than those obtained for nodules of 21–30 mm because of the false positivity of large benign nodules. While the mean malignancy probabilities obtained from the models for benign nodules of 11–20 mm were 15.24%, 17.51%, and 26.24% for the Brock, Mayo, and Herder models, respectively, these probabilities were 34.55%, 48.44%, and 59.66% for benign nodules of 21–30 mm. In other words, the mean malignancy probability of benign nodules of 21–30 mm is higher in all models than the optimal threshold value calculated for that model. This increases the number of false positive results and has a negative effect on the performance measures of the models. Among the three models, the highest AUC value was obtained for the Herder model for both size ranges, and the lowest AUC value was that of the Brock model. However, no significant difference was observed between the AUC values of the Brock and Mayo models. Considering this finding, PET–CT is an important tool in the management of nodules of > 1 cm.

The efficacy of the models was also compared according to nodule attenuation, and the malignancy probability scores obtained for ground-glass nodules showed that only the Brock model distinguished malignant from benign nodules adequately. Like subcentimetric nodules, ground-glass nodules are very difficult to evaluate. Spiculation, which differed significantly between benign and malignant nodules, was not seen in ground-glass nodules because of their structure, and their FDG avidity was generally low. It was therefore an expected finding that the Mayo and Herder models, which could not make optimum use of these two factors, could not effectively distinguish benign from malignant ground-glass nodules.

When the AUC measurements for ground-glass nodules were compared, the Mayo and Herder models performed poorly. In our opinion, the reason may be that, as with subcentimetric nodules, the nodules in the population included in the development of the Mayo model were evaluated after chest radiographs were reviewed [8]. Since ground-glass nodules, and especially those that are small in size, are difficult to detect on chest radiographs, the rate of ground-glass nodules included in the original study is likely very low compared to our study. Since the parameters of the Herder model, excluding PET–CT findings, are based on the Mayo model, the same problem is likely to be experienced with the Herder model. The Brock model, on the other hand, was created based on nodules detected by CT, and the attenuation of the nodules was integrated into the model [7]. However, in the Brock model, the ground-glass character was a factor that reduced the probability of malignancy, while 75% of the ground-glass nodules in our study were found to be malignant. Despite this, it is an interesting finding that the Brock model yielded its highest AUC value for ground-glass nodules among all groups evaluated in this study.

All three models successfully differentiated malignant and benign semi-solid nodules. Most of the semi-solid nodules included in this study were over 1 cm in size and it is possible that all of the models produced significant results in the differentiation of benign and malignant semi-solid nodules for this reason, in contrast to ground-glass nodules. In addition, when the mean malignancy probabilities of benign and malignant semi-solid nodules by the Brock model were examined, it was seen that they were higher than those obtained for solid and ground-glass nodules. This, in line with the model, suggests that the semi-solid nature of a nodule increases the possibility of malignancy.

When the AUC values of the models for semi-solid nodules were considered, excluding the Brock model, a significant increase was found in the AUC values of the other two models. In our opinion, the reason for this is likely related to the fact that the Mayo and Herder models are models developed based on nodules detected by chest radiography, as mentioned above while discussing the AUC values of the models for ground-glass nodules [8, 9]. The Mayo and Herder models may have been more successful in distinguishing benign and malignant semi-solid nodules compared to ground-glass nodules because the solid components of these nodules are increased. Therefore, the probability of their detection by chest X-ray also increases. In addition, it seems likely that the higher FDG uptake of semi-solid nodules compared to ground-glass nodules in this study contributed to the increased efficiency of the Herder model. The Herder model had the best performance among the three models for semi-solid nodules.

When it comes to solid nodules, all three models also differentiated malignant and benign ones successfully. Approximately 80% of the nodules in the two cohorts included in the original study for the development of the Brock model were solid nodules. Since the Mayo model and the Herder model, which is a derivation of the Mayo model, are models developed on the basis of nodules seen by chest X-rays, it is highly likely that the majority of the nodules included in those studies were solid. Models developed in studies in which solid nodules were the majority may have differentiated benign and malignant solid nodules more effectively in our study.

Considering the AUC values in the evaluation of solid nodules, the most effective model was the Herder model. The Mayo and Brock models followed respectively. The Mayo and Herder models yielded the highest AUC values here among all the groups evaluated in this study. This is because, as mentioned above, these two models, which are closely related, are likely to have been developed and validated in populations with high numbers of solid nodules. In addition, the AUC value for solid nodules with the Herder model is very close to the AUC value of the original study (0.92) [9]. This highlights the superiority of PET–CT for solid nodules.

As a result of various studies, many risk factors related to lung cancer were determined according to the clinical and demographic characteristics of the patients and the radiological characteristics of the nodules. However, these risk factors, and especially clinical and demographic factors, may differ in terms of their effects according to the structure of local populations and geographical features [10]. In the Brock model, female gender, family history of lung cancer, nodule type, localization of the nodule, and number of nodules are parameters that affect the probability of malignancy since there was a statistically significant difference between benign and malignant cases in the population investigated during the development of that model [7]. In the present study, a statistically significant relationship was found between male gender and malignancy and no other statistically significant differences were found between benign and malignant cases for the other parameters. Similarly, a history of smoking and a history of extrapulmonary malignancy at least 5 years ago were determined as risk factors in the Mayo and Herder models, but in our study, no statistically significant difference was found between benign and malignant cases for either parameter [8, 9]. Thus, the effects of these parameters on the differentiation of benign and malignant nodules in our study were reduced compared to the populations in the original studies. In such cases, it is inevitable that the performance of all three models will be decreased.

The main contribution of this study is its evaluation of nodules in the Turkish population with the currently used malignancy scoring systems against definitive postoperative pathology results, with the aim of retrospectively calibrating risk calculation models before their use in a new local population and providing new optimal threshold values, as recommended in the literature [14].

The main limitation of the study is that it was conducted among patients who were followed in a thoracic surgery clinic in a reference center and operated on for pulmonary nodules. Most of the nodules in this study were already considered risky by clinicians and radiologists.

In conclusion, all models effectively differentiated benign from malignant pulmonary nodules in all groups except subcentimetric nodules and ground-glass nodules. However, none of the groups for which these models were effective had AUC values as high as those obtained in the original studies. This highlights the need to optimize models and malignancy risk thresholds for this population or develop a new model.