Introduction

In patients with suspected lung cancer, lymph node staging in the mediastinum is crucial due to the impact on management and prognosis [1]. Lung cancer staging is performed through the tumor nodes metastasis (TNM) system [2], where the size and position of the primary tumor (T), the presence and location of compromised lymph nodes (N), and the presence of distant metastases (M) are evaluated.

Approximately 30% of patients with lung cancer present with mediastinal involvement at the time of diagnosis. This involvement may affect ipsilateral or contralateral lymph nodes, or may result from direct tumor invasion. The major difficulty is determining which patients need to proceed to invasive investigation and which have reactive or benign lymphadenopathy [3].

Imaging modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET-CT) can potentially identify malignant involvement of mediastinal lymph nodes. CT has the advantages of being more widely available, producing images that are easily interpreted, and being less sensitive to motion artefacts [4]. On CT, an increase in the diameter of mediastinal lymph nodes prompts further investigation [1]. In addition, lymph node enhancement after intravenous contrast may be used as another predictor of malignancy. To confirm the involvement of suspected lymph nodes, biopsies are performed for histopathological analysis [3]. Quantitative features can also be extracted from radiological images to aid diagnosis. These features can provide information about the studied structures that is not always visible to the clinician’s eye [5].

In this context, image processing and classification methods could be used to support diagnosis based on image evidence [2]. Since their introduction [6], texture features have been used in many classification problems associated with different imaging methods.

Based on the above evidence and many other published papers [5], texture features have the potential to aid diagnostic decisions. In this study, a set of texture features was extracted and applied to different machine learning classifiers. The main objective was to evaluate whether texture analysis combined with machine learning could differentiate between malignant and benign lymph nodes. The distinguishing aspect of our study was that 15 texture features were extracted from mediastinal lymph nodes, with five different physicians as operators. In addition, the inter-operator variability among the physicians who selected the regions of interest was evaluated. The gold standard was tissue sampling obtained through surgery in a cohort of selected patients with confirmed lung cancer.

Material and methods

This retrospective study began with the collection of computed tomography examinations at a medical school between June 2019 and May 2020. The local institutional ethics committee approved this study (CAAE number: 15612619.6.1001.5411) according to country regulations. Patient selection used the following inclusion criteria: (1) patients who underwent CT exams on the same equipment and had surgery or mediastinoscopy with biopsy of mediastinal lymph nodes within a maximum period of ten days of the CT exam. In the case of multiple lymph nodes, the largest within each station was selected. (2) Lymph nodes with a short diameter greater than 12 mm. (3) All lymph node diagnoses were confirmed by histopathological analysis, used here as the gold standard. In addition, the following exclusion criteria were adopted: (4) patients who had surgery before CT acquisition and (5) patients with lymph nodes compromised by CT artifacts.

All CT examinations included the chest and the upper abdomen. CT was performed with a 64-channel multiple-row detector CT scanner, GE Optima 660 (General Electric, USA). All multiphase CT scans were acquired on the same equipment. The acquisition parameters were: collimation, 64 × 0.625 mm; 120 kVp; modulated mAs with 10.0 standard deviation; rotation time, 0.75 s; reconstruction thickness, 2 mm; increment, 1 mm; pitch, 1.0; field of view, 35 cm; pixel size, 0.7227 × 0.7227 mm²; and matrix, 512 × 512. All patients underwent acquisitions both with and without contrast medium; iodixanol contrast agent was injected intravenously at weight-adjusted doses. After injection, CT was performed with a 40 s delay.

Tissue sampling was obtained through surgery (thoracotomy, video assisted thoracoscopy or mediastinoscopy) with complete lymph node resection or with sampling from nodal stations 2 (upper paratracheal—right/left), 4 (lower paratracheal—right/left), and 7 (subcarinal).

Five operators (physicians) with more than 15 years of experience, one radiologist and four thoracic surgeons, selected the patients and analyzed all CT examinations in axial orientation. Each operator individually selected the regions of interest from which texture features were extracted. A total of 18 patients (mean age 54.5 years, range 34 to 66 years; 9 male and 9 female) were enrolled in this study. After resection and histopathological analysis, 39 lymph nodes were adequate for analysis, of which 15 were malignant and 24 benign.

Statistical texture extraction and machine learning classification were performed in two different software packages, Matlab (steps 1 to 4) and Orange Canvas (steps 5 to 7). A summary of the main steps of the methodology is presented below:

  1. After selecting inclusion and exclusion criteria, DICOM images were read in Matlab;
  2. Radiologists selected the most appropriate slices for lymph node visualization;
  3. Regions of interest (ROI) were positioned within each of the included lymph nodes;
  4. From each ROI, 15 different statistical textures were extracted;
  5. Textures were assessed for their ability to distinguish between the two groups using the Gini index and Gain Ratio, selecting the five best features;
  6. Three different Machine Learning classifiers were used;
  7. Results were demonstrated according to the ROC curve and classification indexes.

Feature extraction

Texture extraction was performed using Matlab software R2017a. We selected the CT slices with the largest lesion diameter. To find the setting that gave the best classification, we compared images acquired with and without contrast medium. Each operator individually positioned regions of interest (ROI) of 10 × 10 pixels, contained within 80% of the inner lymph node area. Figure 1 shows the selected lymph nodes in a CT axial slice without contrast medium (a); a CT axial slice with contrast medium (b); a CT axial slice without contrast medium and with the ROI positioned (c); and a CT axial slice with contrast medium and with the ROI positioned (d). Since the operators positioned each ROI individually, all our classification results are reported as mean and standard deviation.

Fig. 1 Example of ROI positioning within selected lymph nodes in CT axial slices. CT axial slice without contrast medium (a). CT axial slice with contrast medium (b). CT axial slice without contrast medium and ROI positioned (c). CT axial slice with contrast medium and ROI positioned (d)

A selection of 15 statistical texture features was used, including first-order statistical features such as mean, standard deviation, minimum and maximum intensity, skewness, and kurtosis [7]. Second-order statistical methods such as the gray level co-occurrence matrix (GLCM) [8], gray level run-length (GLRL), and the wavelet transform [9] were also used.
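
As an illustration of the first-order statistics and the GLCM named above, the following is a minimal Python sketch, not the Matlab code used in the study. The function names (`first_order_features`, `glcm`, `glcm_contrast`) are hypothetical, the ROI is assumed to be already quantized to integer gray levels, and only one pixel offset is considered.

```python
import math


def first_order_features(roi):
    """First-order statistics of a 2-D ROI given as a list of rows."""
    values = [v for row in roi for v in row]
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)  # population SD
    if std == 0:
        skew = kurt = 0.0  # flat ROI: higher moments are undefined, report 0
    else:
        skew = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
        kurt = sum((v - mean) ** 4 for v in values) / (n * std ** 4)
    return {
        "mean": mean, "std": std,
        "min": min(values), "max": max(values),
        "skewness": skew, "kurtosis": kurt,
    }


def glcm(roi, levels, dx=1, dy=0):
    """Normalized (non-symmetric) co-occurrence matrix for offset (dx, dy).

    Assumes roi holds integers in range(levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(roi), len(roi[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[roi[r][c]][roi[r2][c2]] += 1
                total += 1
    return [[k / total for k in row] for row in counts]


def glcm_contrast(p):
    """Contrast: squared gray-level difference weighted by co-occurrence."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))
```

For a 10 × 10 ROI in Hounsfield units, the values would first have to be binned into a small number of gray levels; how that binning was done in the study is not stated, so it is left out of the sketch.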

Machine learning classification

Orange Canvas® software processed all texture features with different methods of machine learning: Stochastic Gradient Descent (SGD), Naive Bayes (NB) [10], and Support Vector Machine (SVM) [11, 12].

Stochastic Gradient Descent (SGD) is a standard algorithm for optimizing complex functions iteratively. SGD has been used as an optimization method for unconstrained problems, but it can be used for classification problems as well. SGD iterates over the training examples, updating the model parameters at each iteration, and approximates the true gradient using a single training example. SGD was used with the hinge loss function for classification, a constant learning rate of 0.01, and 50 iterations [13, 14].
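
The update rule described above can be sketched in plain Python as follows. This is an illustrative sketch, not the Orange Canvas implementation: the function names are hypothetical, "50 iterations" is assumed to mean 50 passes over the training data, labels are assumed to be coded as -1 and +1, and regularization is omitted for brevity.

```python
import random


def train_sgd_hinge(X, y, learning_rate=0.01, epochs=50, seed=0):
    """Fit a linear classifier by SGD on the hinge loss.

    X: list of feature vectors; y: labels in {-1, +1}.
    Each update uses one training example (the "stochastic" gradient).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a new random order
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # Hinge loss max(0, 1 - y * score) has a non-zero gradient
            # only when the example lies inside the margin.
            if y[i] * score < 1:
                w = [wj + learning_rate * y[i] * xj for wj, xj in zip(w, X[i])]
                b += learning_rate * y[i]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

With a constant learning rate the step size never shrinks, so the weights stop changing only once every training example satisfies the margin condition.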

Naive Bayes (NB) is a classification method based on Bayes’ theorem and the maximum a posteriori hypothesis. This method assumes that the effect of an attribute on a given class is independent of the other attributes. The classifier assigns each sample to the class with the maximum posterior probability [10, 15].
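
The decision rule can be illustrated with a small Gaussian Naive Bayes sketch in Python. The Gaussian likelihood is an assumption made for this example; the source does not state how Orange Canvas modeled the continuous texture features, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a log-prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor avoids division by zero for constant features
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_nb(model, x):
    """Pick the class with the largest log-posterior (up to a constant)."""
    def log_posterior(params):
        log_prior, means, variances = params
        # Independence assumption: per-feature log-likelihoods simply add up
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances)
        )
        return log_prior + log_likelihood
    return max(model, key=lambda label: log_posterior(model[label]))
```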

Support Vector Machine (SVM) is a classification method that uses input–output training data from two classes. The SVM algorithm finds the equation of a hyperplane that divides the training data, leaving all points of the same class on the same side while maximizing the minimum distance between either of the two classes and the hyperplane. SVM was used with the radial basis function kernel, with numerical tolerance = 0.001; cost = 1.0; regression loss epsilon = 0.1; and iteration limit = 100 [11, 16].
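
To make the role of the radial basis function kernel concrete, the sketch below evaluates the decision function of an already trained SVM in Python. It does not train the model; the support vectors, dual coefficients, bias, and `gamma` passed in are hypothetical inputs, and the function names are illustrative only.

```python
import math


def rbf_kernel(a, b, gamma):
    """Radial basis function kernel: exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Signed score of x; its sign gives the predicted class.

    dual_coefs[i] is alpha_i * y_i for support vector i, i.e. the
    multiplier found during training times the class label (+1 or -1).
    """
    return sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    ) + bias
```

The kernel lets the separating hyperplane live in an implicit feature space, so the boundary in the original texture-feature space can be curved.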

All three methods used the 15 textural features with a 10-fold cross-validation method. The training set was composed of 70% of all the input data and the test set of the remaining 30%. Gain Ratio and Gini index were used to rank all features according to their correlation with each class [17, 18]. We then selected the five features that achieved the highest scores for classification within each machine learning classifier and each operator.
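
The two ranking scores can be sketched in Python as follows. This is a simplified illustration under stated assumptions: each continuous feature is split into two bins at its mean value (the discretization actually applied by Orange Canvas is not described in the source), and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_scores(values, labels, threshold):
    """Return (gain ratio, Gini decrease) for a binary split at threshold."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0, 0.0  # degenerate split carries no information
    wl, wr = len(left) / len(labels), len(right) / len(labels)
    info_gain = entropy(labels) - (wl * entropy(left) + wr * entropy(right))
    split_info = -(wl * math.log2(wl) + wr * math.log2(wr))
    gini_decrease = gini(labels) - (wl * gini(left) + wr * gini(right))
    return info_gain / split_info, gini_decrease


def rank_features(table, labels, top_k=5):
    """Rank features by gain ratio; table maps feature name -> values."""
    scores = {}
    for name, values in table.items():
        threshold = sum(values) / len(values)  # assumed split point: the mean
        scores[name] = split_scores(values, labels, threshold)[0]
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Gain ratio divides the information gain by the entropy of the split itself, which penalizes splits that produce many small bins; the Gini decrease measures the same separation using class impurity.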

To determine how efficiently the model classified our groups (malignant and benign lymph nodes) we utilized parameters such as the area under Receiver Operating Characteristic curves (AUC), accuracy (CA), F-score (F1), precision, and sensitivity.
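
For reference, the indexes listed above can be computed as in the following Python sketch. The function names are hypothetical, the malignant class is assumed to be the positive class (coded 1), and the AUC is computed with the rank-based (Mann-Whitney) formulation rather than by integrating a plotted curve.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, sensitivity (recall) and F1 for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    if precision + sensitivity:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
    else:
        f1 = 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


def auc_from_scores(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))
```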

Results

Among the 18 patients and 39 lymph nodes available for analysis, 15 lymph nodes were malignant and 24 benign according to the gold standard, histopathological analysis after tissue sampling. All three classifiers presented the same five best features for classification for each operator (E_soma_bior3.3_1, Evbior3.3_1, Edsum4_1, Ed_haar_1, Eh_bior3.3_1). All of these features are derived from wavelet transforms, with the numbers referring to the scale and the wavelet filters of the decomposition. These five features achieved the best Gain Ratio and Gini index, regardless of whether the exams were acquired with or without contrast.
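
The feature names suggest energies of wavelet sub-bands, although the source does not define them, so that reading is an assumption. Under that assumption, the sketch below shows in Python how sub-band energies could be obtained from a one-level 2-D Haar decomposition. The function names and sub-band keys are hypothetical, and the biorthogonal 3.3 filters used in the study are not reproduced here.

```python
def haar_level1(image):
    """One-level orthonormal 2-D Haar transform of an even-sized image.

    Returns four sub-bands, each half the size of the input:
    the approximation and three detail bands.
    """
    bands = {"approx": [], "left_right": [], "top_bottom": [], "diagonal": []}
    for r in range(0, len(image), 2):
        rows = {name: [] for name in bands}
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            rows["approx"].append((a + b + d + e) / 2)
            rows["left_right"].append((a - b + d - e) / 2)  # left minus right
            rows["top_bottom"].append((a + b - d - e) / 2)  # top minus bottom
            rows["diagonal"].append((a - b - d + e) / 2)
        for name in bands:
            bands[name].append(rows[name])
    return bands


def subband_energy(band):
    """Energy of a sub-band: the sum of its squared coefficients."""
    return sum(v * v for row in band for v in row)
```

Because the transform is orthonormal, the four sub-band energies add up to the energy of the original ROI, so each one can be read as the share of the signal captured at that orientation.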

Gain Ratio and Gini index identified the features that separated the two groups with the highest precision, sensitivity, and area under the Receiver Operating Characteristic curve. Table 1 shows, for the two best machine-learning classifiers (SVM and SGD), the area under the ROC curve (AUC), accuracy (CA), F-score (F1), precision, and sensitivity for lymph nodes in CT slices without contrast for the five operators. Table 2 shows the same parameters for CT slices with contrast medium.

Table 1 Test results of the SGD and NB classifiers for the five operators with images obtained without the contrast medium
Table 2 Test results of the SGD and NB classifiers for the five operators with images obtained with the contrast medium

To determine the optimal cutoff value for both sensitivity and specificity, we plotted ROC curves for the classification methods (SVM, Support Vector Machine; NB, Naïve Bayes; and SGD, Stochastic Gradient Descent). Figure 2 shows the results without the contrast medium and Fig. 3 shows the results with the contrast medium for all five operators (a–e).

Fig. 2 ROC curves for the different classifiers (SVM, Support Vector Machine; NB, Naïve Bayes; SGD, Stochastic Gradient Descent) in images obtained without contrast medium of all five operators. Operator 1 (a). Operator 2 (b). Operator 3 (c). Operator 4 (d). Operator 5 (e)

Fig. 3 ROC curves for the different classifiers (SVM, Support Vector Machine; NB, Naïve Bayes; SGD, Stochastic Gradient Descent) in images obtained with the contrast medium of all five operators. Operator 1 (a). Operator 2 (b). Operator 3 (c). Operator 4 (d). Operator 5 (e)

Discussion

This study has some limitations. The whole study was carried out with data obtained from a single CT scanner at a single institution, and all our data were obtained retrospectively. Although the results are promising, a full clinical trial with prospective cases from different institutions would be necessary to consolidate the method.

With increasing innovation and development in diagnostic equipment, the cost and quantity of radiological exams keep growing, which calls for optimization. Researchers worldwide are therefore looking for algorithms to minimize these costs and assist physicians in streamlining radiological procedures, especially in developing countries. In this sense, Gopinath et al. [19] implemented a methodology similar to ours to distinguish benign from malignant thyroid tumors. Recently, Apostolopaulos et al. [20] employed neural networks to differentiate patients with Covid-19 from normal subjects on X-ray exams.

In particular, an increase in lymph node size is a nonspecific finding: it does not reveal whether the enlargement is due to secondary neoplastic involvement, an infectious or inflammatory process, or another cause [1, 3]. Thus, for staging, it is very important to establish whether an enlarged lymph node detected by CT or another diagnostic imaging method is metastatic or not. The distinguishing aspect of this study was that we tested 15 textures from mediastinal lymph nodes, with five different operators selecting regions of interest. We presented a classification approach for mediastinal lymph nodes based on machine learning and texture features. Our cohort included only patients with confirmed lung cancer. To establish the gold standard, all mediastinal lymph nodes underwent histopathological analysis.

Our approach was able to differentiate malignant from benign mediastinal lymph nodes with good sensitivity and area under the ROC curve (AUC). When comparing the three classifiers, SVM (AUC of 95%) and SGD (AUC of 88%) presented the best classification performance with images without contrast medium. We also observed a small variability among operators. For example, AUC values varied from 0.87 to 0.98 in images without contrast medium, and from 0.84 to 0.98 in images with contrast medium. The five operators carefully conducted the selection of patients and the manual positioning of ROIs within lymph nodes. The literature reports inter-operator variability lower than 15% [21,22,23] in manual segmentation and ROI positioning. Our results presented a variability of less than 5% for AUC and less than 5% for sensitivity for the best classifier, demonstrating low variability among operators.

Another interesting finding was that images with contrast medium provided somewhat lower classification scores for all machine learning approaches, as can be seen when comparing Tables 1 and 2. However, the difference was not as pronounced as that presented by Andersen et al. [24] or discussed by Bayanati et al. [25]. We hypothesize that CT sequences without contrast medium preserve the texture within structures and thus allow differentiation between malignant and benign lymph nodes. However, contrast analysis should not be ruled out for diagnosis without further studies.

Previous works demonstrated the potential of texture analysis associated with machine learning. SVM in association with texture features has been successfully used before by Zhu et al. [16] for differentiating benign and malignant solitary pulmonary nodules. Another approach similar to ours was proposed by Ye et al. [26], who used first- and second-order statistical features with an SVM classifier to differentiate normal-abnormal cyst, and carcinoma-haemangioma in liver CT images. They reported accuracy higher than 95% for all three classifications.

Andersen et al. [24] also presented a method based on texture analysis for classifying malignant and benign lymph nodes. They used a small number of statistical features, such as mean image intensity after histogram-based filtration, and achieved an AUC of 0.834. Sigovan et al. [27] used the mean apparent diffusion coefficient values from diffusion-weighted MRI images to differentiate benign from malignant lymph nodes. They found a sensitivity of 90.9% and an accuracy of 85%. A review of computer-aided detection for differentiating malignant from benign lung nodules was provided by Al Mohammad et al. [28]. In this review, many studies demonstrated that computer-aided detection (CAD) increased detection sensitivity and recognized some nodules originally missed by radiologists [29, 30].

Bayanati et al. [25] also proposed an approach to differentiate benign and malignant mediastinal lymph nodes. Their approach combined textural (GLCM and GLRL) and shape features with logistic regression and SVM classifiers. When comparing the results of Bayanati et al. [25] with the present work, some points deserve discussion. We used more texture features (15 against 6) and three different machine learning classifiers against two [25]. The size of our ROI was fixed to decrease the effect of size variation on textural features. We used CT sequences both with and without contrast medium and found that, for every classifier, sequences without contrast performed better. In addition, we demonstrated small variability among the five operators in ROI positioning, whereas they used only one operator. We achieved a higher AUC (95%) and sensitivity (89%) than those reported by the authors [25] (AUC 87% and sensitivity 81%).

The application of computational methods with high sensitivity and accuracy could help in staging mediastinal lymph nodes in lung cancer patients. In many cases, diagnosis based on image evidence could avoid unnecessary additional imaging acquisitions, which would benefit patients. For example, PET is considered more sensitive than CT, and it allows the metabolic assessment of mediastinal lymph nodes in patients with lung cancer; however, its poor resolution and the potential for false positives diminish its reliability [31, 32]. Another potential contribution of this work is a reduction in the number of invasive procedures. Even when a trained surgical team and appropriate equipment are available, invasive staging is contraindicated in patients with serious comorbidities that increase surgical risk, and the therapeutic decision can be impaired by an imprecise definition of lymph node involvement.

Still, given the relatively small number of cases, this work should be seen as opening the discussion about the use of this method and motivating further research with larger samples. Nevertheless, several works with data sets as small as ours have contributed meaningfully to the literature in this area [24, 33,34,35].

Conclusion

In conclusion, we developed a method for the classification of mediastinal lymph nodes in lung cancer patients. The distinguishing aspect of our approach was that we tested 15 textures with three machine learning methods and five different operators. After classification with the three machine learning approaches, the variability in sensitivity among operators was less than 5% for the best classifier. Texture analysis associated with machine learning may be helpful in differentiating between neoplastic and benign lymph nodes. It can aid the physician in diagnosis and potentially reduce the number of invasive procedures needed for histopathological confirmation.