Introduction

The incidence of thyroid cancer has increased dramatically worldwide in recent years [1]. Papillary thyroid carcinoma (PTC), accounting for 80–89% of thyroid carcinoma, is the most common thyroid carcinoma [2, 3]. Cervical lymph node metastasis, which occurs in approximately 35–80% of patients with PTC, is the most important risk factor related to recurrence and poor overall survival [4, 5]. As central lymph node metastasis (CLNM) is the first station of metastasis and the most frequently involved area, central compartment neck dissection (CCND) is often performed for patients with positive CLNM, which can significantly reduce the recurrence rate and mortality [6]. However, approximately 38% of PTC patients have been reported to have CLNM even in patients with no clinical evidence of nodal metastasis [7]. As preventive CCND increases the incidence of surgical complications, such as parathyroid and laryngeal recurrent nerve injuries, preventive CCND is controversial in CLNM-negative patients [8]. Therefore, to reduce the unnecessary CCND, an accurate and non-invasive method for preoperative assessment of CLNM is essential for selecting the optimal treatment strategy for patients with PTC.

Ultrasonography (US) is the preferred modality for the preoperative evaluation of patients with PTC and it is widely used to screen thyroid nodules and lymph node metastases. Studies have demonstrated that several ultrasound morphological features are associated with CLNM [9,10,11], such as tumor size, extrathyroidal extension, microcalcification, etc. However, the diagnostic performance show greatly varying degrees among different US physicians due to the subjectivity of the experience and application of diagnostic criteria [12,13,14,15]. Therefore, an effective and stable method is needed to evaluate the central lymph nodes of patients with PTC. Radiomics is an emerging technique, which can not only improve the accuracy and consistency of disease diagnosis but also reduce the time to output results. Based on these superiorities, several limited studies have investigated radiomics to predict the CLNM in PTC patients [16, 17]. However, the models in these studies are based on US-reported central lymph nodes status. As the US has a low detection rate for central lymph nodes owing to the influence of air, bone, or glandular tissues [18], it may lead to some impact on the accuracy of the model. Thus, the radiomic model in this study is based on the thyroid nodules themselves, which can avoid the above problems to some degree. In addition, different from previous studies that only determine the risk factors of CLNM [19, 20], the aim of our study is not only to identify risk factors for predicting CLNM, but also to develop and validate the models for predicting CLNM. Moreover, few studies have evaluated the overall diagnostic performance of the combined model, which has integrated the radiomics with conventional morphological features.

Therefore, to fill the gap of the available literature, our study aims to build an ultrasound radiomic model and compare its diagnostic value with a conventional morphological model for predicting CLNM in patients with PTC. We further construct a combined model based on ultrasound radiomic and morphological features, to assist clinical selection of the best treatment.

Materials and methods

Patients

The retrospective study was approved by our institutional review board, and the informed consent was waived. Consecutively patients with pathologically confirmed PTC from October 2019 to July 2022 were retrospectively collected in our hospital. Inclusion criteria were as follows: (1) patients who underwent thyroidectomy and lymph node dissection in the central region of the neck; and (2) patients who underwent ultrasound examination within two weeks before surgery. Exclusion criteria were as follows: (1) patients who had received preoperative treatment (radiofrequency or microwave ablation, radiotherapy, radioiodine therapy, or chemotherapy); (2) multiple thyroid nodules with confirmed PTC in one lobe; and (3) poor image quality. Details of the patient recruitment in this study were shown in the flow chart (Fig. 1).

Fig. 1
figure 1

The flow diagram of patients selection. CLNM, central lymph node metastasis

Finally, a total of 295 patients with PTC were included in this study, including 49 men and 246 women, with a mean age of 43.23 (20–73) years.

Ultrasound radiomic model

Image acquisition

To preserve maximum longitudinal images of bilateral lobe nodules of the thyroid gland according to thyroid and neck ultrasound examination specifications, images obtained with Mindray Resona 7 and Mindray DC-80 color Doppler ultrasound diagnostic instruments from Mindray, L14–5WU, and L14–5WE line array probes with frequencies from 4 to 14 MHz. were included. According to the different ultrasound scanners, patients who underwent Mindray Resona 7 scan were included in the training set (115 with CLNM, 97 without CLNM), while those who underwent the Mindray DC-80 scan were enrolled in the validation set (45 with CLNM, 38 without CLNM).

Image segmentation

The two-dimensional (2D) ultrasound images included in this study were imported into the ITK-SNAP software in DICOM format, and the region of interest (ROI) of the lesion was manually contoured by a US physician with 5 years of experience in thyroid ultrasound diagnosis. To evaluate the inter-observer agreement, another US physician with 10 years of experience in thyroid ultrasound diagnosis outlined 30 randomly selected cases. Both US physicians were blind to the pathological result. The reliability was evaluated through the intraclass correlation coefficient (ICC). The features with ICCs > 0.75 indicated a high reproducibility, which was reserved for further analysis. The representative images were shown in Fig. 2.

Fig. 2
figure 2

Representative images and segmentation results of thyroid nodules. (A, B) A female with central lymph node metastasis. (C, D) A female without central lymph node metastasis. (A, C) Original images and (B, D) segmentation results

Feature extraction

Owing to the different acquisition parameters of the two ultrasound scanners, all the original images were first resampled at a spatial resolution of 1 × 1 × 1 mm3 through a linear interpolation algorithm to ensure the rotationally invariant and comparable texture features between different scanners [21]. In addition, a fixed bin count of 32 was used to discretize voxel intensity values and reduce the noise. Then, the PyRadiomics package (https://pyradiomics.readthedocs.io) was used to extract the radiomic features [22], including first-order, morphological, and texture features [gray-level cooccurrence matrix (GLCM), gray-level dependence matrix (GLDM), gray-level run length matrix (GLRLM), gray-level size zone matrix (GLSZM), and neighborhood gray-tone difference matrix (NGTDM)]. Gray level co-occurrence matrix (GLCM) and gray-level dependence matrix (GLDM) features are computed from each 3D directional matrix and averaged over the 3D directions, and gray level size zone matrix (GLSZM), neighboring grey level dependence matrix (NGLDM), and neighboring gray-tone difference matrix (NGTDM) features are computed from a 3D matrix. Details of radiomic features were available at https://pyradiomics.readthedocs.io/en/latest/features.html.Then, to obtain the filtered derived image and extract the corresponding ultrasound radiomic features, the wavelet transform is applied to the original image. Finally, a total of 464 radiomic features were extracted.

Model construction

Z-score normalization was first applied to standardize the radiomic features in the training set. The least absolute shrinkage and selection operator (LASSO) was used to avoid overfitting the radiomic features and select the most significant predictive features. To find an optimal regulation weight in LASSO logistic regression, 10-fold cross-validation with minimum criteria was used, in which the value of λ yielded the minimum binomial deviance. Then, multivariate logistic regression analysis, with stepwise regression selection method and Akaike information criterion (AIC) was used to construct the final ultrasound radiomic model. Ultimately, we validate the model with an independent validation set. The selected features were linearly weighted to calculate the radiomic score (Rad-score).

Ultrasound morphological model

Ultrasound morphological features in this study included: location, length diameter, morphology, boundary, color doppler flow imaging, and microcalcifications of the nodules. Two ultrasonographers with 5 and 10 years of experience in diagnostic thyroid ultrasound completed the thyroid nodule morphologic feature evaluation independently. Any disagreement was resolved by negotiation and discussion. If disagreement was insisted, a senior ultrasonographer with 15 years of experience made the final determination. All ultrasonographers were unaware of the corresponding pathologic histologic findings before evaluating the images. All morphological features were included in a univariate logistic regression analysis to assess the predictive power of central lymph node metastasis. Subsequently, features with P < 0.05 were placed into a multivariate logistic regression analysis, and the final morphological model was constructed using the stepwise regression selection method and AIC.

Building combined model

The ultrasound radiomic and morphological features with P < 0.05 were subjected to a multivariate logistic regression analysis, and independent risk predictors for central lymph node metastasis were determined using the stepwise regression selection method and AIC to construct the final combined model.

Statistical analysis

As a prediction model study for diagnostic purposes, this study followed the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) statement [23].

SPSS (version 25.0), R software (version 4.2.1), and MedCalc (version 19.6.4) were used for statistical analysis. Group differences were evaluated through the Mann-Whitney U test for continuous variables, and the chi-square test or Fisher test for categorical variables. A two-sided P < 0.05 was indicative of a significant difference. The diagnostic performance of the combined, ultrasound radiomic and morphological models were assessed using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. The optimal threshold value of the ROC curve was determined through the Youden index. The corresponding accuracy, sensitivity, and specificity values were calculated, and a 95% confidence interval (CI) was calculated by the Binomial exact method. The Hosmer-Lemeshow test was performed to estimate the goodness-of-fit of each model, and the calibration curves were plotted for each model, which was capable of visualizing the consistency of models. Finally, the clinical decision curve analysis (DCA) was used to evaluate the clinical application value of each model.

Result

Baseline characteristics

The clinical and ultrasound morphological characteristics of 295 thyroid nodules were presented in Table 1, in which there were significant differences in color doppler flow imaging, microcalcifications, morphology, boundary, and length-diameter between groups in both the training and validation sets (all P < 0.05). Significant differences were found in gender between groups in the training set (P = 0.018), but not in the test set (P = 0.483). No significant differences were found in age and location between groups in both the training and validation sets (all P > 0.05).

Table 1 Clinical and ultrasound morphological features of the patients

Ultrasound radiomic model

In total, 284 of the 464 ultrasound radiomic features had high reproducibility (ICC > 0.75). The process of the ultrasound radiomic feature selection was shown in Fig. 3. After selecting, five features remained to construct the ultrasound radiomic model. The formula was as follows: ln(P/1-P) = 0.226 + 1.457 × wavelet.HH_first_order_Median + 1.287 × wavelet.LL_first_order_Range − 1.185 × wavelet.LL_GLCM_Imc2–2.239 × wavelet.HH_GLDM_SmallDependenceHighGrayLevelEmphasis + 2.331 × wavelet.HL_GLSZM_ZoneEntropy, in which P is the probability of CLNM, with a cutoff value > 0.498. To standardize the features by Z-score before calculation, the mean and standard deviation of these features were detailed in Table 2.

Fig. 3
figure 3

The process of the ultrasound radiomic feature selection. (A) Flow chart of ultrasound radiomic features selection. (B) Radiomic feature selection by applying the LASSO binary logistic regression model. The vertical lines on the left side of the horizontal coordinate represent the tuning parameter lambda (λ) chosen in the LASSO model using 10-fold cross-validation via minimum criteria. (C) Coefficient convergence plot of the screened features, with the best log (λ) value screening out 12 histological features with non-zero coefficients. LASSO, least absolute shrinkage, and selection operator; PTC, papillary thyroid carcinoma; AIC, Akaike information criterion

Table 2 Means and standard deviations of ultrasound radiomic features

Ultrasound morphological model

Univariate logistic regression analysis showed that gender, microcalcifications, color doppler flow imaging, morphology, boundary, and length-diameter were risk factors for predicting CLNM (all P < 0.05) (Table 3). According to the multivariate logistic regression analysis, based on the length diameter of the nodes (OR, 2.623; 95% CI, 1.199–5.738, P = 0.016), morphology (OR, 4.114; 95% CI, 1.896–8.926, P < 0.001), color doppler flow imaging (OR, 7.194; 95% CI, 2.865–18.067, P < 0.001) and boundary (OR, 3.558; 95% CI, 1.352–9.363, P = 0.016) to construct an ultrasound morphological model. The formula was: ln(P/1-P) = − 2.911 + 1.404 × morphology(irregular) + 1.309 × boundary(irregular margin) + 0.440 × boundary(ill−defined) + 1.818 × color doppler flow imaging(mainly central vascularity) + 0.933 × color doppler flow imaging(Mainly peripheral vascularity) − 0.197 × color doppler flow imaging(mixed vascularity) + 0.937 × length diameter, where P was the probability of CLNM, with a cutoff value > 0.747.

Table 3 Univariate and multivariate logistic regression analysis of ultrasound radiomic and morphological features
Table 4 Diagnostic performance of each model in predicting central lymph node metastasis in the training and validation sets

Combined model

According to the multivariate logistic regression analysis (Table 3), the final combined model was constructed based on the morphology (OR, 10.466; 95% CI, 3.016–36.323, P < 0.001), boundary (OR, 9.767; 95% CI, 2.262–42.173, P = 0.02), length diameter (OR, 0.392; 95% CI, 0.161–0.955, P = 0.039) and Rad-score (OR, 3.577; 95% CI, 2.372–5.395, P < 0.001) (Table 3). The formula was: ln(P/1-P) = − 1.764 + 2.348 × morphology(irregular) + 2.279 × boundary(irregular margin) + 0.579 × boundary(ill−defined) − 0.936 × length diameter + 1.275 × Rad-score, where P is the probability of CLNM, with a cutoff value > 0.486.

Model performance

The ROC curves of the combined, ultrasound radiomic and morphological models in the training and validation sets are shown in Fig. 4. The AUCs of the models for predicting CLNM were 0.956 (95% CI, 0.928–0.984), 0.920 (95% CI, 0.875–0.953), 0.836 (95% CI, 0.786–0.886) in the training set, and 0.973 (95% CI, 0.951–0.995), 0.901 (95% CI, 0.815–0.955), 0.796 (95% CI, 0.742–0.850) in the validation set, respectively.The accuracy, sensitivity, and specificity of each model were listed in Table 4. The AUCs of the combined model were significantly higher than ultrasound radiomic and morphological models in both the training and validation sets, and the combined model showed significant higher AUC than three sonographers in the validation cohort (all P < 0.05) (Table 5). The Hosmer-Lemeshow test showed non-significant results for the models in the training and validation sets (all P > 0.05), and the calibration curves showed good consistency between prediction and observation (Fig. 5A and B C). The DCA analysis demonstrated that the combined model had a higher net clinical benefit compared with the other models (Fig. 5D).

Fig. 4
figure 4

The receiver operating characteristic (ROC) curves of combined, ultrasound radiomic, and morphological models. (A) training set, (B) validation set

Table 5 Comparison of the diagnostic performance of the models in predicting central lymph node metastasis in the training and validation sets
Fig. 5
figure 5

Calibration curves and clinical decision curves of each model. (A) Combined model calibration curve, (B) Ultrasound radiomic model calibration curve, (C) Morphological model calibration curve, (D) clinical decision curve

We concluded the TRIPOD type of this study could be type 2b.

Discussion

PTC is the most common thyroid carcinoma with low mortality and good prognosis, with a 10-year survival rate of 97% [24]. However, the recurrence rate of PTC with cervical lymph nodes can reach 22% [25]. In particular, metastases in the central region of the neck are the most common [26]. Among the currently used imaging modalities, the efficacy of computed tomography (CT) and US in the evaluation of thyroid and CLNM are not significantly different [27], and magnetic resonance imaging (MRI) has a slightly higher diagnostic sensitivity than US [28]. However, CT is radioactive and insensitive to soft tissue; MRI is expensive and took a long time. The US is considered the imaging test of choice for the evaluation of thyroid nodules and lymph node metastases owing to its lack of radiation damage and ease of universal access. However, the US has a high specificity (93–98%) and low sensitivity (33–64%) for central lymph nodes owing to the influence of air, bone, or glandular tissues [15, 18, 27, 29]. In this study, the sensitivity of the US morphological model for the detection of CLNM was 52%, and the specificity was 90%, which is consistent with the literature. Since preoperative US is prone to miss or misdiagnosis of CLNM, prophylactic CCND can reduce the tumor recurrence rate. However, it also increases the incidence of surgical complications [30]. The Chinese and Japanese thyroid guidelines advocate aggressive prophylactic CCND, while the American Thyroid Association does not recommend prophylactic CCND [31], which is controversial. Therefore, accurate preoperative assessment of the status of central lymph node metastasis of PTC is critical for the clinical selection of whether to dissect the central lymph nodes to improve the therapeutic efficacy without increasing surgical complications.

Previous literature reported that US features of PTC, such as tumor size, microcalcifications, irregular shape and multifocality, were associated with CLNM [10, 11]. In this study, we found significant differences between the CLNM and no-CLNM groups in the morphology, boundary, length-diameter, color doppler flow imaging, and microcalcifications of the PTC lesions using univariate logistic regression. However, multivariate logistic regression showed that only the lesion length-diameter, morphology, boundary, and color doppler flow imaging were significantly associated with CLNM. Some of the US morphological features, such as microcalcifications, were different from those reported in the literature and were associated with CLNM in the univariate logistic regression analysis of this study. However, microcalcifications were not an independent risk factor for CLNM in the multivariate logistic regression analysis. This may be related to its pathological basis, such as the mummification of some benign thyroid nodules, whose US presentation is also similar to microcalcifications [32]. Ultimately, we created morphological model using the above features. However, the sensitivity of this model is only 52%. Thus, a more effective method is needed for predicting CLNM.

Radiomics is an emerging frontier discipline that can extract a large number of quantitative imaging features from medical images for deep mining of tumor biological information and analysis of tumor heterogeneity. Moreover, radiomic analysis based on image features has objectivity and is valuable in predicting clinical outcomes [33, 34]. Previously, models based on radiomic features have been applied to predict lymph node metastasis in several tumors [35, 36]. For example, shear wave elastography combined with grayscale ultrasound for predicting CLNM of PTC [37]. It shows the feasibility of applying ultrasound radiomics features to predict CLNM in patients with PTC. In this study, we used radiomic analysis to predict the presence of CLNM, and the results demonstrated that the ultrasound radiomic model could predict CLNM in PTC patients in both the training (AUC, 0.920) and validation sets (AUC, 0.901) with higher efficacy than the US morphology model, which indicates the value of radiomics in predicting the presence of CLNM in patients with PTC.

In previous studies [10], investigators have often used US morphological or radiomic features alone for the prediction of lymph node metastasis. They have not evaluated the diagnostic accuracy and additional benefit of combined US morphological and radiomic features. A much higher diagnostic value of preoperative prediction of CLNM has been reported in patients with PTC when CT and US or US and clinical are combined [27, 38]. Therefore, we developed and validated a combined model in this study for predicting CLNM in PTC patients that combine ultrasound morphological and radiomic features. The combined model had higher diagnostic efficacy and more stable sensitivity and specificity in both the training and validation sets than the morphological or radiomic models alone. The combined model provided better net clinical benefit than the ultrasound radiomic or morphological models over the most reasonable threshold probability ranges, as demonstrated by DCA. Additionally, calibration curves showed good agreement between predicted and actual values for each model. The DeLong test depicted significant differences in AUC between the ultrasound radiomic, morphological, and combined models. These results demonstrated that our combined model can greatly facilitate the preoperative individualized prediction of CLNM in patients with PTC. Many studies have been conducted on this topic. Tong et al. [17] established a nomogram based on ultrasound-reported lymph node status and radiomic features to predict CLNM in patients with PTC. However, the exploration of central lymph nodes was limited due to gas, bone, etc., which may affect the reporting of lymph node status and thus have an impact on the results, by contrast, our study was directly targeting the tumor itself, which greatly avoids the problem of displaying central lymph nodes. Zhou et al. [39] developed a radiomics nomogram based on dual-energy CT derived iodine maps, which showed favorable performance in predicting CLNM in PTC patients with an AUC of 0.837. However, CT involves radiation exposure, and iodinated contrast agents have a risk of contrast allergy. Moreover, this examination might delay radioactive iodine therapy, which limits its clinical application. Li et al. [40] develop a CT-based radiomics model to predict CLNM preoperatively in patients with PTC. Compared with the study, our combined model has better sensitivity and specificity and eliminates the need for CT examinations, which reduces the financial stress and radiation damage to patients and can be performed in the clinic to serve the patients more effectively.

With the popularity of artificial intelligence in recent years, radiomics and deep learning have been widely used in the studies of tumor imaging. For example, Abbasian Ardakani A et al. [41] used both machine learning (ML) and deep learning (DL) models on multi-center databases to predict lymph nodes metastatic preoperatively in patients with PTC. Although this was a multicenter study, its sample size was limited. This may have some implications for model construction. In addition, the model for this study was based on US images of lymph nodes. This will ignore some of the central lymph nodes that are not shown due to gas, bone, etc. This may lead to the omission of some patients with CLNM. Comparing with this study, our study is based on the tumor itself, avoiding the above makes for more accurate results. However, only internal and not external validation was performed in our study. While there is a lot of research on DL [42,43,44], the underlying parameters of DL algorithms were inherently in black box, there is a lack of a sufficient number of studies performing external validation on DL algorithms and a limited number of publications focusing on DL protocols, which limits the extent to which the results could be generalized. Compare with DL models, radiomics model has better protocols and fewer data requirements. This makes the model more reproducible. And the disadvantage of radiomics is that some radiomics models in the process of outlining the region of interest are manual-segmentation of the image and lack certain criteria.

Limitations of this study were that this was a retrospective and single-center study with a limited sample size. However, the result of the validation dataset from another ultrasound scanner, makes our radiomic model more convincing. Thus, future large multi-center studies are still warranted to assure the generalizability of our results. In addition, the diagnostic performance of the model in this study isn’t comparing with human readers. It is a challenge for a sonographer to directly determine whether a lymph node is metastatic without scanning it. As for the central lymph nodes, it’s more difficult due to impact of air, bone, or glandular tissues and the experience of the doctor. Futhermore, our study only using a single 2D ultrasound image, and did not incorporate other multimodal ultrasound images such as elastography, ultramicro flow imaging, contrast-enhanced ultrasound, etc.

Conclusion

Our combined model is a noninvasive predictive tool that combined ultrasound radiomic and morphological features, which showed a better ability to predict CLNM preoperatively compared with ultrasound radiomic or morphology models, and can help clinicians to select more reasonable treatment modalities and avoid overtreatment.