Introduction

Papillary thyroid carcinoma (PTC) accounts for approximately 85% of all thyroid carcinoma cases, and its global incidence has been increasing over the past few decades [1]. Despite having an indolent clinical course, PTC is associated with cervical lymph node metastasis (LNM) [2]. Cervical LNM occurs in approximately 30–80% of PTC cases and is one of the most critical risk factors for locoregional recurrence [3]. The presence of LNM also affects the staging and treatment options for PTC [4]. Central compartment LNM is usually treated with central compartment neck dissection (CCND), and central LNM has been found in 45% of PTC patients with clinically negative LNs (cN0) who undergo prophylactic CCND [5]. However, CCND increases the risk of hypoparathyroidism and recurrent laryngeal nerve injury, and whether it should be performed prophylactically in cN0 patients remains controversial [6]. Patients with preoperatively evident lateral nodal metastasis typically require aggressive treatment, including lateral compartment lymph node (LN) dissection and high-dose radioactive iodine (RAI) therapy [7]. However, therapeutic neck dissection and/or RAI ablation increase morbidity and reduce quality of life [8]. Reducing the incidence of complications depends on avoiding unnecessary cervical LN dissection. Therefore, an accurate preoperative assessment of cervical LNM is important for choosing the optimal therapeutic strategy for PTC patients.

Ultrasound (US) is the preferred screening modality for preoperative assessment of cervical LN (CLN) status in PTC patients [9]. However, preoperative US has a low sensitivity (38–59%) for detecting cervical LNM [6, 10]. Furthermore, the diagnostic performance of neck US differs among physicians, and the interobserver variability is high [11]. Fine-needle aspiration (FNA) is an invasive approach, and its sensitivity for evaluating LNM varies and is specific to the operator [12]. Thus, a non-invasive, effective alternative imaging modality with improved accuracy is urgently needed in clinical practice.

Fueled by rapid advances in medical imaging and developments in analytical methods, the field of radiomics has attracted increasing research attention in recent years [13]. Radiomics refers to the high-throughput extraction of large numbers of quantitative features that convert medical images into mineable data, which may serve as diagnostic, predictive or prognostic biomarkers and support clinical decision-making [14]. Our previous studies have demonstrated the value of radiomics in predicting the cervical LNM of PTC [15,16,17]. To our knowledge, no single study has yet reported US radiomics models that separately predict central and lateral cervical LNM in individual PTC patients.

Therefore, the current study aimed to develop and validate nomograms that incorporate the US radiomics signature and clinical risk factors for the individual prediction of central and lateral cervical LNM in PTC patients.

Materials and methods

Patients

Between January 2019 and June 2019, consecutive patients with thyroid nodules at Fudan University Shanghai Cancer Center (Shanghai, China; training and internal validation cohort), Zhongshan Hospital (Shanghai, China; external validation cohort 1) and Ruijin Hospital (Shanghai, China; external validation cohort 2) were included. The inclusion criteria were as follows: (1) primary surgical resection was performed for the target tumor, which had a pathological diagnosis of PTC; (2) preoperative neck US was performed, and the images were recorded and saved in DICOM format; (3) the US images met the requirements for stored images [15]; (4) complete clinicopathological information was available. The exclusion criteria were as follows: (1) more than one lesion confirmed to be PTC; (2) a history of preoperative therapy (radiofrequency or microwave ablation, neck radiation therapy); (3) only microscopic cervical LNM. After exclusion, a total of 720 patients were enrolled in this study. Of these, 443 patients were chronologically divided into two cohorts: the training cohort with 300 patients who were treated between January and April 2019, and the internal validation cohort with 143 patients who were treated between May and June 2019. The two external validation cohorts enrolled 144 and 133 consecutive patients, respectively, using the same criteria. Patients were then divided into two groups according to the pathologically confirmed LN status after CLN dissection.

This retrospective study was approved by the Ethics Committees of all participating hospitals and performed in accordance with the Declaration of Helsinki. Informed consent was waived because of the retrospective nature of the study.

Surgery and pathology

All patients received either total thyroidectomy or lobectomy in accordance with the clinical TNM staging and underwent therapeutic or prophylactic central compartment dissection. Lateral neck LN dissection was performed for patients with FNA-proven LNM or with radiologically suspicious LNs that were not suitable for FNA. The resected thyroid and LN specimens were collected and subjected to pathological examination, which recorded tumor focality (unifocal or multifocal), tumor size (maximum diameter of the tumor), and the number, size and neck level of the metastatic LNs. Experienced pathologists in each of the three institutions reviewed and validated the pathological results. The pathological results of the LNs were used as the gold standard [15].

Clinical characteristics

The baseline clinical information, including age, gender, thyroglobulin (TG), thyroid stimulating hormone (TSH), thyroglobulin antibody (TGAB), thyroid peroxidase antibody (TPOAB), and cytological findings of FNA, was collected from the medical record system and double-checked by qualified specialists. TG, TSH, TGAB, and TPOAB were measured within one week prior to surgical treatment. According to previous reports and clinical experience [18, 19], the thresholds for TG, TSH, TGAB, and TPOAB were set as follows: TG ≥ 77 ng/ml, TSH ≥ 2.5 ng/ml, TGAB ≥ 100 IU/ml, and TPOAB ≥ 35 IU/ml. Imaging data, including US images, US reports and CT reports, were obtained from the medical image station.

US image acquisition and US-reported CLNs status

All patients underwent neck US examination before surgery. US images were acquired with a Supersonic Aixplorer System using a 15–4 linear-array transducer (SuperSonic Imaging, France) by radiologists with more than six years of experience. The US acquisition parameters were consistent among patients: image depth, 3 cm; focus parallel to the lesion; gain, 53%; and axial and lateral spatial resolutions of 0.2 mm and 0.4 mm, respectively [20]. The detailed requirements for the US images were described in our previous study [20]. A diagnosis of “central LNM” or “lateral LNM” in the US report was defined as US-reported central/lateral cervical LNM positive. Reports of “LNs negative”, “inflammatory LNs” and “small LNs” were categorized as US-reported central/lateral cervical LNM negative [15, 20]. The abnormal US findings suggestive of LNM included the following: (1) round shape; (2) absence of the fatty hilum; (3) microcalcification within the nodes; (4) presence of peripheral flow; (5) heterogeneity with cystic components [21]. LNs that met two or more of the five criteria were considered positive. All the US images and reports were retrospectively reviewed and validated by independent senior radiologists. The diagnostic sensitivity, specificity and accuracy of neck US examination in the training and validation sets were calculated based upon the US reports [15].
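The two-of-five rule for suspicious LNs and the report-based diagnostic metrics described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used in the study; the function names, the feature keys, and the toy labels are hypothetical.

```python
# Illustrative sketch only: names and example data are hypothetical.

SUSPICIOUS_FEATURES = (
    "round_shape",
    "absent_fatty_hilum",
    "microcalcification",
    "peripheral_flow",
    "cystic_heterogeneity",
)


def is_us_positive(findings, min_criteria=2):
    """Return True when a node shows at least `min_criteria` of the five features."""
    count = sum(1 for name in SUSPICIOUS_FEATURES if findings.get(name, False))
    return count >= min_criteria


def diagnostic_metrics(us_reported, pathology):
    """Sensitivity, specificity and accuracy of US reports against pathology."""
    pairs = list(zip(us_reported, pathology))
    tp = sum(1 for u, p in pairs if u and p)
    tn = sum(1 for u, p in pairs if not u and not p)
    fp = sum(1 for u, p in pairs if u and not p)
    fn = sum(1 for u, p in pairs if not u and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both LNM-positive and LNM-negative cases are required")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


if __name__ == "__main__":
    node = {"round_shape": True, "microcalcification": True}
    print(is_us_positive(node))
    print(diagnostic_metrics([True, False, True, False], [True, True, False, False]))
```

Pathology serves as the reference standard here, mirroring the gold standard defined in the Surgery and pathology section.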

Region of interest (ROI) segmentation and feature extraction

A representative US image was selected, in which the tumor was delineated as the ROI and manually segmented by two senior US physicians with more than 6 years of experience in thyroid US imaging who were blinded to the final LN status. Before feature extraction, the intensities inside the ROI were min–max normalized to the range 0–255 to remove bias and scaling differences caused by different imaging parameters. To avoid the effect of outliers, the maximum and minimum values used for min–max normalization were calculated after removing outliers. Then, the software “PTC cervical LN metastasis prediction system” developed by the Department of Electronic Engineering, Fudan University [20] was used to input the DICOM images after delineation and to extract the image features (MATLAB R2015b, Mathworks). To evaluate the reproducibility of feature extraction, 50 cases were randomly selected and segmented by the two physicians in a double-blinded manner. The intraclass correlation coefficient (ICC) was calculated to measure the intra-observer and inter-observer agreement of the radiomics feature extraction. An ICC greater than 0.80 was considered to indicate good agreement.
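The outlier-robust min–max normalization described above can be illustrated as follows. The text does not state how outliers were identified, so this sketch assumes percentile clipping; the function name and the default percentiles are hypothetical and are not taken from the study software.

```python
# Illustrative sketch only: percentile clipping is an assumed outlier rule.

def robust_minmax_normalize(pixels, lower_pct=1.0, upper_pct=99.0):
    """Rescale ROI intensities to 0-255 after clipping values outside two percentiles."""
    values = sorted(pixels)
    n = len(values)
    if n == 0:
        raise ValueError("ROI is empty")
    # Nearest-rank percentiles define the minimum and maximum after outlier removal.
    lo = values[int(round(lower_pct / 100.0 * (n - 1)))]
    hi = values[int(round(upper_pct / 100.0 * (n - 1)))]
    if hi == lo:
        # A constant ROI carries no contrast; map everything to 0.
        return [0.0 for _ in pixels]
    normalized = []
    for p in pixels:
        clipped = min(max(p, lo), hi)
        normalized.append(255.0 * (clipped - lo) / (hi - lo))
    return normalized


if __name__ == "__main__":
    roi = [10, 12, 14, 16, 18, 1000]  # flattened ROI with one bright outlier
    print(robust_minmax_normalize(roi, lower_pct=0.0, upper_pct=80.0))
```

Clipping before rescaling keeps a single extreme pixel from compressing the remaining intensities into a narrow band.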

Feature selection and radiomics signature building

The detailed procedures of the feature selection and the radiomics signature construction for predicting central and lateral cervical LNM were described in our previous studies [15, 16, 20]. Briefly, based on the training set, the 614 extracted reproducible features were reduced to 19 and 16 features for central and lateral LNM, respectively, using our feature selection model. The selected radiomics features were then combined to build two radiomics signatures (Rad-scores). Based on the estimated coefficients, a Rad-score was computed for each case to reflect the risk of central and lateral cervical LNM.
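A Rad-score built from selected features and their estimated coefficients is a weighted sum, which the sketch below makes concrete. It assumes a linear form with an optional intercept; the feature names and coefficient values are hypothetical placeholders, and the actual formulas are those given in Additional file 1.

```python
# Illustrative sketch only: feature names and coefficients are hypothetical.

def rad_score(features, coefficients, intercept=0.0):
    """Linear combination of the selected radiomics features."""
    missing = set(coefficients) - set(features)
    if missing:
        raise KeyError(f"missing features: {sorted(missing)}")
    # Features without a coefficient (not selected) are ignored.
    return intercept + sum(coefficients[name] * features[name] for name in coefficients)


if __name__ == "__main__":
    case = {"texture_a": 2.0, "shape_b": -1.0, "unused_c": 5.0}
    weights = {"texture_a": 0.5, "shape_b": 0.25}
    print(rad_score(case, weights, intercept=-0.5))
```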

Development of US radiomics nomogram

A multivariate logistic regression analysis comprising the Rad-score and the independent risk factors was conducted. A backward stepwise selection procedure with p < 0.05 as the retention criterion was used to select the final independent predictors for central/lateral LNM. A radiomics nomogram was then developed based on the multivariate analysis in the training set.
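A nomogram of this kind is a graphical form of a fitted logistic regression model, so the predicted probability follows from the logistic function applied to the linear predictor. The sketch below shows that calculation; the predictor names, coefficients, and intercept are hypothetical and do not reproduce the fitted models reported in Tables 5 and 6.

```python
# Illustrative sketch only: coefficients and intercept are hypothetical.
import math


def predict_lnm_probability(predictors, coefficients, intercept):
    """Predicted probability from a logistic regression model."""
    linear = intercept + sum(coefficients[name] * predictors[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))


if __name__ == "__main__":
    betas = {"rad_score": 1.0, "us_reported_positive": 2.0, "age_under_45": 0.5}
    patient = {"rad_score": -0.5, "us_reported_positive": 0, "age_under_45": 1}
    print(predict_lnm_probability(patient, betas, intercept=0.0))
```

Binary predictors such as the US-reported status enter as 0 or 1, so each contributes either nothing or its full coefficient to the linear predictor.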

Performance of the US radiomics nomogram

The receiver operating characteristic (ROC) curves of the US-reported LN status, radiomics signature, and radiomics nomogram were plotted for each cohort. The discriminatory performance of the three models was evaluated using the area under the curve (AUC). Calibration of the US radiomics nomogram in all four cohorts was assessed by the calibration curve and the Hosmer–Lemeshow test. To determine the clinical usefulness of the US radiomics nomogram, decision curve analysis (DCA) was performed by quantifying the net benefits at different threshold probabilities in the entire dataset.
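The net benefit quantified in DCA can be written as TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability and n is the number of patients. The following sketch applies this standard definition to a toy example; the function names and data are hypothetical, and the study itself used R packages for this analysis.

```python
# Illustrative sketch only: toy data, standard net-benefit definition.

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    risks = [0.9, 0.4, 0.6, 0.1]
    for pt in (0.2, 0.5):
        print(pt, net_benefit(labels, risks, pt), net_benefit_treat_all(labels, pt))
```

The treat-none strategy has a net benefit of zero at every threshold, which is why it appears as a horizontal reference line in a decision curve.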

Statistical analysis

The statistical analyses were performed with R software (version 3.5.3, http://www.r-project.org), using the packages ‘shiny’, ‘foreign’, ‘nomogramFormula’, ‘tmcn’, ‘rms’, ‘set’, ‘glmnet’, ‘rmda’ and ‘DT’. All other statistical tests were conducted using SPSS software (version 26, Chicago, IL). Categorical variables and normally distributed continuous variables were presented as frequency (percentage) and mean ± standard deviation (SD), respectively. Categorical variables were compared using the χ2 test; Student’s t-test was used to compare normally distributed continuous variables. Rad-scores were presented as median (interquartile range), and the association between the Rad-scores and LN status in the training and validation cohorts was assessed using the Mann–Whitney U test. The DeLong test was employed to compare different AUCs. A two-sided p < 0.05 was considered to indicate statistical significance.
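The AUC used to assess discrimination and the Mann–Whitney U statistic are closely related: the AUC equals U divided by the product of the two group sizes, that is, the probability that a randomly chosen LNM-positive case scores higher than a randomly chosen LNM-negative case, with ties counted as one half. The sketch below computes the AUC in this way on hypothetical scores; it is an illustration and not the R routine used in the study.

```python
# Illustrative sketch only: toy scores, pairwise (Mann-Whitney) form of the AUC.

def auc_from_scores(positive_scores, negative_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    if not positive_scores or not negative_scores:
        raise ValueError("both groups must contain at least one score")
    u = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                u += 1.0
            elif pos == neg:
                u += 0.5  # ties contribute one half
    return u / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    print(auc_from_scores([0.9, 0.4], [0.6, 0.1]))
```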

Results

Clinical characteristics

The clinicopathological characteristics of the patients in the entire dataset are presented in Tables 1 and 2. The positive rates of central LNM in the training cohort and the three validation cohorts were 34.7%, 32.9%, 48.6%, and 31.6%, respectively; the corresponding rates of lateral LNM were 9.0%, 10.5%, 10.4%, and 14.3%. No significant differences in the positive rates were found among the four cohorts (p = 0.058 for central LNM; p = 0.435 for lateral LNM).

Table 1 Associations between the central CLN status and clinical parameters in the training and validation cohorts
Table 2 Associations between the lateral CLN status and clinical parameters in the training and validation cohorts

Establishment of US radiomics signatures

In total, 614 US radiomics features were extracted from the original thyroid US images. Among them, features with nonzero coefficients were selected using our feature selection model [20]. The flowchart of the feature selection is presented in Additional file 1: Fig. S1, and the selected features in the radiomics score calculation formulas can be found in Additional file 1: Tables S1 and S2. The distributions of the Rad-score for patients with and without central/lateral LNM in the training and validation cohorts are displayed in Tables 1 and 2.

Radiomics signature discrimination

The Rad-score was significantly higher in the central and lateral LNM groups than in the non-LNM groups in both the training set and the validation sets (p < 0.001, p < 0.001, Tables 1 and 2). The radiomics signature yielded an AUC of 0.839 (cutoff value: −0.389) for discriminating between the central and non-central LNM groups in the training set, and AUCs of 0.819, 0.799, and 0.797, respectively, in the three validation sets (Fig. 1). The AUC for differentiating between the lateral and non-lateral LNM groups was 0.908 (cutoff value: −0.063) in the training set, and 0.888, 0.796, and 0.793, respectively, in the three validation cohorts (Fig. 2). The radiomics signatures yielded better discriminatory performance for cervical LNM than the radiologists’ subjective prediction (Tables 3 and 4).

Fig. 1

Performance of the different models in predicting the central LNM in PTC patients. a ROC curves of US-reported central LN status, radiomics signature, and radiomics nomogram for predicting central compartment LNM in the training cohort; b in the internal validation cohort; and c, d in two external validation cohorts. ROC, receiver operating characteristic; US, ultrasound; LN, lymph node; LNM, lymph node metastasis

Fig. 2

Performance of the different models in predicting the lateral LNM in PTC patients. a ROC curves of US-reported lateral LN status, radiomics signature, and radiomics nomogram for predicting lateral compartment LNM in the training cohort; b in the internal validation cohort; and c, d in two external validation cohorts. ROC, receiver operating characteristic; US, ultrasound; LN, lymph node; LNM, lymph node metastasis

Table 3 Diagnostic performance of US, radiomics signature, and nomogram for predicting central LN status in the training and validation cohorts
Table 4 Diagnostic performance of US, radiomics signature, and nomogram for predicting lateral LN status in the training and validation cohorts

Prediction of LN status based on radiomics nomogram

Age (< 45 years), US-reported central CLN status, and radiomics signature were identified as independent predictive factors for central LNM, and US-reported lateral CLN status and radiomics signature were identified as independent predictive factors for lateral LNM by multivariate logistic regression analysis (Tables 5 and 6). The two radiomics nomograms were constructed by incorporating these independent predictors. In the training set, the radiomics nomogram showed the highest discrimination between central LNM positive and negative cases, with an AUC of 0.875 (95% CI 0.834–0.915; cutoff value: 0.273; ~52 points; Fig. 3a); the nomogram for discriminating between lateral LNM positive and negative cases also yielded the greatest AUC (0.938, 95% CI 0.887–0.989; cutoff value: 0.177; ~52 points; Fig. 3b). These results indicate that the nomograms achieved better discriminatory performance than either the US-reported CLN status or the radiomics signature. Favorable discrimination was also observed in the validation cohorts (Tables 3 and 4). The calibration curves of the radiomics nomograms for the probability of central/lateral LNM demonstrated good agreement between prediction and pathological observation in the training and validation cohorts (Figs. 4 and 5). The Hosmer–Lemeshow test showed no statistical significance, which suggested no significant deviation from a perfect fit.

Table 5 Independent predictive factors for central LNM in the radiomics nomogram
Table 6 Independent predictive factors for lateral LNM in the radiomics nomogram
Fig. 3

US-based radiomics nomograms used for prediction of a central compartment and b lateral compartment LNM in patients with PTC. US, ultrasound; LNM, lymph node metastasis; PTC, papillary thyroid carcinoma

Fig. 4

Calibration curves evaluating the radiomics nomogram used for prediction of central LNM a in the training cohort (p = 0.182), b in the internal validation cohort (p = 0.427), and c, d in two external validation cohorts (p = 0.561, p = 0.894). The Hosmer–Lemeshow tests yielded nonsignificant statistics. LNM, lymph node metastasis

Fig. 5

Calibration curves evaluating the radiomics nomogram used for prediction of lateral LNM a in the training cohort (p = 0.722), b in the internal validation cohort (p = 0.326), and c, d in two external validation cohorts (p = 0.637, p = 0.589). The Hosmer–Lemeshow tests yielded nonsignificant statistics. LNM, lymph node metastasis

Clinical significance

The decision curve analyses (DCA) for the US-reported CLN status, the radiomics signature, and the nomograms are illustrated in Fig. 6. The DCA demonstrated that, compared with the other models, the nomogram provided the highest net benefit for predicting central LNM at threshold probabilities between 0 and 0.98. Likewise, for lateral LNM, the radiomics nomogram provided more net benefit than the radiomics signature and the US-reported lateral CLN status at threshold probabilities from 0.04 to 0.98.

Fig. 6

Decision curve analysis for each model in predicting a central compartment and b lateral compartment LNM in PTC patients. The y-axis represents the net benefit, which is calculated by adding the benefits (true positives) and subtracting the harms (false positives), with the latter weighted by the relative harm of false-positive and false-negative results. Across the threshold probabilities obtained, the radiomics nomograms have a greater net benefit than the other models and than the simple treat-all and treat-none strategies. LNM, lymph node metastasis; PTC, papillary thyroid carcinoma

Discussion

In this multi-institutional study, we developed and validated two radiomics nomograms to preoperatively predict the central and lateral LNM, respectively, in a non-invasive and individualized fashion. These two nomograms showed favorable predictive ability in both training and independent validation cohorts, outperforming the radiologists’ subjective prediction of cervical LNM and the radiomics signature. Furthermore, the DCA also demonstrated that the nomograms had the best net benefit within a wide range of threshold probabilities. These results highlighted that our radiomics nomograms provided a new method for an individualized prediction of central and lateral LNM in PTC patients before surgery.

Accurate preoperative prediction of cervical LNM in patients with PTC is of great importance for guiding clinical treatment, particularly for surgeons to determine the extent of the surgical resection and assess the necessity of CLN dissection [3, 13]. Neck US examination plays a pivotal role in PTC staging. Unfortunately, its detection rate for cervical LNM is unsatisfactory. Because US is unable to consistently visualize deep anatomic structures or structures that are acoustically shadowed by air and bone [22], LNM in the central compartment may not show any abnormal finding on preoperative US examination in many patients [23]. In addition, US examination is an empirical diagnosis that is greatly affected by the expertise of the operator and is therefore prone to interobserver variability when determining lateral LNM [8]. Many studies have investigated the association between cervical LNM and the morphologic US features of the primary tumor, and have reported that the “taller than wide” shape, tumor size, presence of calcification, and closer distance between tumor and capsule were independent risk factors for LNM [15, 24,25,26]. Although these US features are encouraging, they are also based on the judgment of the performing physician and thus lack objectivity. In contrast, quantitative imaging features of the primary tumor enable an approach that is less affected by the expertise of the operator and could therefore be a promising tool for clinical practice [8].

Currently, radiomics has been explored in the field of thyroid carcinoma research, such as in the differential diagnosis of thyroid nodules [27], prediction of the cervical LNM [16], extrathyroidal extension [28], BRAFV600E mutation [29], as well as the survival prediction [30]. Lu et al. developed a nomogram based on the contrast-enhanced CT, which showed favorable performance in predicting LNM in PTC patients with an AUC of 0.867 [31]. Nevertheless, iodinated contrast agents have a risk of contrast allergy and might delay the RAI in PTC patients, which limits its clinical application. Hu et al. established an MRI radiomics model, which obtained good accuracy in predicting LNM status preoperatively [6]. However, MRI is expensive, time-consuming, and not routinely performed for PTC patients in clinical management.

Our previous study demonstrated the feasibility of applying US-based radiomics analysis for assessing the risk of LNM in PTC patients [16]. Jiang et al. established a nomogram that combined the shear-wave elastography (SWE) radiomics signature and clinicopathological parameters and achieved satisfactory predictive value for LN staging in PTC patients, with AUCs of 0.851 and 0.832 in the training and validation cohorts, respectively [23]. These US radiomics studies were designed to predict cervical LNM as a whole; however, the treatment strategies for LNM in the central and lateral compartments differ greatly. Thus, accurate radiomics models that predict LNM in the central and lateral compartments separately are urgently needed. We subsequently conducted two separate US radiomics studies. One was designed to build a radiomics nomogram that could preoperatively predict central LNM in PTC patients. The other was designed to develop a radiomics model that could discriminate lateral LNM prior to surgery. Both studies achieved desirable results, and each was validated in a single independent validation set. However, the effectiveness of the two models needed to be validated in a larger multi-institutional study.

Here, we investigated the association between central/lateral LNM and preoperatively available characteristics by univariate analysis. By multivariate logistic regression analyses, age, US-reported central CLN status, and radiomics signature were identified as the independent predictors for central LNM; US-reported lateral CLN status and radiomics signature were identified as the independent risk factors for lateral LNM. Two nomograms were then developed based on the multivariate analyses. Interestingly, the independent risk factors on which the two nomograms were based in this study were also the independent risk factors for central/lateral LNM in our previous studies [15, 20]. This may indicate the great predictive value of these factors for cervical LNM in PTC patients. The nomograms were shown to be capable of generating an individual preoperative probability of central/lateral LNM, which is in line with the prevailing concept of personalized precision medicine. The calibration curves showed good consistency between the nomogram-estimated and the actual observed probability in each of the cohorts. To assess the clinical usefulness of the nomograms, we employed DCA to evaluate whether nomogram-assisted decision-making would improve patient outcomes. Our results demonstrated that the nomograms provided better clinical net benefit across most of the range of reasonable threshold probabilities than either the radiomics signature or the US-reported CLN status.

Some limitations of this study have to be acknowledged. First, gene mutation status was not included. Recently, an increasing number of studies have investigated the association between gene mutation and LNM in PTC. Since not all the patients received the BRAF mutation examination after FNA, the role of gene mutation as an independent predictor still needs to be further studied. Second, the radiomics features were only extracted from conventional US images. Since multimodal US technology, such as SWE and contrast-enhanced US, has been applied jointly for the diagnosis of thyroid nodules in clinical practice, a new radiomics study incorporating multimodal technology is ongoing at our center. Third, the utilized images were obtained from the same US system. Radiomics features have been reported to be affected by the US machine and the parameters used for image acquisition. Thus, a multi-center study with multiple US systems is needed to acquire high-level evidence for further clinical application.

In conclusion, we developed an easy-to-use and noninvasive predictive tool that incorporates the radiomics signature and clinical characteristics to preoperatively evaluate the individual risk of cervical central/lateral LNM in patients with PTC. The nomograms proposed here hold promise for optimizing personalized treatment and might greatly facilitate the decision-making in clinical practice.