Introduction

Thyroid cancer remains the most common endocrine malignancy, with papillary thyroid cancer (PTC) accounting for > 90% of new cases [1,2,3]. Papillary thyroid microcarcinoma (PTMC) refers to PTC with a maximum diameter of ≤ 10 mm, and its incidence has increased continually in recent years [4]. Although PTMC generally has excellent outcomes, patients with certain risk factors (e.g., node metastases) are more likely to experience local recurrence and metastasis [5,6].

Central lymph node metastasis (CLNM) is prevalent in PTC. Even in clinically node-negative (cN0) PTMC, the rate of occult CLNM is high because preoperative examinations (e.g., ultrasound [US]) have low sensitivity for detecting it [7,8,9]. However, whether to perform prophylactic central lymph node dissection (PCLND) in patients with cN0 PTMC remains controversial [10]. PCLND may increase postoperative morbidity, such as permanent hypoparathyroidism [11].

Identifying patients at high preoperative risk of CLNM is crucial to determining the indications for PCLND [12]. Numerous studies have analyzed the risk factors for CLNM in patients with cN0 PTMC. However, they included only a limited set of clinicopathological and US characteristics, which limited their predictive accuracy. Radiomics is a novel technology that converts imaging data into a large panel of quantitative features [13]. Here, we used US radiomics features to develop and validate a model for the individualized prediction of CLNM in patients with cN0 PTMC.

Materials and methods

Study design

This retrospective case-control study was conducted at two institutions. Its aim was to develop and validate a model for the individual prediction of CLNM in patients with PTMC. Data were collected from 327 patients at Cangzhou Hospital of Integrated TCM-WM·Hebei for the training cohort between January 2018 and December 2020. Data were obtained from Hebei Medical University Health System (n = 153) during the same period for the validation cohort. The study was approved by each institution’s institutional review board. Written informed consent was obtained from all patients. This study was conducted in accordance with the Declaration of Helsinki (2013 revision) and followed the Strengthening the Reporting of Observational Studies in Epidemiology reporting guideline.

Inclusion and exclusion criteria

The inclusion criteria were as follows: (1) age ≥ 18 years; (2) histologically confirmed PTMC that was assessed as cN0 by preoperative US; and (3) standard thyroidectomy with PCLND.

The exclusion criteria were as follows: (1) concurrent malignant disease of other organs; (2) recurrent or metastatic thyroid cancer; (3) history of previous surgery or radiotherapy of the neck; and (4) incomplete US or clinicopathological data (Figure S1).

Data collection

All clinicopathological data, including age, sex, and BRAF status, were collected from the medical records. BRAF mutation analysis was performed using the AmoyDx® BRAF Mutation Detection Kit (V2) (ADx-BR02; Amoy Diagnostics Co., Ltd., Xiamen, China). Mutations were detected using a next-generation sequencing method, followed by a real-time fluorescence polymerase chain reaction–amplification refractory mutation system.

US examinations were performed using a 5–14-MHz transducer (Siemens, ACUSON Sequoia, Siemens Medical Solutions USA, Inc., Malvern, PA, USA) by radiologists with at least 8 years of experience performing thyroid US evaluations. The US characteristics of each lesion, including diameter, texture, echo, boundary rule, presence of calcification, and capsule invasion, were evaluated by an independent radiologist (W.L.).

Feature extraction and selection

US images were retrieved from the picture archiving and communication system (Carestream, Toronto, Canada) for feature extraction. The region of interest (ROI) of each lesion was manually segmented on the image showing the largest lesion diameter using ITK-SNAP software. To measure interobserver agreement, all manual segmentations were performed by two experienced radiologists who were blinded to the patients’ characteristics. Moreover, one of the radiologists delineated the ROIs again after two weeks to measure intraobserver agreement. The ROIs delineated by this radiologist in the second round were used for subsequent feature extraction. Radiomics feature extraction was performed using the open-source platform Pyradiomics (version 3.1.0). This platform allows the extraction of 851 radiomics features, which can be classified into shape features, first-order features, gray-level co-occurrence matrix features, gray-level size zone matrix features, gray-level run length matrix features, and gray-level dependence matrix features. The intraclass correlation coefficient (ICC) was used to evaluate the inter- and intraobserver agreement of the feature extraction. Features with good consistency (ICC > 0.75) were subjected to further analysis.
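The reproducibility filter described above can be illustrated with a short sketch. The text does not state which ICC form was used, so the example below assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement); the function names `icc_2_1` and `is_reproducible` are hypothetical and are not part of Pyradiomics.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) for one feature (assumed form; the text does not specify it).

    `ratings` holds one row per lesion and one column per reader or session.
    """
    n = len(ratings)      # number of lesions
    k = len(ratings[0])   # number of readers (or repeated sessions)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-lesion mean square
    msc = ss_cols / (k - 1)               # between-reader mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def is_reproducible(ratings: Sequence[Sequence[float]], threshold: float = 0.75) -> bool:
    """Keep a feature only if its ICC exceeds the threshold used in the text."""
    return icc_2_1(ratings) > threshold
```

Because this form measures absolute agreement, a constant offset between two readers lowers the ICC even when their values are perfectly correlated; a consistency-type ICC would not penalize such an offset.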

Before feature selection, the extracted feature values were standardized as z scores. A three-step procedure was used to select robust radiomics features in the training cohort [14]. First, univariable logistic regression was performed to identify features significantly associated with CLNM (P < 0.05). Second, the Pearson correlation coefficient was calculated for each pair of features; for pairs with a strong correlation (Pearson r > 0.90), the feature with the higher P value was excluded. Overall, 107 features were retained for the final selection step. Third, least absolute shrinkage and selection operator (LASSO) logistic regression with 10-fold cross-validation and the 1-standard-error criterion was used to determine the optimal combination of radiomics features and calculate a radiomics (Rad) score.
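As a rough illustration of the standardization step and the first two selection steps, the sketch below filters features by their univariable P values and then prunes strongly correlated pairs. It assumes the univariable P values have already been computed elsewhere, applies the cutoff to the absolute value of Pearson r (the text states only r > 0.90), and uses a greedy pass ordered by P value as one possible way to resolve correlated pairs. The LASSO step is not shown, and all function names are hypothetical.

```python
import math
from typing import Dict, List


def zscore(values: List[float]) -> List[float]:
    """Standardize one feature to mean 0 and sample standard deviation 1."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_candidates(
    features: Dict[str, List[float]],
    p_values: Dict[str, float],
    p_cut: float = 0.05,
    r_cut: float = 0.90,
) -> List[str]:
    """Step 1: keep features with P < p_cut. Step 2: prune correlated pairs.

    Features are visited from the smallest P value upward, so whenever two
    features are strongly correlated, the one with the higher P value is dropped.
    """
    significant = [name for name in features if p_values[name] < p_cut]
    kept: List[str] = []
    for name in sorted(significant, key=lambda f: p_values[f]):
        if all(abs(pearson_r(features[name], features[k])) <= r_cut for k in kept):
            kept.append(name)
    return kept
```

The features returned by `select_candidates` would then be passed to a cross-validated LASSO logistic regression, which is left to a dedicated statistics package.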

Statistical analysis

Categorical variables are expressed as frequencies and percentages and were compared using the chi-squared test. We calculated the hazard ratios (HRs) and 95% confidence intervals (CIs) of CLNM using the logistic regression model with uni- and multivariate analyses. Pearson correlation coefficients were calculated to evaluate correlations among the parameters.

To provide a quantitative tool to predict the individual probability of CLNM, we generated the radiomics nomogram based on the multivariate analysis of the training cohort. The discrimination of the nomogram was assessed using receiver operating characteristic curves by calculating the area under the curve (AUC). The model’s calibration was assessed using calibration curves by comparing the predicted and actual probability. A decision curve analysis was utilized to assess the clinical usefulness of the nomogram.
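The two evaluation quantities named above can be written down compactly. The sketch below computes the AUC as the probability that a randomly chosen CLNM-positive patient receives a higher predicted probability than a randomly chosen CLNM-negative patient, and the net benefit at a single risk threshold as used in decision curve analysis. It is an illustrative, dependency-free sketch, not the R code used in the study, and the function names are hypothetical.

```python
from typing import List


def auc(y_true: List[int], y_prob: List[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


def net_benefit(y_true: List[int], y_prob: List[float], threshold: float) -> float:
    """Net benefit of treating every patient whose predicted risk >= threshold.

    Net benefit = TP/n - FP/n * threshold / (1 - threshold).
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing the result with the treat-all and treat-none strategies.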

All statistical analyses were performed using SPSS software (version 22.0; IBM Corporation, Armonk, NY, USA) and R software version 4.1.3 (R Foundation for Statistical Computing, Vienna, Austria). Statistical significance was set at a two-tailed value of P < 0.05.

Results

Patients’ baseline characteristics

Between January 2018 and December 2020, 327 and 153 patients were included in the training and validation cohorts, respectively. For the training cohort, the mean age was 45.0 years (range, 18–74 years); there were 260 women (79.5%) and 67 men (20.5%). For the validation cohort, the mean age was 46.2 years (range, 23–72 years); there were 123 women (80.4%) and 30 men (19.6%). The demographic and ultrasound characteristics of the two cohorts are summarized in Table S1.

Table 1 shows the association between CLNM and the patients’ characteristics. CLNM was significantly associated with younger age (< 45 years: 60.9% vs. 44.7%, P = 0.006), larger tumors (≥ 7 mm: 59.1% vs. 36.9%, P < 0.001), the presence of calcification (56.4% vs. 39.2%, P = 0.003), capsule invasion (12.7% vs. 5.1%, P = 0.014), and BRAF V600E mutation (68.2% vs. 53.0%, P = 0.009).

Table 1 Clinicopathological and ultrasound characteristics of patients according to central lymph node metastasis in the training cohort

Derivation of Rad score

A Rad score was developed using LASSO regression analysis based on six of the 107 radiomics features in the training cohort (Fig. 1). The formula used to calculate this score is as follows:

Rad score = 0.048135878 × original_shape_Maximum2DDiameterSlice + 0.034883446 × wavelet-LLH_firstorder_Kurtosis + 0.016035799 × wavelet-LLH_glszm_GrayLevelNonUniformityNormalized + 0.007692882 × wavelet-LLL_glrlm_RunLengthNonUniformity + 0.051670164 × wavelet-HHL_glrlm_RunLengthNonUniformity − 0.009525348 × original_ngtdm_Contrast.

Fig. 1 Radiomics feature selection using the least absolute shrinkage and selection operator (LASSO) logistic regression model. (A) Ten-fold cross-validation for tuning parameter selection in the LASSO logistic model. Solid vertical lines represent binomial deviance ± standard error. The vertical lines are drawn at the optimal values by the minimum criterion and the 1-SE criterion. (B) LASSO coefficient profiles of the 107 radiomics features. A coefficient profile plot was produced against the log (λ) sequence. A vertical line was drawn at the value selected using ten-fold cross-validation, where the optimal λ resulted in six nonzero coefficients
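The Rad score formula is a weighted sum, which the following sketch makes explicit. The coefficients are copied from the formula; the feature names are written in PyRadiomics-style underscore notation, which is an assumption about how the features are labeled, and the inputs are assumed to be the z-score-standardized feature values described in the Methods. The names `RAD_SCORE_WEIGHTS` and `rad_score` are hypothetical.

```python
from typing import Dict

# Coefficients taken from the Rad score formula in the text.
# Feature keys use underscore-style names (assumed labeling).
RAD_SCORE_WEIGHTS: Dict[str, float] = {
    "original_shape_Maximum2DDiameterSlice": 0.048135878,
    "wavelet-LLH_firstorder_Kurtosis": 0.034883446,
    "wavelet-LLH_glszm_GrayLevelNonUniformityNormalized": 0.016035799,
    "wavelet-LLL_glrlm_RunLengthNonUniformity": 0.007692882,
    "wavelet-HHL_glrlm_RunLengthNonUniformity": 0.051670164,
    "original_ngtdm_Contrast": -0.009525348,
}


def rad_score(features: Dict[str, float]) -> float:
    """Weighted sum of the six selected (standardized) radiomics features."""
    missing = set(RAD_SCORE_WEIGHTS) - set(features)
    if missing:
        raise KeyError(f"missing radiomics features: {sorted(missing)}")
    return sum(weight * features[name] for name, weight in RAD_SCORE_WEIGHTS.items())
```

Raising an error on missing features, rather than silently treating them as zero, avoids producing a score from an incomplete feature set.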

In both the training and validation cohorts, the Rad score showed a significant linear relationship with the risk of CLNM (P for non-linearity > 0.05; Fig. 2): a higher score was associated with a higher risk of CLNM.

Fig. 2 Univariate logistic analysis of central lymph node metastasis with restricted cubic splines (RCS) in the training (A) and validation (B) cohorts

Uni- and multivariate analyses

In the training cohort, the univariate analysis revealed that age, tumor size, calcification, capsule invasion, and Rad score were significantly associated with the risk of CLNM (P < 0.05). In the multivariate analysis, age (≥ 45 vs. < 45 years; HR, 0.461; 95% CI, 0.267–0.798; P = 0.006), presence of capsule invasion (HR, 2.885; 95% CI, 1.108–7.514; P = 0.030), and a higher Rad score (HR, 2.376; 95% CI, 1.843–3.063; P < 0.001) independently predicted the risk of CLNM (Table 2).

Table 2 Univariate and multivariate analyses for central lymph node metastasis in the training cohort. Abbreviations: HR, hazard ratio; CI, confidence interval

Association between Rad score and BRAF status

The Rad score was significantly associated with BRAF status in the training (Pearson r = 0.313, P < 0.001) and validation (Pearson r = 0.256, P = 0.001) cohorts. After adjustment for age, capsule invasion, and BRAF status, the Rad score remained independently associated with the risk of CLNM (HR, 2.316; 95% CI, 1.826–2.938; P < 0.001; Table S2). Moreover, the predictive accuracy of the Rad score was not associated with BRAF status in either cohort (Figure S2).

Nomogram construction

A nomogram that combined the Rad score and other independent predictors was established to quantitatively predict the probability of CLNM (Fig. 3A).

Fig. 3 (A) A nomogram combining the Rad score, age, and capsule invasion for predicting the probability of central lymph node metastasis (CLNM). (B) Plots depict the calibration of the nomogram in terms of agreement between predicted probability and actual probability in the training and validation cohorts. (C) Areas under the receiver operating characteristic curves for CLNM in the training and validation cohorts. (D) Decision curve analysis for the nomogram in the training and validation cohorts
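A nomogram is a graphical form of the underlying logistic model, so the predicted probability can also be sketched directly. The example below assumes that the ratios reported in the multivariate analysis are exponentiated logistic-regression coefficients, so that each coefficient is the natural logarithm of its ratio. The fitted intercept is not reported in the text, so `PLACEHOLDER_INTERCEPT` is a purely hypothetical value, and the output is illustrative only, not a clinical prediction. The binary age indicator is assumed to be coded as in the multivariate model.

```python
import math
from typing import Dict

# Ratios as reported in the multivariate analysis (training cohort).
REPORTED_RATIOS: Dict[str, float] = {
    "age_group": 0.461,         # binary age indicator, coded as in the model
    "capsule_invasion": 2.885,  # 1 = present, 0 = absent
    "rad_score": 2.376,         # per unit increase in Rad score
}

# Hypothetical value: the fitted intercept is not given in the text.
PLACEHOLDER_INTERCEPT = 0.0


def clnm_probability(
    intercept: float,
    ratios: Dict[str, float],
    predictors: Dict[str, float],
) -> float:
    """Logistic-model probability: 1 / (1 + exp(-(b0 + sum(ln(ratio_i) * x_i))))."""
    linear = intercept + sum(
        math.log(ratios[name]) * predictors[name] for name in ratios
    )
    return 1.0 / (1.0 + math.exp(-linear))
```

A call such as `clnm_probability(PLACEHOLDER_INTERCEPT, REPORTED_RATIOS, {"age_group": 0, "capsule_invasion": 1, "rad_score": 0.0})` shows how a single predictor shifts the estimate; real use would require the fitted intercept.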

Model performance

The AUC values of the Rad score for predicting CLNM were 0.768 (95% CI, 0.714–0.821) and 0.745 (95% CI, 0.665–0.826), respectively, in the training and validation cohorts (Table 3). The AUC value of the nomogram reached 0.795 (95% CI, 0.745–0.846) in the training cohort and 0.774 (95% CI, 0.696–0.852) in the validation cohort (Fig. 3C; Table 3). Calibration curves showed that the nomogram had a good fit for CLNM in the training and validation cohorts (Fig. 3B). Moreover, the nomogram demonstrated promising clinical utility for most risk thresholds (Fig. 3D).

Table 3 Predictive performance for central lymph node metastasis in the training and validation cohorts

Discussion

To our knowledge, this is the first study to explore the value of US radiomics features for predicting CLNM in patients with cN0 PTMC. The six-feature Rad score exhibited a significant association with the risk of CLNM. Moreover, when this score was combined with other clinical and US factors, the resulting nomogram showed good predictive performance in the training and validation cohorts. These results suggest that this radiomics-based predictive model is a noninvasive, objective, and reliable tool for the preoperative prediction of CLNM.

Radiomics, an emerging approach that translates unseen aspects of images into readable values, has been utilized to predict nodal metastasis in various malignancies [15,16,17]. Radiomics enables the noninvasive assessment of intratumor heterogeneity and facilitates a better understanding of tumor behavior [18]. For example, Yan et al. established a radiomics score based on US images, which showed good performance for predicting CLNM among patients with PTC [19]. Wang et al. developed a computed tomography radiomics signature for the preoperative prediction of CLNM in PTC [20]. In the present study, the Rad score was independently associated with the risk of CLNM, confirming the radiomics signature’s value for assessing intratumoral heterogeneity. For patients with PTMC and a higher Rad score, more aggressive treatments, such as PCLND, should be recommended, even among those with cN0 disease.

This study incorporated the Rad score and conventional clinical and US characteristics into a nomogram, as in a previous study [13]. In addition to the Rad score, age and capsule invasion were independent risk factors for CLNM. In most previous studies, younger age was closely correlated with a high incidence of CLNM irrespective of the threshold [12, 21, 22]. This may be explained by the negative association between age and PTMC progression rate [23]. Capsule invasion, defined as tumor cell invasion of the thyroid capsule, contributes to a high risk of CLNM [12, 21, 22]. A plausible explanation is that a tumor can more easily metastasize to the central lymph nodes once it has breached the capsule [24].

Molecular alterations associated with BRAF mutations influence PTMC initiation, progression, and metastasis. Some studies reported a high incidence of CLNM in patients with PTMC and BRAF mutations [21, 25]. In the present study, BRAF status was also significantly associated with the risk of CLNM. Moreover, the predictive accuracy of the Rad score was not influenced by BRAF status, confirming its value.

Some limitations of this study should be discussed. First, this was a retrospective study with a small sample size, which may have led to selection bias. Second, although the Rad score was verified in an external cohort, validation cohorts from other countries were lacking to confirm its generalizability. Third, the AUC values of the radiomics nomogram were lower than we expected, indicating that radiomics data from grayscale US images alone were insufficient. Multimodal US images, such as contrast-enhanced US and elastography, are needed in further studies.

Conclusions

In conclusion, the nomogram based on US radiomics features combined with clinical and US characteristics is a reliable tool with good discrimination (AUC, 0.774–0.795) for predicting CLNM in patients with cN0 PTMC, which may enable individualized treatment strategies to be tailored for them. A prospective, large-scale, international study is needed to further validate this predictive model.