Background

Thyroid carcinoma is the most common malignant tumor of the endocrine system, ranking 9th in incidence in 2020 [1]. Papillary thyroid carcinoma (PTC) is the most common histological type of thyroid cancer. Studies have indicated that lateral lymph node metastasis (LLNM) occurs in 18-64% of PTC patients [2]. The presence of LLNM is a known predictor of poor prognosis and high mortality after surgery in PTC [3].

Ultrasound (US) is the main method for assessing preoperative lymph node status. However, the sensitivity of US for diagnosing cervical lymph node metastasis (LNM) is only 20-40% [4, 5]. It has been reported that 18.6–64% of patients with PTC and a clinically negative (cN0) lateral neck had occult LLNM [6]. Prophylactic lateral neck dissection (LND) is not recommended for patients with a cN0 lateral neck in the majority of clinical guidelines, including those of the American Thyroid Association (ATA) [7] and the National Comprehensive Cancer Network [8]. According to the 2015 ATA guidelines, therapeutic lateral neck dissection should be performed for PTC patients with biopsy-confirmed metastatic lateral cervical lymphadenopathy. The Japanese Society of Thyroid Surgeons and the Japan Association of Endocrine Surgeons did not recommend prophylactic modified radical neck dissection as a routine procedure, because it has not been proved to be beneficial [9]. Therefore, non-invasive assessment of lateral lymph node status in patients with PTC is of great value for clinical decision-making.

A nomogram is a graphical calculating tool that predicts the probability of an event and has been widely used in the medical field in recent years. By combining the significant variables from regression analysis, nomograms have been extensively used in clinical practice because of their objectivity and convenience [10]. However, they are rarely used to assess LLNM in PTC patients, and nomograms based entirely on preoperative data are especially scarce.

In this study, we aimed to establish a preoperative nomogram based on clinical and ultrasonic features (Clin-US nomogram) to predict the risk of LLNM in patients with PTC, and to propose individualized treatment strategies to help clinicians make appropriate clinical decisions.

Methods

Patients and cohorts

This retrospective study was approved by the Ethics Committee of the Tianjin Medical University Cancer Institute and Hospital, and the requirement for informed consent was waived. Patients were enrolled based on the following criteria: (1) PTC with or without LLNM was pathologically confirmed by surgery and fine needle aspiration cytology (FNAC). (2) Clear US images of the thyroid nodules were available. (3) Clinical information was complete. (4) BRAF V600E analysis was required in hospital 1. Exclusion criteria were as follows: (1) Poor US image quality. (2) Incomplete clinicopathological information. (3) Preoperative therapy before image acquisition.

We retrospectively evaluated patients who underwent thyroid surgery and had histologically confirmed PTC in hospital 1 (Tianjin Medical University Cancer Institute and Hospital, Tianjin, China) between January 2013 and June 2018 and in hospital 2 (Binzhou Medical University Hospital, Shandong, China) between January 2017 and November 2017. We analyzed routine clinical data, US data and pathological results, and a total of 2612 patients with PTC were included in this study. In hospital 1, all patients (n = 2310) were randomly divided into a training cohort (n = 1732) and an internal testing cohort (n = 578) for developing and evaluating the nomogram. Patients in the training cohort were divided into a non‑LLNM group and an LLNM group according to the pathological results. Patients in hospital 2 were used as an external testing cohort (n = 302). Fig. 1 shows the flow chart of the recruitment process for the final study patients.

Fig. 1

Flow chart of the patients enrolled in our study. Finally, 2612 PTC patients from two centers were reviewed

FNAC and surgical strategy

US-guided FNAC was performed by radiologists with more than 15 years of experience in thyroid FNAC. At least three aspirations were performed in different directions for each nodule using 22-gauge needles, and the remaining specimen was preserved in normal saline for BRAF V600E mutation analysis. On US-guided FNAC, all patients enrolled in this study were classified as Bethesda category V or VI.

Cervical lymph nodes with a spherical shape, loss of the normal echogenic hilum, cystic components, microcalcifications or peripheral vascularity were suspected to be metastatic [8, 11]. FNAC and a thyroglobulin (TG) test were also performed on the most suspicious lateral cervical lymph nodes before surgery to confirm the pathological diagnosis.

Total thyroidectomy or thyroid lobectomy was performed in all patients, along with central neck dissection (CND), according to the Chinese guidelines for the diagnosis and treatment of differentiated thyroid carcinoma. According to the ATA [8] and Chinese guidelines, only patients with highly suspected metastatic lateral neck lymph nodes based on preoperative imaging, FNAC and the TG test underwent LND, which comprised removal of the lateral lymph nodes from level II to V while preserving the internal jugular vein, spinal accessory nerve, or sternocleidomastoid muscle.

Clinical data and ultrasound images

Clinical information, ultrasonic measurements and features were collected for data analysis. Clinical information included age and gender. US examinations were performed by radiologists with more than 8 years of experience in thyroid diagnosis. The US machines were Philips EPIQ 5, iU22 and HD11 (Philips Healthcare, Eindhoven, The Netherlands) and Aplio 400 and 500 (Toshiba Medical Systems, Tokyo, Japan) devices, equipped with a 5–12 MHz or a 4.8–11 MHz linear array probe.

The following characteristics were evaluated for all selected thyroid nodules: maximum diameter of the tumor (tumor size), location, position, multifocality, composition, echogenicity, margin, shape (A/T ≥ 1 or < 1), and microcalcification. Vascularization (blood flow) was classified according to the Adler grade of blood flow from 0 to 3 [12]. In addition, Hashimoto’s thyroiditis and the relationship of the nodule to the adjacent thyroid capsule were evaluated on the basis of US images. Hashimoto’s thyroiditis manifests on ultrasonography as uneven echogenicity of the thyroid parenchyma, with a few or multiple lamellar hypoechoic areas showing grid-like changes. Abutment was defined as contact between the edge of the thyroid nodule and the thyroid capsule. The abutment-to-perimeter ratio (abutment/perimeter, A/P) of a thyroid lesion was calculated as the mean of the ratios measured on the transverse and longitudinal sections of the nodule. Based on our previous study, we used 1/4 (25%) of the perimeter of the thyroid lesion as the cutoff value [13].
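The A/P measurement described above can be illustrated with a minimal Python sketch. The function names, argument names and the example measurements are hypothetical and are not taken from the study; the sketch only assumes that the abutment length and the perimeter are measured in the same unit on each section, and that a nodule is flagged when A/P exceeds the 0.25 cutoff.

```python
AP_CUTOFF = 0.25  # 1/4 of the perimeter, the cutoff stated in the text


def abutment_perimeter_ratio(abut_transverse, perim_transverse,
                             abut_longitudinal, perim_longitudinal):
    """Mean abutment/perimeter (A/P) ratio over the two orthogonal sections."""
    if perim_transverse <= 0 or perim_longitudinal <= 0:
        raise ValueError("perimeters must be positive")
    ratio_transverse = abut_transverse / perim_transverse
    ratio_longitudinal = abut_longitudinal / perim_longitudinal
    # Average of the two section ratios (the "1/2" factor in the text)
    return 0.5 * (ratio_transverse + ratio_longitudinal)


def exceeds_ap_cutoff(ap_ratio, cutoff=AP_CUTOFF):
    """True when the nodule abuts more than `cutoff` of its perimeter."""
    return ap_ratio > cutoff


# Hypothetical measurements in mm, for illustration only
example_ap = abutment_perimeter_ratio(14.0, 40.0, 10.0, 50.0)
print(round(example_ap, 3), exceeds_ap_cutoff(example_ap))
```

In this made-up example the section ratios are 0.35 and 0.20, so the averaged A/P is 0.275 and the nodule would fall in the A/P > 0.25 category.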

Feature selection and Clin-US nomogram development

The nodules in the primary cohort (hospital 1) were divided randomly into a training cohort and an internal testing cohort with a 7:3 ratio. Student’s independent t-test and the χ2 test were applied to select the parameters significantly associated with LLNM. Multivariate regression analysis combining the significant clinical and US variables was performed to determine the final indicators for predicting LLNM. Statistical significance was defined as a two-tailed P < 0.05. Based on the multivariate analysis in the training cohort, the Clin-US nomogram was developed. The Clin-US nomogram was then internally validated and externally tested. Fig. 2 is the schematic diagram of the construction and verification of the Clin-US nomogram for predicting the probability of LLNM.
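As an illustration of the univariate screening step for a binary feature, the following Python sketch computes a Pearson χ2 test on a 2×2 table (feature present or absent versus LLNM or non-LLNM). It is a sketch under stated assumptions: the function name and the counts are hypothetical, no continuity correction is applied, and the study itself used R, so this is not the authors' code.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column needs a nonzero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows = feature present / absent,
# columns = LLNM / non-LLNM
stat, p_value = chi_square_2x2(30, 70, 10, 90)
print(round(stat, 2), p_value < 0.05)
```

A feature whose P value falls below 0.05 in this kind of test would be carried forward to the multivariate regression.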

Fig. 2

Schematic diagram of the Clin-US nomogram for predicting the risk of LLNM. First, ultrasonic and clinical features were extracted, and logistic analysis was performed according to pathological results. Second, features with significant differences were selected and the nomogram was constructed. Last, the predictive performance of the Clin-US nomogram was evaluated by ROC, decision and calibration curves

Predictive performance of the Clin-US nomogram

Receiver operating characteristic (ROC) curve analysis was used to evaluate the diagnostic performance of the Clin-US nomogram, and the area under the curve (AUC) was used to quantify discrimination. The calibration curve and Hosmer-Lemeshow test were applied to assess the calibration of the Clin-US nomogram in the training and testing cohorts. Decision curve analysis (DCA) was used to evaluate the clinical utility of the Clin-US nomogram by calculating the net benefit at different threshold probabilities. According to the nomogram algorithm, the predicted probability for each nodule was calculated and defined as the Nomoscore. The best cutoff value was then determined by maximizing the Youden index. The predictive performance at the optimal Nomoscore cutoff was evaluated by the AUC, sensitivity, and specificity.
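The cutoff selection described above can be sketched in plain Python. The Youden index is J = sensitivity + specificity - 1, and the AUC is computed here as the Mann-Whitney probability that a randomly chosen LLNM case scores higher than a randomly chosen non-LLNM case. The function names and the toy labels and scores are hypothetical, and calling a case positive when the score is greater than or equal to the threshold is an assumption.

```python
def roc_points(y_true, scores):
    """Return (threshold, sensitivity, specificity) for each distinct score,
    calling a case positive when score >= threshold (an assumption)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    points = []
    for thr in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= thr)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < thr)
        points.append((thr, tp / n_pos, tn / n_neg))
    return points


def youden_cutoff(y_true, scores):
    """Point maximizing J = sensitivity + specificity - 1."""
    return max(roc_points(y_true, scores), key=lambda p: p[1] + p[2] - 1)


def auc(y_true, scores):
    """AUC as the Mann-Whitney probability; ties count as 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (1 = LLNM, 0 = non-LLNM)
labels = [0, 0, 0, 0, 1, 0, 1, 1]
probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.80]
best_thr, best_sens, best_spec = youden_cutoff(labels, probs)
print(best_thr, best_sens, best_spec, round(auc(labels, probs), 3))
```

The quadratic loops keep the sketch short; for cohorts of the size used here, a sorted single-pass implementation or an established statistics package would be the more practical choice.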

Statistical analyses

The Mann-Whitney U test and chi-square test were used to compare differences in continuous and categorical variables, respectively. Model predictions were assessed by sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), AUC with 95% confidence interval (CI), and calibration curves in both the training and testing cohorts. The DeLong test was used to compare AUCs. Calibration plot analysis was performed by bootstrapping with 1,000 replications. All analyses were performed using R statistical software (version 3.3.3; www.R-project.org). P < 0.05 was considered statistically significant.

Results

Baseline characteristics of all patients with PTC in three cohorts

In total, 564 of the 2612 enrolled patients (21.6%) had LLNM and 2048 (78.4%) did not. The demographics and sonographic features of the patients are shown in Table 1. The mean age of patients was 44.12 ± 11.17 years in the training cohort, 43.91 ± 11.12 years in the internal testing cohort and 46.56 ± 11.78 years in the external testing cohort. The rate of LLNM in the three cohorts was 21.6% (374/1732), 20.4% (118/578) and 23.8% (72/302), respectively.

Table 1 Patient characteristics of the training and testing cohorts

Constructing and evaluating nomogram

In the training cohort, nine predictors differed significantly between the LLNM and non-LLNM groups in univariate analysis: age, gender, tumor size, tumor position, internal echo, microcalcification, vascularization, A/P and multifocality (Table 2). Multivariate regression analysis was then applied to these nine risk predictors to construct a Clin-US nomogram for predicting LLNM. Multivariate logistic regression analysis identified tumor position, gender (male), microcalcification, tumor size, multifocality, and A/P > 0.25 as independent predictive risk factors for LLNM (Table 3; Fig. 3a). The Clin-US nomogram was established based on these six indicators (Fig. 3b). The nomogram assigned 11 points for male gender; 18 for lower, 19 for middle and 34 for upper position; 11 for a size of 1-2 cm, 36 for 2-3 cm and 100 for > 3 cm; 31 for multifocality; 17 for microcalcification; and 45 for A/P > 0.25. The final nomoscore cutoff for positivity was 108, corresponding to an LLNM probability of 0.30.
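The point assignments listed above can be turned into a small scoring sketch. The point values and the cutoff of 108 come from the text; everything else is an assumption made for illustration: the function and variable names are hypothetical, tumors of 1 cm or less and positions other than lower, middle or upper are given 0 points, the size boundaries are treated as exclusive at the lower end, and a total equal to the cutoff is counted as positive. The mapping from total points to probability requires the nomogram axis in Fig. 3b and is not reproduced here.

```python
POSITION_POINTS = {"lower": 18, "middle": 19, "upper": 34}
CUTOFF_POINTS = 108  # reported cutoff, corresponding to a probability of 0.30


def size_points(max_diameter_cm):
    """Points for tumor size; boundary handling is an assumption."""
    if max_diameter_cm > 3.0:
        return 100
    if max_diameter_cm > 2.0:
        return 36
    if max_diameter_cm > 1.0:
        return 11
    return 0


def total_points(male, position, max_diameter_cm,
                 multifocal, microcalcification, ap_over_025):
    """Sum of nomogram points for one patient."""
    total = 11 if male else 0
    # Positions not listed in the text are assumed to score 0
    total += POSITION_POINTS.get(position, 0)
    total += size_points(max_diameter_cm)
    total += 31 if multifocal else 0
    total += 17 if microcalcification else 0
    total += 45 if ap_over_025 else 0
    return total


def is_high_risk(points, cutoff=CUTOFF_POINTS):
    """Positive when the total reaches the cutoff (>= is an assumption)."""
    return points >= cutoff


# The two cases illustrated in Fig. 6
case_1 = total_points(True, "lower", 1.43, True, True, True)
case_2 = total_points(True, "lower", 0.91, False, True, False)
print(case_1, is_high_risk(case_1))
print(case_2, is_high_risk(case_2))
```

Applied to the two cases illustrated in Fig. 6, the sketch reproduces the reported totals of 133 and 46 points.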

Table 2 Patient characteristics of the PTC with LLNM and PTC without LLNM groups in the training cohort
Table 3 Risk factors for lateral cervical lymph node metastasis in the training cohort
Fig. 3

Forest plot of risk factors and the Clin-US nomogram for estimating the risk of LLNM in PTC. (a) Forest plot of risk factors in multivariable logistic regression analysis for LLNM. (b) The proposed nomogram based on preoperative data for assessing the risk of LLNM in PTC patients. OR, odds ratio; LLNM, lateral lymph node metastasis; CI, confidence interval

Predictive performance of Clin-US nomogram

The performance of the Clin-US nomogram in predicting LLNM is shown in Table 4, Table S1 and Fig. 4. The AUC of the Clin-US nomogram was 0.813 (95% CI, 0.790–0.835) in the training cohort, 0.815 (95% CI, 0.775–0.854) in the internal testing cohort, and 0.870 (95% CI, 0.822–0.917) in the external testing cohort. Both the ROC curves and the violin plots in Fig. 4 show that the Clin-US nomogram had excellent predictive ability in all three cohorts. In the training cohort, the Clin-US nomogram achieved the highest AUC, with an accuracy of 77.89%, a specificity of 83.58% and a sensitivity of 57.22%. Similarly, the AUC of the Clin-US nomogram was higher than that of each of the six independent risk factors in both the internal and external testing cohorts, with the highest sensitivity of 89.47% in the external testing cohort. The confusion matrices of the Clin-US nomogram in the three cohorts are shown in Fig. S1.
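For readers who want to see how these summary measures relate to a confusion matrix, a short Python sketch follows. The counts used in the example are not reported in the text: they are an assumption, back-calculated from the training cohort size (374 LLNM and 1358 non-LLNM patients) and the reported sensitivity, specificity and accuracy. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Assumed counts, back-calculated for the training cohort (not reported):
# 214 + 160 = 374 LLNM patients, 1135 + 223 = 1358 non-LLNM patients
metrics = diagnostic_metrics(tp=214, fp=223, tn=1135, fn=160)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

With these assumed counts the sketch returns the sensitivity, specificity and accuracy quoted above for the training cohort; the PPV and NPV it prints are derived from the same assumption and should be checked against Table 4.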

Table 4 Predictive performance of the different models for the training and testing cohorts
Fig. 4

Predictive performance of the Clin-US nomogram and US features in discrimination of LLNM and non-LLNM in three cohorts. ROC curves of Clin-US nomogram compared to ultrasonic features in the training cohort (a), internal testing cohort (b) and external testing cohort (c). The violin plot shows the data distribution, including a box plot (d-f)

The calibration curves of the Clin-US nomogram showed good agreement between the bias-corrected predictions (1000 bootstrap resamples) and the ideal reference line in the training and two testing cohorts (Fig. 5a-c). We also performed decision curve analysis (DCA) to compare the clinical utility and benefits of the Clin-US nomogram and traditional US methods in estimating the risk of LLNM. The DCA curves of the Clin-US nomogram showed greater net benefit than the other factors across a range of LLNM risks in the three cohorts (Fig. 5d-f).
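The net benefit plotted in a decision curve is conventionally defined at a threshold probability pt as TP/n - (FP/n) × pt/(1 - pt), where a patient is treated as positive when the predicted probability reaches pt. The Python sketch below applies this standard definition to toy data; the function names and the data are hypothetical, and the text does not state which software routine produced the curves in Fig. 5.

```python
def net_benefit(y_true, probs, threshold):
    """Net benefit of acting on predictions >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: every patient is treated as positive."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data for illustration only (1 = LLNM, 0 = non-LLNM)
labels = [0, 0, 0, 0, 1, 0, 1, 1]
probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.80]
nb_model = net_benefit(labels, probs, 0.30)
nb_all = net_benefit_treat_all(labels, 0.30)
print(round(nb_model, 4), round(nb_all, 4))
```

Repeating the calculation over a grid of thresholds and plotting the result against the treat-all and treat-none (net benefit 0) strategies gives a decision curve.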

Fig. 5

Calibration curves and decision curve analysis. (a-c) Calibration curve of the Clin-US nomogram in the training cohort and two validation cohorts. (d-f) Decision curve analysis of the Clin-US nomogram in the training cohort and two validation cohorts. The x-axis represents the threshold probability, and the y-axis represents the net benefit

Clinical application of the Clin-US nomogram

Representative examples of predicting the risk of LLNM in PTC patients are shown in Fig. 6. The thyroid nodule in Fig. 6a was obtained from a man with a tumor in the lower pole of the left thyroid, with three high-risk sonographic features (microcalcification, multifocality, and A/P > 0.25). The maximum diameter of the tumor was 1.43 cm. The probability of LLNM according to the nomogram was 55% (Fig. 6b). The postoperative pathological report showed PTC with LLNM at levels II, III, IV and VI. The second patient (Fig. 6c), also male, had a nodule in the lower pole of the left thyroid with only one high-risk sonographic feature (microcalcification). The maximum diameter of the tumor was 0.91 cm. The probability of LLNM according to the nomogram was less than 5% (Fig. 6d). The nodule was PTC without LLNM according to the postoperative pathological report and FNA-TG test.

Fig. 6

Examples of clinical application of the Clin-US nomogram. (a) Image obtained from a 22-year-old man with a nodule in the left thyroid. (b) The nomogram gave a total score of 133 points: male (11 points), lower pole (18 points), maximum diameter of 1.43 cm (11 points), multifocality (31 points), microcalcification (17 points), and A/P > 0.25 (45 points). The corresponding risk of LLNM was 0.55, and the pathological result of the nodule was PTC with LLNM. (c) Image obtained from a 39-year-old man with a nodule in the left thyroid. (d) The nomogram gave a total score of 46 points: male (11 points), lower pole (18 points), maximum diameter of 0.91 cm (0 points), unifocal (0 points), microcalcification (17 points), and A/P < 0.25 (0 points). The corresponding risk of LLNM was low (< 0.05), and the pathological result of the nodule was PTC without LLNM.

Discussion

In this study, we constructed and validated the Clin-US nomogram based on clinical and ultrasound characteristics to predict the probability of LLNM in patients with PTC. The Clin-US nomogram effectively stratified patients by their risk of LLNM and achieved excellent performance in both the internal and external testing cohorts. Therefore, the preoperative probability of LLNM can be estimated individually and noninvasively. Our study has several advantages: (I) To the best of our knowledge, this is the largest retrospective, consecutive, multicenter study to construct and evaluate a nomogram for predicting the status of LLNM in PTC. (II) Unlike published studies based on radiomics [14], contrast-enhanced ultrasound (CEUS) [15], or elastography [16], this nomogram incorporates only clinical and gray-scale US factors, which increases the general applicability of the model. This is particularly important for PTC patients in underdeveloped countries and regions. (III) Compared with the machine learning (ML) models in previous studies [2, 17], the proposed Clin-US nomogram has better interpretability and is easier to use in clinical practice, with similar diagnostic performance.

Accurate preoperative assessment of lateral lymph node status is essential for clinicians. Although many previous studies have explored the risk factors for LLNM in PTC, the results have not always been consistent. We confirmed that the suspicious US features of A/P > 0.25 and microcalcification were independent predictors of LLNM, which is consistent with our previous study [13]. In many studies, capsular extension, especially the degree of capsular extension and disruption, has been shown to predict extrathyroidal extension and invasion. Ye et al. reported that capsular extension > 50% was most common in the LLNM group, compared with the no-LNM group and the central LNM group [18], which is similar to our findings. Microcalcification may reflect psammoma bodies on pathology, which result from necrosis and calcification of cancer cells and are a specific indicator for the diagnosis of PTC. It has also been reported to be significantly associated with lymph node metastasis [19].

In this study, male gender, maximum tumor diameter, multifocality, and lesion position also contributed significantly to LLNM. Similarly, Feng et al. found that LLNM was independently related to tumor size, the number of foci, and location. Zhuo et al. [10] identified male sex, tumor size, thyroid nodules, irregular tumor shape, rich lymph node vascularity and location of lymph node as independent risk factors for LLNM. Numerous previous studies have identified multifocality as a risk predictor for LNM. Wang et al. found that multifocality was an independent risk factor for both central lymph node metastasis (CLNM) and LLNM [20].

To our surprise, BRAF V600E mutation showed no significant difference in univariate analysis. The BRAF V600E mutation has been reported to be an important biomarker for the progression of PTC [21]. In contrast, Liu et al. [22] indicated that patients without the BRAF V600E mutation were more prone to LLNM. A taller-than-wide (A/T) shape is an insensitive but highly specific indicator of malignant thyroid tumors [11]. However, it had no significant correlation with LLNM in univariate analysis in our study. Previous reports have suggested that PTC patients with Hashimoto’s thyroiditis have less aggressive disease [20]. In other words, the absence of Hashimoto’s thyroiditis is an independent clinical feature of PTC patients with cervical lymph node metastasis [23]. However, Hashimoto’s thyroiditis showed no significant difference in univariate analysis in this study. A possible reason is that we determined the presence or absence of Hashimoto’s thyroiditis by preoperative US rather than postoperative pathology. In addition, although vascularization was statistically significant in univariate analysis, it was not significant in multivariate analysis. One possible reason is that US assessment of the distribution of internal blood vessels is unreliable and easily influenced by the operator and the machine.

Many studies have used ML to predict the risk of LNM or CLNM in PTC patients [24,25,26], but only a few ML models have been applied to LLNM. Although many models showed good performance, they were not convenient for clinical application because the predicted probability could not be obtained intuitively. Compared with other ML models, the Clin-US nomogram is a reliable clinical prediction model with better interpretability and ease of use.

A few previous studies have constructed nomograms to estimate LLNM in PTC based on different risk parameters. Jin et al. established a nomogram for estimating LLNM with clinicopathologic factors such as Hashimoto thyroiditis, number of tumors, serum thyroid-stimulating hormone level, and metastatic rate of central lymph nodes [27]. Wang et al. created a nomogram to predict level V lymph node metastasis incorporating extranodal extension, unilateral central lymph node metastasis, level II-IV metastasis, and lymph node size [28]. Unfortunately, some of these models are of limited use for preoperative prediction because they include postoperative pathologic features that cannot be obtained before surgery. In addition, multimodal imaging data, including CEUS, ultrasound elastography, and three-dimensional imaging, are not available in underdeveloped areas with limited medical resources. Some researchers have developed nomograms based on US radiomics to predict LLNM in PTC patients [14, 29]. However, US radiomic features are often abstract, nonspecific, and difficult to apply directly. Other studies have predicted the risk of lymph node metastasis from the sonographic features of lateral lymph nodes [10]. However, occult LLNM may go undetected by routine preoperative US [30], in which case the lymph nodes cannot be evaluated by US. The Clin-US nomogram we constructed is based on readily available clinical and US features of the primary thyroid tumor and can therefore support preoperative decision-making, especially for PTC patients in underdeveloped countries and regions.

This study has some limitations. First, it was designed retrospectively, and the assessment of static US images inherently limits the precision of US interpretation. Further improvements will be needed through large-scale prospective studies. Second, US scanning and biopsy were conducted by different doctors using different US machines. Therefore, US categorization and biopsy results may have been affected by operator experience. Lastly, given that LND was only performed in patients with high suspicion of LLNM based on preoperative imaging and FNAC, microscopic metastases might have been overlooked.

Conclusions

In conclusion, the Clin-US nomogram could effectively predict the probability of LLNM in patients with PTC preoperatively based on suspicious US and clinical characteristics. We identified gender (male), maximum diameter of the tumor, multifocality, position, microcalcification, and A/P as the most influential factors in predicting LLNM. The proposed nomogram will be useful for assessing lateral neck lymph nodes and guiding the clinical diagnosis and treatment of patients with PTC.