Introduction

Papillary thyroid carcinoma (PTC) is the most common malignancy originating from thyroid follicular cells. Although PTC is generally indolent, some patients have a dire prognosis. Therefore, accurate early identification of aggressive and life-threatening PTC is vital, and several risk classification and staging systems have been established to date to identify patients who require further treatment. Of these, the tumor–node–metastasis (TNM) classification by the American Joint Committee on Cancer is the most widely used to predict the prognosis of patients with thyroid carcinoma, including PTC. In the previous (seventh) edition of the TNM, the age cutoff for determining risk was 45 years. The T factor included tumor size and extrathyroid extension, which was divided into three categories (none, minimal, and significant) based on preoperative findings from imaging studies and endoscopy. The N factor consisted of N0 (no metastasis), N1a (metastasis to the central compartment), and N1b (metastasis to the lateral and mediastinal compartments). Patients ≥ 45 years with N1b were upstaged as compared to those with N1a [1]. However, several subsequent studies provided important findings about the prognostic factors used in the TNM. A cutoff age of 55 was shown to reflect patients’ prognoses more accurately than a cutoff age of 45 [2,3,4]. Additionally, minimal extension was shown to lack prognostic value [5,6,7], and N1a and N1b patients were shown to have similar prognoses [8]. Regarding extrathyroid extension, evaluation based on intraoperative findings was shown to be better than that based on preoperative findings [2].

The newest (eighth) edition of the TNM classification (TNM-8th) was published in 2017 [9] and included significant changes from the previous edition. The cutoff age changed from 45 to 55 years. In contrast to the TNM-7th, carcinoma extension is evaluated not only on preoperative but also on intraoperative gross findings. Minimal extension was abolished, and significant extension was divided into two categories: T3b, extension to the strap muscles, and T4a, posterior extension, such as extension to the larynx, trachea, esophagus, or recurrent laryngeal nerve, and extension to the subcutaneous soft tissues based on intraoperative findings. T3b and T4a patients ≥ 55 years were upstaged to stages II and III, respectively (but downstaged from stage IVA in the seventh edition). The difference in upstaging between N1a and N1b was abolished (metastasis to the upper mediastinal lymph nodes belongs to N1a in the eighth edition), and both N1a and N1b patients ≥ 55 years were upstaged to stage II (but downstaged from stages III and IVA in the seventh edition). Regarding tumor size, patients ≥ 55 years with tumors 2.1–4 cm in size and those with tumors > 4 cm are classified as stages I and II, respectively, if they are N0M0. Comparative studies between the seventh and eighth editions showed that the TNM-8th is better suited to predicting prognosis in patients with differentiated carcinoma [10,11,12,13].

However, some issues in the TNM-8th remain to be improved. PTC with posterior extension covers a wide range of biological characteristics, depending on the width and depth of tumor extension. Additionally, previous studies demonstrated that the size of nodal metastasis (≥ 3 cm) is a strong prognostic factor [2, 8, 14]. In this study, we focused on these points and investigated whether and how the TNM-8th can be improved by taking them into consideration. The aim of this study is to establish the revised (re-) TNM-8th by subclassifying extrathyroid extension and clinical node metastasis. Here, we show that subdividing posterior tumor extension and clinical node metastasis significantly improves the TNM-8th.

Material and methods

Patients

We enrolled 5683 patients who underwent surgery for PTC at Kuma Hospital between January 1988 and January 2005. Postoperative follow-up periods ranged from 6 to 345 months (median follow-up period, 176 months). All patients were pathologically diagnosed with PTC. Patients with other thyroid malignancies, such as follicular carcinoma, medullary carcinoma, anaplastic carcinoma, and malignant lymphoma, were excluded. Patients whose postoperative follow-up lasted less than 6 months were also excluded. We obtained informed consent to participate from all patients before the postoperative follow-up; all patients agreed to be followed up by questionnaire even after leaving the hospital. Our ethics board determined that this study did not require committee approval because of its retrospective design.

Clinicopathological features and categorization of extrathyroid extension and clinical node metastasis of patients

We evaluated tumor extension based on intraoperative gross findings. We judged tumor extension as positive when the tumor grossly invaded the surrounding organs. We subdivided tumor extension (T4a) into two categories: T4a1, extending to the tracheal adventitia and cartilage, esophageal muscle layer, recurrent laryngeal nerve, and cricothyroid and inferior constrictor muscles, and T4a2, extending to other organs such as the subcutaneous soft tissues, thyroid cartilage, larynx, tracheal mucosa, esophageal mucosa, jugular and brachiocephalic veins, and sternocleidomastoid muscle. We enrolled no T4b patients in this study, because the number of patients was too small to analyze. We subdivided N1 (N1a and N1b were analyzed as a single group) into two categories: N1, < 3 cm, and N2, ≥ 3 cm. Evaluation of the N factor was based on preoperative findings, such as ultrasonography and computed tomography (CT). Table 1 shows the clinicopathological features of patients enrolled in this study.
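The subclassification rules above can be sketched as a small lookup. This is an illustrative sketch only, not software used in the study: the function names, the structure labels, and the input format are hypothetical. It also assumes that a tumor invading structures from both groups is assigned the deeper category (T4a2), which the text does not state explicitly, and it rejects unlisted structures rather than guessing, because the T4a2 list in the text is introduced with "such as" and is therefore not exhaustive.

```python
# Illustrative sketch; names and labels are hypothetical, not from the study.
T4A1_STRUCTURES = frozenset({
    "tracheal adventitia", "tracheal cartilage", "esophageal muscle layer",
    "recurrent laryngeal nerve", "cricothyroid muscle",
    "inferior constrictor muscle",
})
T4A2_STRUCTURES = frozenset({
    "subcutaneous soft tissue", "thyroid cartilage", "larynx",
    "tracheal mucosa", "esophageal mucosa", "jugular vein",
    "brachiocephalic vein", "sternocleidomastoid muscle",
})


def subclassify_t4a(invaded_structures):
    """Return 'T4a2', 'T4a1', or None (no T4a-type invasion recorded)."""
    invaded = set(invaded_structures)
    unlisted = invaded - T4A1_STRUCTURES - T4A2_STRUCTURES
    if unlisted:
        # The T4a2 list in the text is not exhaustive, so do not guess.
        raise ValueError(f"structures not covered by this sketch: {sorted(unlisted)}")
    # Assumption: mixed invasion takes the deeper category.
    if invaded & T4A2_STRUCTURES:
        return "T4a2"
    if invaded & T4A1_STRUCTURES:
        return "T4a1"
    return None


def subclassify_n(largest_node_cm):
    """Clinical N category by largest metastatic node; None means no node."""
    if largest_node_cm is None:
        return "N0"
    if largest_node_cm <= 0:
        raise ValueError("node size must be positive")
    # N1a and N1b are pooled; only the 3 cm size cutoff matters here.
    return "N2" if largest_node_cm >= 3.0 else "N1"
```

For instance, `subclassify_t4a(["recurrent laryngeal nerve"])` gives `"T4a1"`, and `subclassify_n(3.0)` gives `"N2"` because the cutoff is inclusive (≥ 3 cm).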

Table 1 Clinicopathological features of the 5683 included patients

Surgery

Of 5683 patients in our series, 2970 underwent total thyroidectomy, 480 underwent subtotal thyroidectomy, 50 underwent isthmectomy, and 2183 underwent hemithyroidectomy. Two hundred and sixty-five patients underwent no lymph node dissection; the remaining 5418 underwent prophylactic or therapeutic lymph node dissection. All 5418 patients underwent central node dissection, and of these, 4280 underwent uni- or bilateral modified radical neck dissection. Thirteen patients also underwent upper mediastinal therapeutic node dissection.

Radioactive iodine therapy

Of 66 M1 patients, 47 underwent radioactive iodine (RAI) therapy for metastases. The remaining 19 did not undergo RAI therapy because of patient refusal or poor general condition. Adjuvant RAI therapy or RAI ablation using RAI ≥ 50 mCi was performed for 79 patients who underwent total thyroidectomy.

Postoperative follow-up

All of the patients were followed by blood examinations (thyroid-stimulating hormone, thyroglobulin [Tg], and anti-Tg antibody) and imaging studies such as ultrasound once or twice per year. Chest radiography, CT, and bone scintigraphy were also used for follow-up at the physicians’ discretion. We regarded patients as having PTC recurrence when recurrent lesions were detected by imaging studies. Lymph node recurrence was diagnosed on cytology, and Tg was measured for suspicious nodes in the washout from fine needle aspiration cytology. Patients with high or elevated postoperative Tg levels without structural evidence of recurrence on imaging studies were not judged as having recurrence, because we do not regard such patients as candidates for any treatment. To date, 348 patients have died of various causes, and 110 have died of thyroid carcinoma. Further, 580 patients experienced locoregional recurrence, such as recurrence to the regional lymph nodes, soft tissues, thyroid beds, and the remnant thyroid, and 140 patients showed distant metastasis, including to the lung, bone, and brain, during follow-up.

Statistical analyses

We used the Kaplan–Meier method with the log-rank test (univariate analysis) and the Cox proportional hazard model (multivariate analysis) for the statistical analyses, which were performed using the software program StatView (SAS, Tokyo, Japan). p values < 0.05 were accepted as significant, and p values between 0.05 and 0.1 were regarded as having marginal significance.
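For readers unfamiliar with the Kaplan–Meier method, the product-limit estimate can be sketched in a few lines. This is a minimal illustration in plain Python, not the StatView procedure used in the study; the function name and the input format (follow-up in months, with 1 for death from thyroid carcinoma and 0 for censoring) are assumptions made for the example, and the sample data in the usage note are invented.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up durations (e.g. months)
    events : 1 if the event (carcinoma death) occurred, 0 if censored
    Returns a list of (time, survival) pairs at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing this follow-up time. Subjects censored
        # at time t are still counted as at risk for deaths at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve
```

With invented data `kaplan_meier([6, 12, 12, 24, 36], [1, 0, 1, 0, 1])`, the estimate steps down at months 6, 12, and 36 only; the censored observations at months 12 and 24 reduce the number at risk without lowering the curve.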

Results

Cancer-specific survival of patients in each stage based on the TNM-8th

Figure 1 shows the Kaplan–Meier curves for cancer-specific survival (CSS) in each stage. The respective 10-, 15-, and 20-year CSS rates were 99.9%, 99.6%, and 99.3% for stage I patients, 96.5%, 95.1%, and 93.4% for stage II patients, 92.5%, 88.1%, and 82.7% for stage III patients, and 37.7%, 22.6%, and 11.3% for stage IVB patients. No stage IVA (T4b Any N M0) patients were included in the present study.

Fig. 1

Kaplan–Meier curves for CSS of stage I, II, III, and IVB patients according to the TNM-8th. CSS cancer-specific survival, TNM-8th eighth edition of the tumor–node–metastasis classification

Upstaging of M0 patients < 55 years

In the TNM-8th, all M0 patients < 55 years are classified as stage I. We performed a multivariate analysis of various clinicopathological features affecting CSS in patients < 55 years (Table 2). Because no T3b patients died of thyroid carcinoma, our analysis software could not perform the multivariate analysis when T3b patients were included, and they were therefore excluded from the analysis. The absence of carcinoma deaths indicates that T3b has no prognostic value for CSS in patients < 55 years. M1 was an independent prognostic factor for patients < 55 years (p = 0.0002). In addition, N2 (p < 0.0001) and T4a2 (p < 0.0001), but not N1 (p = 0.4464) or T4a1 (p = 0.1365), independently predicted CSS in these patients. Six stage I patients were categorized as both N2 and T4a2. The CSS of 106 stage I patients with N2 and 25 stage I patients with T4a2 did not significantly differ from that of 403 stage II patients (p = 0.8901 and p = 0.6207, respectively) (Fig. 2). Therefore, they were upstaged to re-stage II. The CSS of stage I patients with M0 and T4a2 or N2 did not significantly differ from that of M1 patients (p = 0.0872).

Table 2 Multivariate analysis for factors affecting CSS for PTC patients < 55 years
Fig. 2

a Kaplan–Meier curves for CSS of stage I patients with nodal metastasis ≥ 3 cm and stage II patients. b Kaplan–Meier curves for CSS of stage I patients with T4a2 and stage II patients. CSS cancer-specific survival

Downstaging of stage III patients with T4a1

In the TNM-8th, stage III includes both T4a1 and T4a2 patients ≥ 55 years (334 and 72 patients in our study, respectively). Figure 3 shows the Kaplan–Meier curves of T4a1 and T4a2 patients, demonstrating that T4a2 patients had significantly poorer CSS than T4a1 patients (p < 0.0001). As shown in Fig. 4, the CSS of stage III T4a1 patients did not significantly differ from that of stage II patients. Therefore, stage III T4a1 patients were downstaged to re-stage II.

Fig. 3

Kaplan–Meier curves for CSS of stage III T4a1 and T4a2 patients. CSS cancer-specific survival

Fig. 4

Kaplan–Meier curves for CSS of stage II patients and stage III T4a1 and T4a2 patients. CSS cancer-specific survival

Investigation of candidates who were upstaged from stage II to re-stage III

In the TNM-8th, stage II includes N1 and/or T3 patients ≥ 55 years. We performed a multivariate analysis of factors affecting CSS in stage II patients ≥ 55 years (Table 3). In this series, N2 was regarded as an independent predictor of carcinoma death (p = 0.0115), and no significant difference was observed between the CSS of N2 patients (n = 46) in stage II and that of T4a2 patients (n = 72) in stage III (Fig. 5). Hence, we upstaged N2 patients in stage II to re-stage III. Twenty-seven stage III patients with T4a1 and N2 were also staged as re-stage III.

Table 3 Multivariate analysis for factors affecting CSS for stage II patients ≥ 55 years
Fig. 5

Kaplan–Meier curves for CSS of stage II patients with nodal metastasis ≥ 3 cm and stage III T4a2 patients. CSS cancer-specific survival

Outcomes after restaging based on the re-TNM-8th

Table 4 delineates the restaging system based on our proposal compared to the TNM-8th. In patients < 55 years, not only M1 but also T4a2 and N2 patients were classified as re-stage II, and all others were classified as re-stage I. In M0 patients ≥ 55 years, those with T3a, T3b, or T4a1 N0 and those with T1-3 N1 were classified as re-stage II. Patients with T4a2 and those with any T (except for T4b) N2 were staged as re-stage III.
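The two staging schemes can be compared side by side with a short sketch. This is an illustration of the rules as described in the text, not a validated clinical tool: the function names and the string encoding of categories are hypothetical, T1 subcategories are collapsed into "T1", and T4b is rejected because no T4b patients were studied. Two points are assumptions inferred from the surrounding text rather than stated in the paragraph above: M0 patients ≥ 55 years with T4a1 N1 are placed in re-stage II, and M0 patients ≥ 55 years with T1-2 N0 remain in stage I.

```python
# Illustrative sketch; function names and category encoding are hypothetical.
T_CATEGORIES = ("T1", "T2", "T3a", "T3b", "T4a1", "T4a2")
N_CATEGORIES = ("N0", "N1", "N2")  # N1: < 3 cm, N2: >= 3 cm (this study)
M_CATEGORIES = ("M0", "M1")


def _check(t, n, m):
    if t not in T_CATEGORIES or n not in N_CATEGORIES or m not in M_CATEGORIES:
        raise ValueError(f"unsupported category combination: {t} {n} {m}")


def tnm8_stage(age, t, n, m):
    """Stage under the TNM-8th; node size and T4a subtype are ignored."""
    _check(t, n, m)
    if age < 55:
        return "II" if m == "M1" else "I"
    if m == "M1":
        return "IVB"
    if t in ("T4a1", "T4a2"):
        return "III"
    if t in ("T3a", "T3b") or n != "N0":
        return "II"
    return "I"


def re_tnm8_stage(age, t, n, m):
    """Stage under the proposed re-TNM-8th."""
    _check(t, n, m)
    if age < 55:
        return "II" if (m == "M1" or t == "T4a2" or n == "N2") else "I"
    if m == "M1":
        return "IVB"
    if t == "T4a2" or n == "N2":
        return "III"
    # Assumption: T4a1 N1 is grouped with T4a1 N0 in re-stage II.
    if t in ("T3a", "T3b", "T4a1") or n == "N1":
        return "II"
    return "I"
```

Under this sketch, a 60-year-old patient with T4a1 N0 M0 moves from stage III (`tnm8_stage`) to re-stage II (`re_tnm8_stage`), while a 40-year-old patient with T1 N2 M0 moves from stage I to re-stage II.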

Table 4 Comparison between the TNM-8th and re-TNM-8th for papillary thyroid carcinoma

Table 5 indicates the relationship between the numbers of patients staged I, II, III, and IVB in the TNM-8th and the re-TNM-8th. The number of re-stage III patients decreased by 67% (from 406 to 136), whereas the number of re-stage II patients increased by 98% (from 403 to 798). Figure 6 shows the Kaplan–Meier curves of patients with re-stages I, II, III, and IVB. The respective 10-, 15-, and 20-year CSS rates were 99.9%, 99.9%, and 99.4% for re-stage I patients and 97.1%, 94.1%, and 91.8% for re-stage II patients, which did not differ from those of original stage I and II patients. However, the CSS rates of re-stage III patients were significantly poorer than those of original stage III patients: the 10-, 15-, and 20-year CSS rates of re-stage III patients were 82.4%, 74.5%, and 69.5%, respectively. The CSS of patients in each stage and re-stage is summarized in Table 6.

Table 5 Relationship between the numbers of patients in each stage in the TNM-8th and re-TNM-8th
Fig. 6

Kaplan–Meier curves for CSS of re-stage I, II, III, and IVB patients. CSS cancer-specific survival

Table 6 Comparison of CSS between TNM-8th and re-TNM-8th for papillary thyroid carcinoma

Disease-free survival for stage I–III and re-stage I–III patients

We also calculated local recurrence-free survival (LR-FS) and distant recurrence-free survival (DR-FS) rates of stage I–III and re-stage I–III patients. As shown in Table 7a, the 10-, 15-, and 20-year LR-FS rates of stage I and stage II patients were similar to those of re-stage I and re-stage II patients. However, the 10-, 15-, and 20-year LR-FS rates of re-stage III patients were 58.8%, 44.1%, and 33.1%, respectively, which were much poorer than those of stage III patients (72.5%, 62.3%, and 56.8%, respectively).

Table 7 Comparison of disease-free survival rates between TNM-8th and re-TNM-8th for papillary thyroid carcinoma

Similarly, as shown in Table 7b, the DR-FS rates of stage I and stage II patients were similar to those of re-stage I and re-stage II patients. However, the 10-, 15-, and 20-year DR-FS rates of re-stage III patients were 76.1%, 63.9%, and 47.9%, respectively, which were poorer than those of stage III patients (87.2%, 76.0%, and 73.2%, respectively).

Discussion

In this study, we investigated whether the TNM-8th can be further improved by subclassifying the N factor and extrathyroid extension in the T factor. Previous studies demonstrated that the size of nodal metastasis based on preoperative imaging studies significantly reflects prognosis in PTC [2, 8, 14]. Regarding extrathyroid extension based on intraoperative gross findings, one study showed that PTC invading the tracheal and/or esophageal mucosa had a significantly poorer prognosis than PTC extending to other organs [15].

First, we investigated whether young patients (< 55 years) with clinicopathological features other than M1 should be upstaged. In our series, as shown in Table 2, the odds ratios for carcinoma death on multivariate analysis were even higher for T4a2 and N2 cases than for M1 cases, indicating that these cases should be upstaged to re-stage II. This suggests that, even in young patients, metastasis and/or recurrence in cases with aggressive characteristics such as T4a2 or N2 may be hard to control using standard modalities such as re-operation, external beam radiation, and RAI therapy. Physicians should treat such patients carefully and abandon the fixed notion that all young patients without distant metastasis have a favorable prognosis.

For stage II patients ≥ 55 years, N2 independently predicted carcinoma death, and in stage III patients, the CSS of T4a2 patients was significantly poorer than that of T4a1 patients. Stage II N2 patients had a CSS rate similar to that of stage III T4a2 patients. Moreover, the CSS of stage III T4a1 patients did not significantly differ from the CSS of stage II patients. Hence, we considered it appropriate that N2 patients ≥ 55 years be upstaged to re-stage III and T4a1 patients ≥ 55 years be downstaged to re-stage II. The difference in prognosis between T4a1 and T4a2 patients is reasonable. T4a1 tumors extend to the tracheal adventitia and cartilage, esophageal muscle layer, recurrent laryngeal nerve, and cricothyroid and inferior constrictor muscles, indicating that the range of extension is much smaller and the depth of extension much shallower than in T4a2 tumors, which extend to the tracheal and esophageal mucosa, jugular and brachiocephalic veins, and sternocleidomastoid muscle. This study demonstrated that the width and depth of carcinoma extension significantly affected patients’ prognoses.

Although stage classification is mainly used to predict CSS, it is also often used to evaluate disease-free survival. In this study, we evaluated LR-FS and DR-FS based on stage and re-stage. Similar to what we observed with CSS, the re-TNM-8th was more accurate in evaluating disease-free survival for PTC patients than the TNM-8th.

However, this study has some limitations. First, this is a retrospective study, and therapeutic management, such as the extent of thyroidectomy and lymph node dissection and the indication for adjuvant RAI therapy, differed from current guidelines. In this study, only 126 patients underwent RAI therapy (47 for M1 and 79 as adjuvant therapy), which differs from the present treatment guidelines. Moreover, almost all of our patients (5418 of 5683) underwent central node dissection, whether therapeutic or prophylactic. Prophylactic central node dissection is not recommended in the newest guidelines of the American Thyroid Association [16]. It remains unclear whether prophylactic central node dissection affects prognosis, but in Japan, prophylactic central node dissection is performed to avoid recurrence to the central compartment rather than to improve prognosis. Re-operation for this compartment might cause significant complications, such as recurrent laryngeal nerve injury and permanent hypoparathyroidism. Second, since there were too few T4b cases, we did not enroll any of these patients. Therefore, how to re-stage T4b patients remains unclear. Previous studies demonstrated that locally curative surgery could be performed for T4b patients at high rates [17, 18], but studies examining CSS in a large number of T4b patients should be done to verify this point. According to our data, the CSS of T4a2 patients was significantly poorer than that of T4a1 patients, indicating that the width and depth of carcinoma extension significantly affected prognosis. This may be because curative surgery is more difficult in T4a2 patients than in T4a1 patients. Further, PTC showing wide and deep invasion to adjacent organs might have an aggressive biological character and be likely to metastasize to local and distant organs in earlier phases. Since T4b indicates wider and deeper carcinoma extension than T4a1 and T4a2, it is likely that T4b patients will have a poorer prognosis than T4a1 or T4a2 patients. Since few patients present with T4b disease, it is difficult to analyze the prognostic value of T4b in a single-institution study, but we speculate that it could be appropriate to upstage these patients above re-stage III. Last, we previously demonstrated that significant tumor extension from the metastatic nodes also predicts poor prognosis in PTC patients [19]. In this study, however, we did not evaluate the prognostic value of extranodal tumor extension because of the small number of patients with tumor extension from the nodes.

In conclusion, subclassification of extrathyroid extension and clinical node metastases can further improve the TNM-8th and discriminate high-risk patients more accurately. For re-stage III patients, closer postoperative surveillance should be performed through monitoring of thyroglobulin and thyroglobulin antibody and through imaging studies, such as ultrasonography, because these patients are regarded as high risk for both carcinoma recurrence and carcinoma death. Less intense surveillance may suffice for re-stage I and II patients.