Background

Papillary thyroid carcinoma (PTC) constitutes about 80% of all differentiated thyroid cancers and is the commonest type of thyroid cancer [1]. PTC may show aggressive properties, including extrathyroidal extension (ETE) and lymph node and distant metastases, which suggest a poor prognosis [2, 3]. ETE describes a primary tumor that extends beyond the thyroid capsule and invades the neighboring tissues [4]; it is associated with an elevated risk of local recurrence [5, 6] and is used in multiple staging systems [7,8,9]. Based on the degree of invasion, ETE is divided into minimal and gross ETE. Traditional treatment options for PTC include total and subtotal thyroidectomies, with or without cervical lymph node dissection, and subsequent radioactive iodine remnant ablation [10]. However, the overall risk in PTC is relatively low, with recurrence and survival rates of 3–4% and > 99%, respectively [11]. According to the 2015 ATA Guidelines [12], ipsilateral lobectomy rather than total thyroidectomy is recommended in low-risk patients with PTC, and thyroidectomy with prophylactic central cervical lymphadenectomy is not recommended for non-aggressive PTCs because of complications, including laryngeal nerve injury and hypoparathyroidism. Currently, the aggressive properties of tumors, especially the ETE feature, can only be determined by pathological evaluation of specimens after thyroidectomy [13]. Preoperative assessment of PTC aggressiveness may therefore help clinicians better plan surgical procedures, and noninvasive examination methods for identifying tumor aggressiveness are urgently needed for more targeted treatment.

Ultrasound (US) is the commonest imaging method for thyroid nodule detection. However, its accuracy in assessing deep neck structures is not satisfactory due to the influence of bones and air [12, 14]. Furthermore, US findings are often ambiguous for minor extrathyroidal extension [15, 16]. Fine-needle aspiration (FNA) biopsy is accurate and cost-effective [12], but has limited ability to reveal the aggressive features of thyroid nodules [3, 17, 18].

Magnetic resonance imaging (MRI) can provide excellent contrast of soft tissues and allow multi-planar evaluation of anatomical details. MRI also assesses tumor aggressiveness, such as the ETE feature and cervical lymph node metastasis [19,20,21]. Another study [3] demonstrated that DW-MRI-based ADC values could help stratify PTC patients according to the ETE, although the average ADC of ROIs utilized might not comprehensively reflect the tumor features. Identification of an effective non-invasive imaging approach would provide insights into early PTC management.

Radiomics represents a high-throughput quantitative feature extraction method, which converts images into minable data, and then analyzes these data to provide decision support. These data are mined using complex bioinformatics tools for developing models that could ameliorate diagnosis, prognosis, and prediction accuracy [22,23,24,25]. A previous study showed that MR-based radiomics has a potential value in the presurgical prediction of lymph-vascular space invasion in cervical cancer [26]. Another study revealed that radiomics provides a noninvasive approach for analyzing breast cancer subtypes and TN stages [27].

However, there are few reports applying radiomics to assess extrathyroidal extension in PTC, indicating a gap in knowledge. Therefore, MR-based radiomics might provide an accurate approach for extrathyroidal extension prediction in PTC. This work aimed to evaluate whether radiomics applying multiple parametric MRI has the potential to detect extrathyroidal extension in PTC.

Methods

Patients

This retrospective study assessed consecutive individuals with thyroid nodules first identified by US from January 2018 to March 2019. Based on the American College of Radiology Thyroid Imaging, Reporting, and Data System [28], tumor grades were TR3-TR5.

All patients were examined by multiparametric MRI and subsequently underwent thyroid surgery (subtotal or total thyroidectomy) within 1 week following MRI. PTC was pathologically confirmed with surgical specimens. Exclusion criteria were: (1) pathological diagnosis other than PTC; (2) tumor size < 5 mm; (3) pathological data of tumor specimens that could not be matched to MR imaging findings; (4) poor MR image quality. Finally, 132 cases were assessed. Figure 1 depicts the patient selection process.

Fig. 1

Study flowchart. US ultrasound, PTC papillary thyroid carcinoma, ETE extrathyroidal extension, MR magnetic resonance

The Institutional Review Board of our Hospital approved this study and waived the requirement for written informed consent due to its retrospective nature.

MRI acquisition

All patients were scanned on an EXCITE HD 1.5T scanner (GE Healthcare, USA) equipped with an 8-channel dedicated neck surface coil, using the same scanning protocol. The applied parameters were as follows: axial T2-weighted imaging (T2WI) using fast recovery fast spin-echo with fat suppression, with an echo time (TE) of 85 ms, a repetition time (TR) of 1280 ms, a slice thickness of 4–5 mm, a matrix of 288 × 192, spacing of 1 mm, a field of view (FOV) of 18 cm, and a number of excitations (NEX) of 4; DWI with a single-shot echo planar imaging (EPI) sequence, with minimal TE, a TR of 6550 ms, a slice thickness of 4–5 mm, a matrix of 128 × 128, spacing of 0.5 mm, a FOV of 14 cm, and a NEX of 4 (b value, 800 s/mm²); multiphase contrast-enhanced axial T1WI (CE-T1) using a fast-spoiled gradient recalled echo sequence (TE = 1.7 ms, TR = 5.7 ms, matrix = 192 × 256, FOV = 14 cm, and NEX = 1). The Magnevist contrast agent (Bayer Healthcare, USA) was administered by intravenous injection at 3 ml/s (0.2 ml/kg), followed by flushing with 20 ml of normal saline. Scanning was performed at 30, 60, 120, 180, 240 and 300 s after contrast administration, yielding images of six phases, with breath-holds. Spatial saturation bands were employed to remove signals generated by overlying fat and surrounding tissues.

Histopathologic analysis

Surgical tumor samples were evaluated and analyzed by an experienced pathologist (> 10 years of related experience). Paraffin-embedding of tumor samples was followed by sectioning and hematoxylin and eosin (H&E) staining. Then, established criteria were utilized by the pathologist for evaluating the extrathyroidal extension (ETE) feature [12]. The patients were then assigned to the non-ETE and ETE groups.

MRI radiomics

Tumor segmentation

ITK-SNAP (http://www.itk-snap.org) was applied for the segmentation of thyroid tumors. Regions of interest (ROIs) were manually drawn on MR images by 2 radiologists (9 and 12 years of related experience, respectively). In case of disagreement, they reached a consensus through additional reading sessions. The ROIs were delineated slice-by-slice to represent the 3D volume of the whole tumor. The largest tumor was selected in each patient and delineated on MR images, which could reduce potential bias of multiple tumors in the same individual and improve the applicability of findings.

Radiomics feature extraction

To facilitate imaging analysis, all T2WI, ADC and CE-T1 images were resliced at 4 mm. Radiomic features were automatically extracted with the AK software version 3.2.2 (GE Healthcare). A total of 402 features were extracted, including shape, histogram, gray-level run-length matrix (GLRLM), gray-level co-occurrence matrix (GLCM), and gray-level size zone matrix (GLSZM) indexes.
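To illustrate what a texture index of this kind measures, the following is a minimal sketch of a GLCM and its contrast feature on a toy region. It is not the AK software's implementation; the function names, the horizontal one-pixel offset, the symmetric counting and the 4-level toy ROI are all assumptions made for illustration.

```python
from collections import Counter


def glcm(image, dx=1, dy=0, symmetric=True):
    """Normalized gray-level co-occurrence matrix for offset (dx, dy).

    Returned as a dict mapping (level_i, level_j) -> probability.
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[(i, j)] += 1
                if symmetric:
                    # Count the pair in both directions.
                    counts[(j, i)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def contrast(p):
    """GLCM contrast: large when neighboring pixels differ strongly."""
    return sum(prob * (i - j) ** 2 for (i, j), prob in p.items())


# Hypothetical 4 x 4 ROI already quantized to gray levels 0..3.
roi = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(roi)
roi_contrast = contrast(p)
```

In practice such matrices are computed over the 3D tumor volume, in several directions and after intensity discretization; this sketch covers a single 2D offset only.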

Feature selection and model construction

Participants were randomized to the training and test cohorts (ratio, 7:3). To assess interobserver agreement, 30 patients were randomly selected and intraclass correlation coefficients (ICCs) were calculated for each feature. Based on the 95% confidence intervals (CIs), ICC values below 0.4, from 0.41 to 0.60, from 0.61 to 0.80, and above 0.80 were classified as poor, medium, good, and excellent reliability, respectively. Only features with ICCs reaching 0.80 were retained for further analysis [29].
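The reliability filter can be sketched as follows. The text does not state which ICC form was used, so this example assumes ICC(2,1) (two-way random effects, absolute agreement, single rater); the feature names and reader values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a two-way ANOVA decomposition.

    ratings: one row per subject, one column per reader.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-reader mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def reliable_features(reader_a, reader_b, threshold=0.80):
    """Keep the features whose ICC between two readers reaches the threshold."""
    keep = []
    for name in reader_a:
        pairs = [[a, b] for a, b in zip(reader_a[name], reader_b[name])]
        if icc_2_1(pairs) >= threshold:
            keep.append(name)
    return keep


# Hypothetical values of two features measured by two readers in 4 patients.
reader_a = {"f_stable": [1.0, 2.0, 3.0, 4.0], "f_noisy": [1.0, 2.0, 3.0, 4.0]}
reader_b = {"f_stable": [1.1, 2.0, 2.9, 4.1], "f_noisy": [4.0, 1.0, 3.5, 2.0]}
kept = reliable_features(reader_a, reader_b)
```

A constant offset between readers lowers ICC(2,1) even when the ranking is identical, which is why an absolute-agreement form is a conservative choice for segmentation-dependent features.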

Radiomic feature selection

Firstly, the minimum redundancy maximum relevance (mRMR) algorithm was applied in the training group to eliminate redundant and irrelevant features, and the 30 features that were highly correlated with the labels and not redundant were retained. Then, the least absolute shrinkage and selection operator (LASSO) with ten-fold cross-validation was applied, and the feature subset was further selected through regularization by optimizing the hyperparameter λ. The coefficients of some candidate features were compressed to zero at the optimal λ, and features with non-zero coefficients were retained for constructing a radiomics signature via a linear combination. Finally, the radiomics score (rad-score) was calculated.
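How the LASSO penalty compresses coefficients to zero, and how the retained coefficients form a rad-score, can be sketched with a small coordinate-descent solver. This is an illustration under simplifying assumptions rather than the study's pipeline: it uses a least-squares loss instead of a classification loss, fixes λ instead of tuning it by ten-fold cross-validation, and runs on a hypothetical centred toy design matrix.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by coordinate descent.

    No intercept is fitted, so X columns and y are assumed to be centred.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta


def rad_score(features, beta, intercept=0.0):
    """Linear combination of feature values weighted by LASSO coefficients."""
    return intercept + sum(b * x for b, x in zip(beta, features))


# Hypothetical toy data: three centred, mutually orthogonal features.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.1, 1.9, -1.9, -2.1]  # equals 2*x1 + 0.1*x2
beta = lasso_cd(X, y, lam=0.5)
score = rad_score(X[0], beta)
```

With λ = 0.5 the weak second feature and the irrelevant third feature are dropped, while the strong first feature is kept with a shrunken coefficient; only such non-zero coefficients enter the rad-score.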

Model building and validation

The performance of the model in distinguishing the ETE feature of PTC was evaluated in the training cohort and validated in the test cohort by receiver operating characteristic (ROC) curve analysis. The area under the curve (AUC), sensitivity, specificity, accuracy, and negative and positive predictive values were calculated. In addition, leave-group-out cross-validation (LGOCV) was repeated 100 times to verify the model’s reliability and to confirm that the results were not obtained by chance.
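The two validation steps can be sketched as below: the AUC computed as a rank statistic, and repeated random hold-out splits in the spirit of LGOCV. The function names are hypothetical, and the stratification by class and the 30% hold-out fraction per repeat are assumptions; the text specifies only the number of repeats.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def leave_group_out_splits(labels, n_repeats=100, test_fraction=0.3, seed=0):
    """Yield (train_idx, test_idx) from repeated stratified random splits."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for _ in range(n_repeats):
        train, test = [], []
        for members in by_class.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            n_test = max(1, round(len(shuffled) * test_fraction))
            test.extend(shuffled[:n_test])
            train.extend(shuffled[n_test:])
        yield sorted(train), sorted(test)


# Toy check of the AUC on hypothetical scores.
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])

# Splits for a cohort with the class sizes reported in this study
# (27 ETE, 105 non-ETE). In a full analysis the model would be refitted
# on each training split and its AUC computed on the held-out split.
labels = [1] * 27 + [0] * 105
splits = list(leave_group_out_splits(labels))
```

The spread of the held-out AUCs across the repeats, rather than any single value, is what indicates whether performance depends on one particular partition.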

Results

Patient features

A total of 132 patients aged 45.42 ± 13.99 years (range, 12–77 years) were assessed. Among them, 27 patients (44.89 ± 13.56 years old; age range, 12–73 years) and 105 patients (45.55 ± 14.10 years old; age range, 22–77 years) were assigned to the ETE and non-ETE groups, respectively, based on pathologic results. ETE patients were divided into those with minimal ETE (n = 15) and gross ETE (n = 12) according to the degree of invasion. Table 1 summarizes the clinical features of PTC cases enrolled in this study. The training cohort included 92 patients, while the test cohort had 40 patients.

Table 1 Patient features in the ETE and non-ETE groups

US prediction

Of the 27 patients with ETE, 12 had ETE identified by presurgical US, while the remaining 15 showed no presurgical US evidence of the ETE feature. The sensitivity, specificity and accuracy of US in predicting ETE were 44.4%, 97.1% and 86.4%, respectively.
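These percentages can be reproduced from a 2 × 2 table, as in the sketch below. The true-positive and false-negative counts come from the text; the false-positive count of 3 is not reported and is inferred here from the stated specificity among the 105 non-ETE patients, so it should be read as an assumption.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# tp = 12 and fn = 15 are stated in the text.
# fp = 3 (hence tn = 102) is an inferred value, not a reported one:
# it is the count consistent with 97.1% specificity in 105 non-ETE patients.
us = diagnostic_metrics(tp=12, fn=15, fp=3, tn=102)
us_percent = {name: round(100 * value, 1) for name, value in us.items()}
```

Under that assumption the three values agree with the reported 44.4%, 97.1% and 86.4%.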

PTC ETE prediction

For distinguishing ETE from non-ETE masses, the 16 top-performing features, including four DWI, seven T2WI, and five CE-T1WI indexes, were finally retained to construct the radiomics signature (Table 2). T2WI contributed the largest share of features (7/16). By feature class, there were eight GLRLM, five GLCM, two shape and one GLSZM features. Table 2 shows the coefficients of the selected features. All 16 features showed significant differences between ETE and non-ETE masses (P < 0.05). Figure 2 shows ROC curves for the radiomics model in distinguishing ETE from non-ETE masses in the training and test cohorts. The radiomics prediction model yielded AUCs of 0.96 (95% CI 0.93–0.99) and 0.87 (95% CI 0.75–0.98) in the training (Fig. 2a) and test (Fig. 2b) sets, respectively. Figure 3a shows the results of the 100-fold LGOCV. The clinical decision curve of the radiomics model is depicted in Fig. 3b. Table 3 shows the radiomics model’s diagnostic performance. Sensitivity, specificity and accuracy were 0.895, 0.934 and 0.917 in the training set, respectively, and 0.750, 0.800 and 0.789 in the test set, respectively. The negative predictive value was 92% in the test group. These results indicated an overall good performance of the prediction model.

Table 2 Extracted modeling features predictive of ETE and non-ETE tumors
Fig. 2

Receiver operating characteristic curves (ROCs) for the radiomics model in predicting ETE and non-ETE tumors in the training (a) and test (b) cohorts

Fig. 3

Boxplot of 100-fold LGOCV data (a). Decision curve of the radiomics model (b) showing that the radiomics model provided a benefit across the threshold range of 0–1

Table 3 Diagnostic performance of the radiomics model

Discussion

The results of this study indicate that radiomics analysis based on multiparametric MRI data has the potential to detect the presence of ETE in PTC, with radiomics features yielding a high AUC in predicting ETE. According to the 2015 ATA Guidelines [12], thyroid lobectomy or thyroidectomy without prophylactic central neck dissection suffices for treating non-aggressive PTCs. Predicting ETE by radiomics based on MRI data would help clinicians identify individuals likely to benefit from more aggressive initial treatment. Therefore, such a tool could have an important impact on patient management, especially in cases of low-risk thyroid cancer. This study showed that radiomics based on multiparametric MRI accurately distinguished ETE from non-ETE in PTC, and these findings are expected to promote the development of a non-invasive method for evaluating ETE in PTC.

Our results demonstrated that US had good specificity and accuracy but low sensitivity in predicting ETE, while MRI radiomics showed better performance. Evaluation by US is relatively subjective and depends on the experience of the operator. MRI is a noninvasive imaging method without ionizing radiation. It is widely available around the world, with a simple and fast clinical setup. Radiomics provides multiple features extracted from images to quantify tumors, and offers the possibility of revealing differences that the human eye cannot recognize. Radiomic features were obtained from multiparametric MRI comprising T2WI (7/16), ADC (4/16) and CE-T1 (5/16) images. A previous report [3] examined the associations of ADC with various aggressive features of tumors, and showed that only the association with ETE reached significance. Another study by Hu et al. [19] showed that ADC is effective in assessing aggressiveness using ETE in PTC. Ma et al. [30] found that a radiomics signature utilizing T2WI data could predict the pathological extracapsular extension status in prostate cancer patients. However, no similar study regarding multiparametric MRI-based radiomics for the preoperative assessment of ETE in PTC has been published.

This study extracted multiple radiological features, including shape-based, intensity-related and texture features, which comprehensively reflect the underpinning tumor biology. The LASSO was utilized as the feature selection method. It represents a regression analysis technique performing both regularization and variable selection for enhancing prediction accuracy [31]. The LASSO is considered a promising technique for optimal feature selection, and could combine these radiomic features to generate a radiomic signature [32, 33]. A previous study [34] assessed many feature selection techniques, and LASSO showed an optimal performance.

The above results showed that the MR-based prediction model for differentiating ETE and non-ETE masses achieved high AUC values in both the training (0.96) and test (0.87) groups. It is worth mentioning that each feature in the model differed significantly between the two groups. These findings suggest that MRI-based radiomics can improve diagnostic performance. According to the decision curve, the model provided a net benefit to PTC patients across the entire risk threshold range of 0 to 1. The radiomics model in this study had more features derived from T2WI (7/16) than from T1WI or DWI, and the most highly weighted feature was from T2WI. A previous study [35] also showed that features extracted from T2WI achieve a higher prediction performance than those obtained from other sequences, indicating that T2WI may provide more information. The combination of sequences can provide more information than each of them individually [36]. In this study, the proportions of GLRLM (8/16) and GLCM (5/16) features were the largest in the final constructed model. The GLRLM, whose entries record the distributions and relationships of image pixels, is broadly used to extract statistical features [37] and can better reflect regional heterogeneity. The GLCM provides a second-order technique to generate texture features for determining associations among combinations of gray levels in image indexes [38], which can reflect the internal spatial heterogeneity of the lesions.
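The net benefit plotted in a decision curve can be sketched with the standard formula, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The example below is generic and does not reproduce Fig. 3b; the labels and predicted probabilities are hypothetical, and it assumes the rad-score has already been converted to a probability.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(labels)
    tp = sum(1 for l, p in zip(labels, probs) if p >= threshold and l == 1)
    fp = sum(1 for l, p in zip(labels, probs) if p >= threshold and l == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient regardless of the model."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical labels (1 = ETE) and model probabilities for five patients.
labels = [1, 1, 0, 0, 0]
probs = [0.9, 0.6, 0.4, 0.2, 0.1]
nb_model = net_benefit(labels, probs, threshold=0.5)
nb_all = net_benefit_treat_all(labels, threshold=0.5)
```

A model is useful at a given threshold when its net benefit exceeds both the treat-all value and zero, the net benefit of treating no one; the formula is undefined at a threshold of exactly 1.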

The present study had limitations. Firstly, the sample size was modest, which may limit the predictive performance of the model. Indeed, this was an exploratory study, and the data were collected from a single institution and lacked validation in external cohorts. Secondly, due to the small number of patients with ETE, those with minimal ETE and gross ETE were combined into a single ETE group to enable binary classification. In the future, a large-scale study is warranted to confirm that this method can distinguish ETE from non-ETE in PTC and to allow further subgroup analysis. Thirdly, the size of the lesions differed significantly between the ETE and non-ETE groups, introducing a potential bias in the interpretation of the radiomics prediction model results. Also, thyroid tumors smaller than 5 mm were not included in this study. In the future, more advanced MR techniques could improve the detection of smaller tumors. Fourthly, TNM staging and follow-up data were not included for evaluating tumor aggressiveness. PTC generally has a favourable prognosis [11, 39], and our retrospective interval was just one year. Thus, further investigation should be performed.

Conclusions

Overall, the MRI radiomics approach has the potential to stratify patients according to the ETE status in PTC before surgery, and could help improve therapeutic strategies and patient prognosis.