Radiomics analysis from magnetic resonance imaging in predicting the grade of nonfunctioning pancreatic neuroendocrine tumors: a multicenter study

Objectives To explore the potential of radiomics features extracted from non-contrast MRI sequences to predict the histologic grade of nonfunctioning pancreatic neuroendocrine tumors (NF-PNETs).
Methods Two hundred twenty-eight patients with NF-PNETs who underwent MRI at 5 centers were retrospectively analyzed. Data from center 1 (n = 115) constituted the training cohort, and data from centers 2–5 (n = 113) constituted the testing cohort. Radiomics features were extracted from T2-weighted images and apparent diffusion coefficient maps. The least absolute shrinkage and selection operator was applied to select the most important features and to develop radiomics signatures. The area under the receiver operating characteristic curve (AUC) was used to assess the models.
Results Tumor boundary, enhancement homogeneity, and vascular invasion were used to construct the radiological model to stratify NF-PNET patients into grade 1 and grade 2/3 groups, which yielded AUCs of 0.884 and 0.684 in the training and testing groups. A radiomics model including 4 features was constructed, with AUCs of 0.941 and 0.871 in the training and testing cohorts. The fusion model combining the radiomics signature and radiological characteristics showed good performance in the training set (AUC = 0.956) and in the testing set (AUC = 0.864).
Conclusion The developed model that integrates radiomics features with radiological characteristics could be used as a non-invasive, dependable, and accurate tool for the preoperative prediction of grade in NF-PNETs.
Clinical relevance statement Our study revealed that the fusion model based on non-contrast MR sequences can be used to predict the histologic grade before operation. The radiomics model may be a new and effective biological marker in NF-PNETs.
Key Points
The diagnostic performance of the radiomics model and the fusion model was better than that of the model based on clinical information and radiological features in distinguishing grade 1 from grade 2/3 nonfunctioning pancreatic neuroendocrine tumors (NF-PNETs).
The good performance in the four external testing cohorts indicated that the radiomics model and the fusion model for predicting the grade of NF-PNETs were robust and reliable, so the two models could be used in the clinical setting and facilitate surgeons' decisions on risk stratification.
The radiomics features were selected from non-contrast T2-weighted images (T2WI) and diffusion-weighted imaging (DWI) sequences, which means that no contrast agent needs to be administered to grade NF-PNETs.
Supplementary information The online version contains supplementary material available at 10.1007/s00330-023-09957-7.


Introduction
Pancreatic neuroendocrine tumors (PNETs) account for 1-3% of pancreatic tumors and rank as the second most common malignancy of the pancreas [1][2][3]. Nonfunctioning PNETs (NF-PNETs) are much more common than functioning PNETs, accounting for approximately 70-90% of all PNETs [4]. The World Health Organization (WHO) categorizes PNETs as low (grade 1), intermediate (grade 2), or high grade (grade 3) based on the mitotic rate and Ki-67 index [5]. In general, the risk of tumor progression increases by 2% for every 1% increase in the Ki-67 index [6,7]. Observation is routinely recommended for grade 1 NF-PNETs, especially those smaller than 2 cm [8]. In contrast, grade 2/3 tumors are associated with a poorer prognosis and often require more intensive treatment [9]. Accurate assessment of the grade before surgery is critical because individual therapeutic decision-making depends strongly on histologic grade, especially for unresectable tumors. However, the grade is difficult to ascertain before surgery. To date, endoscopic ultrasonography-guided fine-needle aspiration (FNA) is still the most commonly used strategy to diagnose and grade the tumor, although invasiveness, limited accuracy, and difficulty in reflecting tumor heterogeneity have been reported [10,11].
MRI has shown great potential as an imaging biomarker to predict the tumor grade of PNETs. For example, the apparent diffusion coefficient (ADC) calculated from diffusion-weighted imaging (DWI) was shown to be negatively correlated with tumor grade in a previous study [12]. Recently, histogram analysis of ADC maps was shown to be helpful in predicting PNET grade, and ADC entropy and ADC kurtosis were the most accurate parameters for identifying high-grade PNETs [13]. In addition, T2-weighted images (T2WI) have been used in the evaluation of many cancers because they provide more detailed anatomical information and because their texture features are highly reproducible across scanners [14,15]. Kulali et al [16] showed that low to intermediate signal intensity on T2WI and lower ADC values were significantly correlated with high-grade PNETs because these changes can suggest tumor invasiveness. Moreover, T2WI and DWI are the most commonly used non-contrast sequences in clinical practice, which means that no contrast agent needs to be administered [17].
Radiomics can convert imaging data into high-dimensional quantitative image features using a large number of automatically extracted data-characterization algorithms [18,19]. Recently, radiomics has been successfully applied as a noninvasive method for predicting tumor grade in PNETs. For example, Bian et al [20] demonstrated that an MRI rad-score consisting of seven selected features from arterial and portal venous phase images was significantly associated with NF-PNET grade, with an area under the curve (AUC) of 0.775 and an accuracy of 0.701. However, to the best of our knowledge, there have been no published reports of radiomics analysis based on the most commonly used non-contrast MRI sequences, T2WI and DWI. Thus, the purpose of our research was to assess the value of radiomics features from T2WI and DWI for predicting the grade of NF-PNETs.

Patients
This multicenter retrospective study drew on data from 5 hospitals in China. The study was conducted in accordance with the Declaration of Helsinki and was approved by the institutional review board of Peking University Cancer Hospital & Institute (Beijing, China). Informed consent was waived.
The medical records from January 2014 to December 2020 were searched to identify patients with pathologically confirmed NF-PNETs who underwent surgical resection. Patients were excluded if (1) they had no preoperative MRI or the interval between the MRI examination and operation was longer than 4 weeks; (2) images were not satisfactory for analysis; (3) they received local or systemic therapy before imaging; or (4) the lesion was smaller than 1 cm. The recruitment pathway is shown in Fig. 1.

MRI protocols
All examinations were performed on 1.5-T or 3.0-T scanners using an 8-channel phased array body coil covering the upper abdomen, with the patients in the supine position. In the training group, 25 patients were scanned on a 1.5-T scanner and 90 patients on a 3.0-T scanner. The detailed parameters of the MRI protocols are listed in Table 1. DWI was performed in all 5 centers with a single-shot echo-planar imaging sequence prior to contrast administration, with at least b values of 0 and 1000 s/mm².

MRI feature analysis
The interpretations of MRI, including the qualitative analysis and ROI selection, were done in consensus by two radiologists (H.B.Z. and P.N., both with 12 years' experience in abdominal MRI). When there was a discrepancy, a senior radiologist (X.Y.Z., with 15 years' experience in abdominal MRI) arbitrated, and the result of the arbitration was used in the subsequent analysis. The reviewers were blinded to the clinical information and the pathological reports.
Fig. 1 Flowchart of the study of the enrolled patients

Image segmentation
The region of interest (ROI) covering the whole tumor volume was manually drawn on T2WI and DWI slice by slice with the software ITK-SNAP (version 3.8.0, http://www.itksnap.org). Dynamic contrast-enhanced MRI (DCE-MRI), if performed, was used as a reference for ROI segmentation. Special care was taken to avoid vascular structures, the pancreatic duct, and artifacts. The ROI was placed on DWI images with a b value of 1000 s/mm² and copied to the corresponding ADC maps.

Feature extraction
The open-source Python package PyRadiomics (version 3.0.1, https://www.python.org) [22] was used for feature extraction. To reduce the variance among different MRI scanners, image pre-processing was performed using isotropic resampling and Z-score normalization. A total of 1316 features were extracted from each ROI, including 107 features from the original image and 1209 features from the derived images obtained using filters. Details of the pre-processing steps and the 107 features are described in the supplementary file. Combining features from T2WI and ADC, a total of 2632 features were extracted.
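As an illustration of the normalization step and of the feature counts quoted above, the following is a minimal sketch in plain Python rather than the PyRadiomics implementation. The function name `zscore_normalize` and the use of the population standard deviation are assumptions; the exact pre-processing settings are given only in the supplementary file.

```python
from statistics import mean, pstdev


def zscore_normalize(intensities):
    """Z-score normalize a flat list of voxel intensities.

    Assumption: the population standard deviation is used; the paper
    does not state which estimator was applied.
    """
    mu = mean(intensities)
    sigma = pstdev(intensities)
    if sigma == 0:
        # A constant image carries no contrast; map every voxel to zero.
        return [0.0 for _ in intensities]
    return [(value - mu) / sigma for value in intensities]


# Feature counts as reported in the text.
FEATURES_ORIGINAL_IMAGE = 107
FEATURES_FILTERED_IMAGES = 1209
N_IMAGE_TYPES = 2  # T2WI and ADC

features_per_roi = FEATURES_ORIGINAL_IMAGE + FEATURES_FILTERED_IMAGES
features_total = features_per_roi * N_IMAGE_TYPES

normalized = zscore_normalize([1.0, 2.0, 3.0, 4.0, 5.0])
```

After normalization the intensities have zero mean and unit standard deviation, which is what makes values from different scanners more comparable before features are computed.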
A t test was used to remove features that showed a significant difference (p < 0.05) between 1.5-T and 3.0-T scanners and features that showed no significant difference (p > 0.05) between the grade 1 and grade 2/3 groups. Highly correlated features, defined as those with an absolute Pearson correlation coefficient larger than 0.5, were removed. Logistic regression with the least absolute shrinkage and selection operator (LASSO) was used to further remove features. Fivefold cross-validation was performed to determine the LASSO hyperparameter by maximizing the average accuracy in the training group. More details of the feature selection steps are described in the supplementary file. Finally, a radiomics score was obtained by linearly combining the selected features.
The fusion model was constructed from the qualitative features selected in the clinical model and the radiomics score from the radiomics model. Logistic regression was used to calculate the risk of grade 2/3, and the model was visualized as a nomogram. Decision curve analysis was used to evaluate the net benefit of the model.
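The correlation filter described above can be sketched as follows. This is an illustrative sketch only: the text does not say which member of a correlated pair was retained, so the greedy first-seen rule here is an assumption, and `pearson_r` and `drop_correlated` are hypothetical helper names.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant feature has no defined correlation; treat it as 0.
        return 0.0
    return sxy / sqrt(sxx * syy)


def drop_correlated(features, threshold=0.5):
    """Return feature names kept after removing highly correlated ones.

    `features` maps a feature name to its values across patients.
    Assumption: features are visited in insertion order and a feature is
    dropped if |r| with any already-kept feature exceeds the threshold.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson_r(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


example = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.5],   # nearly collinear with "a"
    "c": [1.0, -1.0, 0.0, -1.0, 1.0],  # uncorrelated with "a"
}
selected = drop_correlated(example)
```

In practice the surviving features would then be passed to the LASSO logistic regression; that step is not reproduced here.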
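For readers unfamiliar with decision curve analysis, the net benefit at a risk threshold pt is commonly defined as TP/n − (FP/n) × pt/(1 − pt). The sketch below implements that standard definition in plain Python; it is not the "rmda" implementation used in the study, the function names are hypothetical, and treating a predicted risk equal to the threshold as positive is an assumption.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold.

    y_true: 1 for grade 2/3, 0 for grade 1.
    risk: predicted probability of grade 2/3 for each patient.
    threshold: risk threshold probability, strictly between 0 and 1.
    """
    n = len(y_true)
    true_pos = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    false_pos = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the treat-all-patients scheme at the same threshold."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data, purely illustrative and unrelated to the study cohorts.
labels = [1, 1, 0, 0]
predicted_risk = [0.9, 0.8, 0.3, 0.1]
nb_model = net_benefit(labels, predicted_risk, 0.5)
nb_all = net_benefit_treat_all(labels, 0.5)
```

The treat-none scheme has a net benefit of 0 at every threshold, so a model is useful at a given threshold when its net benefit exceeds both 0 and the treat-all value. A normalized (standardized) net benefit divides these values by the prevalence.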

Pathological analysis
Tumor grade was determined by a pathologist (Q.Y., with 13 years of experience) by counting the mitotic rate and Ki-67 index based on the World Health Organization (WHO) 2017 classification [5].
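The grading rule referred to here can be sketched as below. The cut-offs are not stated in this text; they are recalled from the WHO 2017 scheme (Ki-67 index < 3%, 3-20%, > 20%; mitoses < 2, 2-20, > 20 per 2 mm²) and should be checked against reference [5]. The sketch is simplified: it ignores the distinction between well-differentiated grade 3 tumors and poorly differentiated carcinomas, and the function names are hypothetical.

```python
def grade_from_ki67(ki67_percent):
    """Grade implied by the Ki-67 index alone (assumed WHO 2017 cut-offs)."""
    if ki67_percent < 3:
        return 1
    if ki67_percent <= 20:
        return 2
    return 3


def grade_from_mitoses(mitoses_per_2mm2):
    """Grade implied by the mitotic count alone (assumed WHO 2017 cut-offs)."""
    if mitoses_per_2mm2 < 2:
        return 1
    if mitoses_per_2mm2 <= 20:
        return 2
    return 3


def who_grade(ki67_percent, mitoses_per_2mm2):
    """Final grade: the higher of the two individual grades."""
    return max(grade_from_ki67(ki67_percent), grade_from_mitoses(mitoses_per_2mm2))


def is_grade_2_or_3(ki67_percent, mitoses_per_2mm2):
    """Binary label used by the models in this study: grade 1 vs grade 2/3."""
    return who_grade(ki67_percent, mitoses_per_2mm2) >= 2
```

The binary label returned by `is_grade_2_or_3` corresponds to the grade 1 versus grade 2/3 outcome that the radiological, radiomics, and fusion models were trained to predict.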

Statistics
Continuous variables are described as mean ± standard deviation and were compared with the t test. Categorical variables are described as number and percentage and were compared with the Pearson chi-squared test or Fisher's exact test. Statistical analyses and the logistic regression for the clinical model were performed using SPSS software (version 22.0). Feature selection and the logistic regression for the radiomics model were performed using Python (version 3.6.5). The nomogram for the fusion model, the continuous net reclassification index (NRI), and the decision curve analysis were calculated in R (version 4.1.1) with the "rms," "PredictABEL," and "rmda" packages. The DeLong test was performed with MedCalc software (version 18.2.1). A two-tailed p value ≤ 0.05 was considered statistically significant.
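The AUC values that the DeLong test compares can be computed with the rank-based (Mann-Whitney) definition: the probability that a randomly chosen grade 2/3 case receives a higher score than a randomly chosen grade 1 case, with ties counted as one half. The sketch below is a plain-Python illustration of that definition, not the software used in the study, and the function name `auc` is hypothetical.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    y_true: 1 for the positive class (grade 2/3), 0 for the negative class.
    scores: model output for each patient; higher means more likely positive.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Toy example, unrelated to the study data.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The quadratic pairwise loop is adequate for cohorts of the size used here (around a hundred patients per group); a rank-based implementation would be preferable for much larger samples.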

Radiomics quality score
Lambin et al developed a 36-point "radiomics quality score" (RQS) metric [23]. The criteria are described in Supplemental Table S1, which shows that the current study had an RQS of 22. In addition, a TRIPOD checklist following the reporting guidelines for prediction model development and validation is provided in Supplemental Table S2.

Fusion model
The fusion model visualized in the nomogram (Fig. 3), which combined the radiomics signature and 3 radiological characteristics, yielded AUC values of 0.956 (95% CI: 0.922-0.989) and 0.864 (95% CI: 0.794-0.935) in the training and testing groups, respectively (Fig. 2b, Table 5). The calibration curves are displayed in Fig. 4. The Hosmer-Lemeshow test gave p values of 0.991 and 0.582 in the training and testing groups, respectively, indicating good calibration. The fusion model showed better discrimination than the radiological model (p < 0.01). The diagnostic performance of the fusion model was similar to that of the radiomics model, with no significant difference between the two models (p = 0.521).

Clinical utility
Decision curve analysis (DCA) is shown in Fig. 5, where the horizontal axis is the risk threshold probability and the vertical axis is the normalized net benefit. The DCA showed that using the fusion model to distinguish NF-PNET grade is more beneficial than the treat-all-patients scheme or the treat-none scheme over the whole threshold range. The fusion model performs better than the radiological model in the threshold range of 0.02-1.00 and better than the radiomics model in the threshold ranges of 0.05-0.20, 0.27-0.41, and 0.57-1.00.
Abbreviations: BMI body mass index, TB total bilirubin, ALT alanine aminotransferase, AST aspartate aminotransferase, FBG fasting blood glucose, NLR neutrophil-lymphocyte ratio, CEA carcinoembryonic antigen, CA199 carbohydrate antigen 199, CA724 carbohydrate antigen 724, NSE neuron-specific enolase, MPDD main pancreatic duct dilatation, CBDD common bile duct dilatation. *p < 0.05. # Comparisons between training group and validation group

Discussion
Many researchers have investigated the relationship between imaging characteristics and tumor grade. A study conducted by Robertis et al [24] showed that an ill-defined margin was more common in grade 2/3 tumors, with a high specificity of 90.3%. Ricci et al [21] showed that tumor size and heterogeneous enhancement were related to the risk of grade 2/3 PNETs, indicating that grade 1 PNETs had significantly higher tumor blood flow than higher-grade lesions. Therefore, higher-grade PNETs were more likely than lower-grade tumors to show aggressive features, including ill-defined margin, vascular invasion, and heterogeneous enhancement, which was consistent with our study. However, the results vary widely and accuracy remains a challenge, as these studies were commonly based on small samples, used subjective semi-quantitative imaging parameters, and lacked reliable external validation. Thus, a reliable method that can predict the grade of the tumor preoperatively remains an urgent need.
Radiomics has been widely used in the evaluation of tumor characteristics such as spatial-temporal heterogeneity [18,19]. Through quantitative analysis of heterogeneity within tumors, radiomics can help clinicians assess the intrinsic biologic aggressiveness of tumors and guide individualized treatment. For example, Liang et al [25] constructed a nomogram combining eight radiomics features selected from contrast-enhanced computed tomography (CECT) with clinical stage, which showed good performance in discriminating grade 1 from grade 2/3 tumors, with AUCs of 0.907 and 0.891 in the training and testing cohorts, respectively. Similarly, Gu et al [26] found that a fusion radiomics model incorporating tumor margin and radiomics signatures was significantly associated with histologic grade, yielding AUCs of 0.974 and 0.902 in the training and testing cohorts. However, few studies have focused on radiomics analysis of MRI, although multi-parameter MRI offers higher soft tissue resolution than CT. Bian et al [27] selected 14 radiomics features from T2WI and unenhanced T1-weighted fat-suppressed sequences and showed good discrimination between grade 1 and 2/3 tumors in the training (AUC = 0.851) and validation (AUC = 0.736) cohorts. Recently, Liu et al [28] constructed a model including 6 radiomics features from T2WI and 1 radiomics feature from CECT, which showed better discrimination in the training cohort (AUC = 0.92) and validation cohort (AUC = 0.85) than the clinical model and the other models using single-modality images. Our results were similar to those above, demonstrating that the radiomics model was superior to the radiological model because it could provide more information and reflect the biological behavior within tumors. In addition, the fusion model could depict more complicated textural information on tumor heterogeneity and could thereby effectively identify the more aggressive grade 2/3 NF-PNETs before operation.
In our study, 2632 features were narrowed to only 4 potential predictors to construct the model. One of the significant radiomics predictors was a shape-based feature, namely sphericity. Sphericity has recently been highlighted because it provides a quantitative description of observable shape and has high repeatability [29][30][31]. Previous studies have shown that sphericity is not only related to tumor grade but can also be used as a prognostic predictor in many cancers [32][33][34]. For example, Benedetti et al [32] reported that sphericity was related to high grade, microscopic metastasis, and vascular invasion in PNETs. The other significant radiomics predictors were the GLCM feature maximal correlation coefficient (MCC) from ADC, and gradient first-order skewness and small dependence low gray level emphasis (SDLGE) from T2WI, indicating that the texture complexity of the tumor on ADC and the scattered low signals and histogram asymmetry of the tumor on T2WI were good predictors of grade for NF-PNETs. Higher-grade PNETs tend to be more heterogeneous because of increased cystic degeneration, necrosis, and calcification. Therefore, by integrating radiomics features describing the shape of the whole tumor and its heterogeneity, the radiomics model achieved good performance in discriminating the grade of PNETs, with AUCs of 0.941 and 0.871 in the training and testing cohorts, respectively.
Our models have several advantages. First, the interscanner reproducibility of the radiomics features was tested, and the most repeatable features between different scanners were selected. It should be noted that reproducibility across vendors was not assessed because previous studies reported that texture features are less sensitive to differences between vendors [35]. Second, the good performance of the model in the four testing cohorts indicated that the model was robust and reliable, further suggesting that it has good predictive ability on unseen data and could be used in the clinical setting. Third, the radiomics features in the model were selected from non-contrast T2WI and DWI, which means that no contrast agent needs to be administered. This is especially beneficial for patients with chronic kidney insufficiency, who are at higher risk of nephrogenic systemic fibrosis (NSF) after administration of gadolinium-based MR contrast agents [36,37].
This study has several limitations. First, as this was a retrospective multicenter study, bias in patient selection and validation is inevitable. Second, NF-PNETs confirmed by FNA were excluded because biopsy may lead to misclassification due to intratumoral heterogeneity and sampling error. Third, information from other MR sequences was not included in this study, although previous studies showed that unenhanced T1-weighted sequences and contrast-enhanced images have great potential for predicting PNET grade [25,28]. DCE-MRI was used only as a reference for ROI segmentation in this study. In addition, we did not analyze the relationship between the models and patient survival outcomes. Lastly, manual segmentation of the ROI was rather time-consuming. Recently, auto-segmentation of pancreatic tumors on multi-parametric MRI using deep convolutional neural networks has been introduced and showed performance comparable to that of expert oncologists [38]. Therefore, although the results of our study were promising, more studies are still needed in the future.
In conclusion, in this multicenter study we developed a reliable and convenient model that integrates radiomics features with radiological characteristics based on non-contrast MRI to predict the grade of NF-PNETs preoperatively, which can facilitate surgeons' clinical decisions and guide personalized treatment in patients with NF-PNETs.
To predict the grade of NF-PNETs based on clinical information and radiomics from DWI and T2WI, we developed and validated 4 models: a clinical model, a radiological model, a radiomics model, and a fusion model integrating the radiological and radiomics models. The diagnostic performance of the radiomics model and the fusion model was better than that of the radiological model in the testing cohort (AUC = 0.871 vs 0.684, p = 0.001; AUC = 0.864 vs 0.684, p = 0.001). In addition, the fusion model showed discrimination similar to that of the radiomics model in the testing cohort (AUC = 0.864 vs 0.871, p = 0.726). The number of patients correctly classified in the testing cohort (n = 113) was 67 for the radiological model, 94 for the radiomics model, and 94 for the fusion model.
RadiomicsScore = − 0.19277293 × ADC_original_shape_Sphericity
− 0.01586678 × ADC_wavelet-HHH_glcm_MCC
+ 0.09997524 × T2W_gradient_firstorder_Skewness
− 0.06597319 × T2W_logarithm_gldm_SmallDependenceLowGrayLevelEmphasis
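The RadiomicsScore formula above is a plain linear combination and can be evaluated as in the sketch below. The coefficients and feature names are copied from the formula; the function name, the dictionary layout, and the absence of an intercept follow the formula as printed and are otherwise assumptions. The input values are presumably the pre-processed (normalized) feature values, which this text does not specify.

```python
# Coefficients copied from the RadiomicsScore formula in the text.
RADIOMICS_COEFFICIENTS = {
    "ADC_original_shape_Sphericity": -0.19277293,
    "ADC_wavelet-HHH_glcm_MCC": -0.01586678,
    "T2W_gradient_firstorder_Skewness": 0.09997524,
    "T2W_logarithm_gldm_SmallDependenceLowGrayLevelEmphasis": -0.06597319,
}


def radiomics_score(feature_values):
    """Linear combination of the 4 selected features.

    feature_values maps each feature name to its value for one patient.
    Raises KeyError if any of the 4 features is missing.
    """
    missing = [name for name in RADIOMICS_COEFFICIENTS if name not in feature_values]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return sum(
        coefficient * feature_values[name]
        for name, coefficient in RADIOMICS_COEFFICIENTS.items()
    )


# Hypothetical input: every feature set to 1.0, so the score equals the
# sum of the coefficients.
unit_features = {name: 1.0 for name in RADIOMICS_COEFFICIENTS}
unit_score = radiomics_score(unit_features)
```

Three of the four coefficients are negative, so, for example, a higher sphericity lowers the score, which matches the discussion of rounder tumors tending to be lower grade.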

Fig. 3 Nomogram of the fusion model that combines radiomics score and 3 qualitative clinical features

Fig. 4 Calibration curves of the fusion model in training group (a) and testing group (b)

Table 2
Clinical characteristics and MRI features of patients with different grades of NF-PNETs in the training and testing groups

Table 3
Statistical result of the prediction by the radiological model in the training group from center 1 and testing group from the other 4 centers

Table 4
Statistical result of the prediction by the radiomics model in the training group from center 1 and testing group from the other 4 centers

Table 5
Statistical result of the prediction by the fusion model in the training group from center 1 and testing group from the other 4 centers