1 Introduction

Gliomas are highly malignant intracranial tumors [1,2,3]. According to the 2016 World Health Organization (WHO) classification of central nervous system tumors, gliomas are classified as grades I–IV. Grades I and II are considered low-grade gliomas (LGG), and grades III and IV are considered high-grade gliomas (HGG) [4]. Glioblastomas (GBM) are the highest-grade gliomas and the most malignant and most common type of primary astrocytoma [5]. Predicting glioma grade is one of the challenging tasks in the treatment process [6,7,8]. Glioma grade is an essential indicator of disease occurrence and patient survival [9] and can guide the formulation of surgical and radiotherapy treatment plans [10, 11]. Therefore, it is necessary to develop an accurate and effective method for predicting glioma grade.

The current clinical diagnosis of glioma generally depends on the physician's judgment, based on magnetic resonance imaging (MRI) and clinical experience [12,13,14]. Standard MRI sequences include T2-weighted (T2), fluid-attenuated inversion recovery (FLAIR), pre-contrast T1-weighted (T1), and post-contrast (gadolinium-enhanced) T1-weighted (T1Gd) imaging [15,16,17]. Radiomics is an emerging field in medicine and oncology that extracts quantitative information from medical images [18, 19]. Many valuable features that cannot easily be observed by the naked eye can be obtained from MRI sequences, and these features can efficiently assist clinical diagnosis. They are mathematical descriptions of the visual properties of an image [20,21,22], including numerical values such as pixel intensities, edge strengths, shape descriptors, and regional variation or texture of pixel values [23], and they can provide accurate and reliable evidence to aid clinical decision-making [24, 25]. For instance, a review by Lohmann et al. described the relationship between imaging features and the clinicopathological manifestations of gliomas and demonstrated the value of these features in diagnosing gliomas; however, owing to limited space, it did not present specific protocols [26]. Han et al. identified gliomas by constructing clinical and combined models, and their results showed that imaging features could help distinguish glioma grade. However, they used only contrast-enhanced T1-weighted sequences, whereas other sequences may contain additional functional and biological information [15]. Despite these limitations, the studies above demonstrate that imaging features can be used for the clinical diagnosis of gliomas [27,28,29,30].

Several protocols for glioma grading have therefore been proposed to support clinical diagnosis and prognosis. For example, Lu et al. constructed machine learning-based segmentation and prediction models to predict glioma grade from tumor features. Their test results were consistent with clinical results, demonstrating the clinical value of tumor features for predicting glioma grade. However, they did not relate the high-risk characteristics to prognosis or establish an effective method for analyzing glioma heterogeneity [31].

The studies above demonstrate the importance of tumor characteristics in predicting glioma grade. However, glioma grade prediction still has a deficiency: the high heterogeneity of gliomas and their multiple tumor subregions are not considered. Intratumoral heterogeneity generally arises at the molecular level and is reflected macroscopically as different phenotypes on MRI, which can be divided into three tumor subregions [32, 33]. The subregions usually considered are the core-enhanced tumor (ET), the non-enhanced tumor (NET), and the peritumoral edema (ED) [34]. With multiparametric MRI, these regions can be explicitly identified as active tumor, necrosis, and edema.

Previous studies have analyzed MRI features of the whole tumor, overlooking the possibility that MRI features of the tumor subregions may be more valuable. For example, Chiu et al. demonstrated that features of tumor subregions correlated more strongly with the pathological features of glioma than those of the whole tumor [35]. Zhou et al. demonstrated that quantitative features from the core-enhanced tumor subregion could discriminate glioma grade [36]. Yang et al. found that the subregions carried more heterogeneous information; they integrated three radiomic features from different tumor subregions into a nomogram for predicting glioma grade, which outperformed a nomogram built from whole-tumor radiomic features [37]. However, they studied only the radiomic scores of the tumor subregions and did not investigate the features of individual MRI sequences. Furthermore, Yin et al. showed that subregional features might contain more information than features extracted from the whole tumor and could better predict glioma grade. However, their comparison between the whole tumor and the tumor subregions was based on a single MRI sequence (T1), which is not comprehensive [38]. These studies suggest that subregion-based features have greater potential than whole-tumor features. However, it remains unclear whether radiomic features from each individual tumor subregion, based on multiparametric MRI, provide more useful information than whole-tumor features for predicting glioma grade.

Therefore, this paper investigates the prediction of glioma grade using radiomic features from tumor subregions based on multiparametric MRI. Features from the subregions performed better than those from the whole tumor and improved diagnostic accuracy; in particular, the ET subregion outperformed the other tumor subregions and the whole tumor in predicting glioma grade.

2 Materials and Methods

2.1 Methodology Framework

The methodology framework, shown in Fig. 1, comprised the following stages. First, an MRI dataset including four sequences was obtained. Second, the tumor subregions and whole tumors were obtained from segmentation labels generated by the GLISTRBoost algorithm. Third, radiomic features were extracted from the subregions and the whole tumor. Fourth, the optimal features were selected for model construction, and a support vector machine (SVM) was used to predict glioma grade. Finally, model performance was assessed by receiver operating characteristic (ROC) analysis.

Fig. 1

Subregion-based radiomic analysis for predicting glioma grade. a MRI data acquisition. b Tumor subregion segmentation. c Feature extraction. d Feature selection and model construction. e Results analysis

2.2 Study Population and Image Data Description

The dataset was obtained from an existing public database on The Cancer Imaging Archive (TCIA, https://www.cancerimagingarchive.net/): the International Multimodal Brain Tumor Segmentation Challenge 2015 (BRATS 2015). Four structural modalities (T1, T1Gd, T2, and FLAIR) acquired at the baseline pre-operative time point were included. The data came from routine clinical radiographic scans of patients diagnosed with gliomas and comprised scan collections of GBM (n = 97) and LGG (n = 62). The dataset provides computer-aided tumor segmentation labels that were revised and manually corrected by an expert board-certified neuroradiologist; according to TCIA, these labels can serve as a gold standard. After cases with incomplete tumor segmentation labels were removed, 139 patients (97 GBM and 42 LGG) in Neuroimaging Informatics Technology Initiative (NIfTI) format were included in this study [34, 39, 40]. Two representative cases, one GBM and one LGG, are shown in Fig. 2. This public dataset contains no patient identifiers and does not require institutional review board approval.

Fig. 2

Representative glioma cases of different grades. The first row shows FLAIR images of a patient with GBM in the a axial, b sagittal, and c coronal views. The second row shows FLAIR images of a patient with LGG in the d axial, e sagittal, and f coronal views

2.3 Pre-processing and Subregion Segmentation

MRI data and tumor segmentations were downloaded from TCIA as NIfTI files. The files were read as struct data, in which the images and segmentation labels were stored as three-dimensional matrices. The labels were generated by GLISTRBoost, a hybrid generative–discriminative tumor segmentation method [41], and board-certified neuroradiologists reviewed them and manually corrected misclassifications. Each segmentation label was applied to the image by element-wise multiplication with the corresponding binary mask to obtain the tumor subregions: core-enhanced tumor (ET), non-enhanced tumor (NET), and peritumoral edema (ED). All subregion labels were combined to represent the whole tumor region. Figure 3 illustrates the segmentation results for the two selected cases (GBM and LGG).
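The masking step can be sketched in Python with NumPy. This is a minimal illustration, not the authors' MATLAB code: it assumes the BraTS-style label values 1 (NET), 2 (ED), and 4 (ET), which should be checked against the dataset documentation, and the helper name `extract_subregions` is hypothetical. In practice the arrays could be loaded with nibabel, for example `nibabel.load(path).get_fdata()`.

```python
import numpy as np

# Assumed BraTS-style label values; verify against the dataset documentation.
SUBREGION_LABELS = {"NET": 1, "ED": 2, "ET": 4}


def extract_subregions(image: np.ndarray, seg: np.ndarray) -> dict:
    """Mask a 3D MRI volume with each subregion label and with the whole tumor.

    Hypothetical helper: masking is an element-wise (Hadamard) product of the
    image with a binary mask, not a matrix multiplication.
    """
    if image.shape != seg.shape:
        raise ValueError("image and segmentation must have the same shape")
    regions = {}
    for name, label in SUBREGION_LABELS.items():
        mask = (seg == label).astype(image.dtype)  # 1 inside the subregion, 0 elsewhere
        regions[name] = image * mask
    # Whole tumor: union of all subregion labels.
    whole = np.isin(seg, list(SUBREGION_LABELS.values())).astype(image.dtype)
    regions["WT"] = image * whole
    return regions


# Tiny synthetic example (a 2x2x2 "volume").
image = np.arange(1, 9, dtype=float).reshape(2, 2, 2)
seg = np.array([[[0, 1], [2, 4]], [[4, 0], [1, 2]]])
regions = extract_subregions(image, seg)
print({name: float(vol.sum()) for name, vol in regions.items()})
# {'NET': 9.0, 'ED': 11.0, 'ET': 9.0, 'WT': 29.0}
```

Because masked-out voxels become zero, the multiplied volume cannot distinguish background from genuine zero intensities; PyRadiomics sidesteps this by taking the original image and a separate label mask.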

Fig. 3

Tumor segmentation results for cases of different grades. Red, yellow, and green represent the NET, ET, and ED subregions, respectively. The first row shows the segmentation results of GBM in the a axial, b sagittal, and c coronal views, and the second row shows those of LGG in the d axial, e sagittal, and f coronal views

2.4 Feature Extraction

Radiomic features were calculated with the open-source toolkit PyRadiomics (version 3.0.1) [42]. A total of 100 features were extracted from each tumor subregion (ET, NET, and ED) and from the whole tumor in each of the four MRI sequences (T1, T1Gd, T2, and FLAIR). The extracted features comprised 14 shape features, 18 first-order features, and 68 second-order features. The second-order features consisted of 22 gray-level co-occurrence matrix (GLCM) features, 16 gray-level run length matrix (GLRLM) features, 16 gray-level size-zone matrix (GLSZM) features, and 14 gray-level dependence matrix (GLDM) features. All features are listed in Supplementary Table S1. To improve the comparability of features and eliminate differences in scale, the feature data were standardized with the z-score before feature selection.
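The feature-count arithmetic and the z-score step can be sketched with NumPy. This is an illustrative sketch only: fitting the mean and standard deviation on the training rows and reusing them for the test rows is a common precaution against information leakage, assumed here rather than stated in the text, and the helper names are hypothetical.

```python
import numpy as np

# Feature counts per PyRadiomics class, as reported in Section 2.4.
FEATURE_COUNTS = {"shape": 14, "firstorder": 18,
                  "glcm": 22, "glrlm": 16, "glszm": 16, "gldm": 14}
SECOND_ORDER = ("glcm", "glrlm", "glszm", "gldm")

n_second_order = sum(FEATURE_COUNTS[c] for c in SECOND_ORDER)
n_total = sum(FEATURE_COUNTS.values())
print(n_second_order, n_total)  # 68 100


def zscore_fit(X_train: np.ndarray):
    """Column-wise mean and standard deviation of the training features."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # constant features: avoid division by zero
    return mean, std


def zscore_apply(X: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """z = (x - mean) / std, using statistics estimated on the training set."""
    return (X - mean) / std


X_train = np.array([[1.0, 10.0], [3.0, 10.0]])
X_test = np.array([[2.0, 12.0]])
mean, std = zscore_fit(X_train)
print(zscore_apply(X_test, mean, std))
```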

2.5 Feature Selection

For each region (three subregions and the whole tumor) and each MRI sequence, all radiomic features were combined into one input dataset for the machine learning model, yielding 16 datasets (4 regions × 4 sequences). Each dataset was split into a training set (70%) and a test set (30%). Tenfold cross-validated recursive feature elimination with an SVM classifier was used to select the most predictive features [43]. In recursive feature elimination, all features are first ranked, features are then removed iteratively, and each candidate number of features is evaluated by cross-validation. This determines the number of features with the highest score and the most practical feature combination for each training session. The area under the receiver operating characteristic curve (AUC) was used as the evaluation metric during cross-validation to determine the optimal features, which were then used for final model construction and prediction.
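A minimal scikit-learn sketch of this selection step follows. It is an illustration under stated assumptions, not the authors' code: the data are synthetic (`make_classification` with 139 samples and roughly the study's class ratio), the 70/30 split is stratified, and a linear-kernel SVM is assumed because `RFECV` needs a model that exposes feature weights (`coef_`).

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for one of the 16 region/sequence feature tables.
X, y = make_classification(n_samples=139, n_features=30, n_informative=5,
                           weights=[0.3], random_state=0)

# 70/30 split, stratified so both sets keep the class ratio (an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# z-score statistics estimated on the training set only.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Recursive feature elimination: drop the lowest-weighted feature of a linear
# SVM one at a time, scoring each subset size by tenfold cross-validated AUC.
selector = RFECV(SVC(kernel="linear"), step=1,
                 cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
                 scoring="roc_auc")
selector.fit(X_train_s, y_train)

print("selected features:", selector.n_features_)
# RFECV refits the SVM on the selected features; its decision values give the test AUC.
test_auc = roc_auc_score(y_test, selector.decision_function(X_test_s))
print(f"test AUC: {test_auc:.3f}")
```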

2.6 Model Construction and Evaluation

The Wilcoxon rank sum test was used to analyze differences in individual features between glioma grades. It was chosen because it can handle non-normal data and small sample sizes: unlike the two-sample t test, it does not assume that the data are normally distributed, which makes it more robust for this dataset. It is also easy to interpret, providing a straightforward measure of the difference between two groups, which makes it well suited to exploratory analysis and to testing hypotheses about such differences. The P value of each feature from the subregions and the whole tumor was calculated, and P < 0.05 was considered statistically significant. In the uniparametric analysis, univariate SVM classifiers were constructed to investigate the diagnostic performance of individual features in differentiating GBM from LGG, with each feature used as a separate dataset to build a classifier.
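The link between the Wilcoxon rank sum test and the AUC of a single feature can be made concrete. The plain-Python sketch below computes the rank-sum statistic W with midranks for ties, converts it to the Mann-Whitney U statistic, U = W - n1(n1 + 1)/2, and divides by n1·n2 to obtain the AUC achieved when the raw feature value is used directly as the score. It is illustrative only: the study computed P values in MATLAB and used univariate SVMs, and the function names here are hypothetical. For P values in Python, `scipy.stats.ranksums` or `scipy.stats.mannwhitneyu` could be used.

```python
def midranks(values):
    """Ranks 1..n; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_auc(group_a, group_b):
    """AUC of one feature for group_a (e.g. GBM) versus group_b (e.g. LGG)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = midranks(list(group_a) + list(group_b))
    w = sum(ranks[:n_a])          # Wilcoxon rank-sum statistic of group_a
    u = w - n_a * (n_a + 1) / 2   # Mann-Whitney U statistic
    return u / (n_a * n_b)


def pairwise_auc(group_a, group_b):
    """Same quantity by brute force: P(a > b) + 0.5 * P(a == b)."""
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0
                for a in group_a for b in group_b)
    return score / (len(group_a) * len(group_b))


gbm = [5, 7, 9]   # hypothetical feature values
lgg = [1, 5, 6]
print(rank_sum_auc(gbm, lgg), pairwise_auc(gbm, lgg))
```

The brute-force version is included only as a cross-check; the rank-based version scales better for larger samples.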

Multivariate SVM classifiers were then established to predict glioma grade. Grid search was applied to select the best combination of hyperparameters, including the regularization parameter and the kernel type. During grid search, tenfold cross-validation with AUC as the evaluation metric was performed on the training set, and the test set was then used to validate the models with the optimal hyperparameters. Each classifier was trained five times in independent randomized runs, and the results were averaged. In addition, a multiparametric analysis was performed by combining the features of the four MRI sequences into one input dataset. Performance was compared between the tumor subregions and the whole tumor, and the results of the multiparametric analysis were compared with those of the uniparametric analysis. ROC curves were plotted, and the AUC was calculated to indicate diagnostic performance. The 95% confidence interval of the AUC, accuracy, sensitivity, and specificity were also reported.
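The tuning and repetition scheme can be sketched with scikit-learn. The candidate values for `C` and the kernel are hypothetical (the text does not list them), the data are synthetic, and "trained five times" is interpreted here as five random stratified 70/30 splits, which is an assumption. Standardization sits inside the pipeline so that it is refit within each cross-validation fold.

```python
from statistics import mean

from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a selected-feature table (hypothetical data).
X, y = make_classification(n_samples=139, n_features=10, n_informative=4,
                           weights=[0.3], random_state=1)

# Hypothetical hyperparameter grid: regularization strength and kernel type.
param_grid = {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]}

test_aucs = []
for seed in range(5):  # five randomized repetitions, results averaged
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    search = GridSearchCV(
        make_pipeline(StandardScaler(), SVC()),
        param_grid,
        scoring="roc_auc",  # AUC as the model-selection criterion
        cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=seed),
    )
    search.fit(X_tr, y_tr)  # tenfold CV on the training set only
    test_aucs.append(roc_auc_score(y_te, search.decision_function(X_te)))

print(f"mean test AUC over 5 runs: {mean(test_aucs):.3f}")
```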

Wilcoxon rank sum test was performed in MATLAB (version 2019b). Feature extraction, feature selection, and model construction were implemented in Python (version 3.7.11). Performance evaluation of the model was done using R (version 4.2.1).

3 Results and Discussion

3.1 Difference Analysis of Individual Features

The eight features with the lowest P values for each of the four MRI sequences are listed in Supplementary Tables S2–S5. The eight features from the ET subregion had markedly lower P values than those from the whole tumor, and four of them were shape features.

Box plots of two representative shape features (mesh volume and voxel volume) from the ET subregion and the whole tumor are provided in Fig. 4. Mesh volume (P = 3.1873E−13) and voxel volume (P = 3.2976E−13) from the ET subregion differed significantly between GBM and LGG in all four MRI sequences; because shape features depend only on the segmentation, their P values are the same for every sequence. In contrast, mesh volume (P = 0.3389) and voxel volume (P = 0.3412) from the whole tumor showed no significant difference.

Fig. 4

Box plots for the two representative shape features in the ET subregion and whole tumor. a Mesh volume for the ET subregion. b Mesh volume for the whole tumor. c Voxel volume for the ET subregion. d Voxel volume for the whole tumor

3.2 Performance Analysis of Individual Features

Table 1 lists the diagnostic performance (AUC) of the eight top-ranked features from the T1-based ET subregion and the whole tumor. For the eight ET subregion features, AUC values ranged from 0.8223 to 0.8902. The best was gray-level non-uniformity (GLRLM), with an AUC of 0.8902, the highest single-feature AUC among the four sequences. For the whole tumor, variance and gray-level variance achieved the highest AUC, 0.7968, so all eight ET subregion features outperformed the best whole-tumor feature. These results indicate that individual features from the ET subregion distinguish GBM from LGG better than those from the whole tumor. Similar results were found for the FLAIR- and T2-based ET subregion and whole tumor (Tables 2, 3). In FLAIR, T1Gd, and T2, the shape feature mesh volume from the ET subregion achieved the highest AUC, 0.8870 (Tables 2, 3, 4).

Table 1 Comparative results of AUC of the eight top-ranked features from the T1-based ET subregion and the whole tumor for glioma grade
Table 2 Comparative results of AUC of the eight top-ranked features from the FLAIR-based ET subregion and whole tumor for glioma grade
Table 3 Comparative results of AUC of the eight top-ranked features from the T2-based ET subregion and whole tumor for glioma grade
Table 4 Comparative results of AUC of the eight top-ranked features from the T1Gd-based ET subregion and whole tumor for glioma grade

Various features were explored in the univariate analysis, including shape features, first-order features, and gray-level matrix features. In terms of diagnostic performance, individual features from the ET subregion were better than those from the whole tumor. Gray-level non-uniformity of the GLRLM (GLN) from the ET subregion in T1 showed the strongest ability to differentiate GBM from LGG. GLN measures the similarity of gray-level intensity values in an image, with a lower value indicating greater similarity among intensity values. A previous study reported a high correlation between GLN and various immune markers and used GLN, together with other texture features, to determine glioma grade [44]; it demonstrated the usefulness of these texture features as non-invasive biomarkers for predicting glioma treatment response. Furthermore, the shape feature mesh volume from the ET subregion performed better than all features extracted from the whole tumor in FLAIR, T2, and T1Gd, and three other ET shape features (voxel volume, surface area, and least axis length) also performed better than those of the whole tumor. A relationship between the volumetric shape characteristics of the ET subregion and glioma grade has been reported previously, which is broadly consistent with our findings [45]. These ET shape features differed significantly between GBM and LGG in the Wilcoxon rank sum test, whereas shape features from the whole tumor did not, indicating that the distributions of GBM and LGG samples differ significantly for the ET features. Therefore, ET shape features with small P values could serve as indicators of glioma grade.
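To make the GLN definition concrete, the sketch below builds a gray-level run length matrix for a small, already-discretized 2D image along one direction (rows, θ = 0°) and evaluates the GLN formula given in the PyRadiomics documentation, GLN = Σ_i (Σ_j P(i, j | θ))² / N_r(θ), where P(i, j | θ) counts runs of gray level i and length j, and N_r(θ) is the total number of runs. This is a simplified illustration: PyRadiomics works on 3D volumes, discretizes gray levels first, and by default computes the matrix for each of the 13 3D directions and averages the feature values; the helper names are hypothetical.

```python
from collections import Counter


def run_length_matrix(image):
    """Sparse GLRLM for direction 0 degrees: counts of (gray level, run length)."""
    runs = Counter()
    for row in image:
        start = 0
        for k in range(1, len(row) + 1):
            # A run ends at the end of the row or where the gray level changes.
            if k == len(row) or row[k] != row[start]:
                runs[(row[start], k - start)] += 1
                start = k
    return runs


def gray_level_non_uniformity(runs):
    """GLN = sum over gray levels of (number of runs at that level)^2 / total runs."""
    n_runs = sum(runs.values())
    runs_per_level = Counter()
    for (level, _length), count in runs.items():
        runs_per_level[level] += count
    return sum(c * c for c in runs_per_level.values()) / n_runs


image = [[1, 1, 2],
         [2, 2, 2],
         [1, 2, 1]]
glrlm = run_length_matrix(image)
print(dict(glrlm))
print(gray_level_non_uniformity(glrlm))  # (3**2 + 3**2) / 6 = 3.0
```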

3.3 SVM-Based Multivariate Analysis for Predicting Glioma Grade

Table 5 lists the diagnostic performance of the multivariate classifiers for the tumor subregions and the whole tumor. In the uniparametric analysis, the FLAIR-based ET subregion produced the highest AUC among the subregion classifiers (0.8697), higher than that of the NET and ED subregions. The ET subregion also performed best among the three subregions in the other three MRI sequences. The ET subregion achieved a higher AUC than the whole tumor in FLAIR, T1 (AUC = 0.8474), and T2 (AUC = 0.8474), whereas in T1Gd the whole tumor achieved the higher AUC (0.8718).

Table 5 Performance comparisons of multivariate classifiers among the tumor subregions and the whole tumor

Previous studies predicted glioma grade from radiomic characteristics of the whole tumor [34, 46], and recent studies have analyzed the performance of tumor and edema areas [47]. This paper compared the performance of the subregions and the whole tumor in predicting glioma grade. The best-performing prediction model was the multiparametric classifier of the ET subregion. A plausible explanation is that the ET subregion is regarded as the region of most intense tumor activity in high-grade gliomas. Because GBM disrupts the blood–brain barrier and has rich vascular components, the ET subregion accumulates contrast agent and shows strong enhancement on contrast-enhanced MRI. Therefore, the tumor information in the ET subregion is more helpful than that of the whole tumor in distinguishing GBM from LGG [34].

In the multiparametric analysis, the ET subregion yielded the highest AUC (0.8755) among all subregions and the whole tumor, and multiparametric classifiers outperformed uniparametric classifiers for the same regions. ROC curves and AUCs of the classifiers for the ET subregion and the whole tumor are presented in Fig. 5. The accuracy, sensitivity, and specificity for the tumor subregions and the whole tumor are listed in Supplementary Table S6. In FLAIR, T1, and T2, the ET subregion yielded higher accuracy than the other subregions and the whole tumor.
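Accuracy, sensitivity, and specificity follow directly from the confusion matrix. The plain-Python sketch below treats GBM as the positive class (label 1), which is an assumption, and uses hypothetical labels; the function name is illustrative.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of GBM cases detected
        "specificity": tn / (tn + fp),  # fraction of LGG cases correctly rejected
    }


# Hypothetical labels: 1 = GBM, 0 = LGG.
y_true = [1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Both rates need at least one case of each class in the evaluated set; otherwise the corresponding denominator is zero.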

Fig. 5

ROC curves derived from the ET subregion and the whole tumor in four MRI sequences. a FLAIR. b T1. c T1Gd. d T2. e Multiparametric MRI

Lambin et al. indicated that using multiple data sources (e.g., multiparametric MRI) could improve the accuracy and specificity of radiomics [48]. Accordingly, this study employed a multiparametric prediction analysis, combining the features of the four MRI sequences for feature selection and SVM classifier construction. For both the whole tumor and the subregions, multiparametric classification outperformed uniparametric classification. Because the four sequences are acquired differently, the extracted features differ between sequences, except for the shape features, which depend only on the segmentation. In FLAIR, parenchymal lesions and lesions containing bound water show distinct signals; in T2, the ED subregion shows a high signal; and in T1Gd, enhanced tumor areas with a rich blood supply appear bright. The information acquired from the different MRI sequences is therefore complementary, and the key features of the four sequences can be identified and filtered comprehensively during feature extraction and selection. Moreover, Ning et al. demonstrated the feasibility of integrating multimodal MRI radiomics and deep features to develop a promising non-invasive model for diagnosing glioma grade [49].
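One way to assemble the multiparametric input is to prefix each intensity or texture feature with its sequence name and keep a single copy of the shape features, since those depend only on the segmentation. This is a hypothetical sketch: the text does not say how duplicate shape features were handled, and the `shape_` naming convention and the helper name are assumptions.

```python
SEQUENCES = ("T1", "T1Gd", "T2", "FLAIR")


def combine_sequences(per_sequence, sequences=SEQUENCES):
    """Flatten {sequence: {feature: value}} into one multiparametric feature dict."""
    combined = {}
    for seq in sequences:
        for name, value in per_sequence[seq].items():
            if name.startswith("shape_"):
                # Shape features come from the mask alone, so keep the first copy only.
                combined.setdefault(name, value)
            else:
                combined[f"{seq}_{name}"] = value
    return combined


# Hypothetical per-sequence features for one patient.
per_sequence = {
    seq: {"shape_MeshVolume": 1500.0, "firstorder_Mean": float(i)}
    for i, seq in enumerate(SEQUENCES)
}
features = combine_sequences(per_sequence)
print(sorted(features))
```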

Furthermore, recursive feature elimination with cross-validation (RFECV), a wrapper method, was used for feature selection; it determines the number of features with the highest score and the most practical combination of features in each training session [50]. Wrapper methods use the performance of the classifier directly as the criterion for evaluating a feature subset, so the subset most beneficial to the classifier is selected. AUC was used as the evaluation metric in both feature selection and classification, and SVM was the classifier in both. The selected features were therefore tailored to the chosen model, which can improve the accuracy of the results and the standardization of diagnostic prediction.

Public data and open-source software were used in this work. Wu et al. concluded that data from public databases can greatly improve clinical data standardization and the reproducibility and generalizability of radiomic features. In addition, mature image processing techniques and data mining tools were adopted to perform complex tasks such as automatic tumor segmentation, image processing, and radiomic feature extraction, which improves the efficiency of disease diagnosis and data processing [51].

This preliminary analysis compared the performance of features from the subregions and the whole tumor in identifying glioma grade. The study has several limitations. First, it was a retrospective study with a relatively small sample size; more samples should be collected in future studies to validate and refine the present work. Second, the data were obtained from a single public dataset, so the sample may not be representative of the general population, which magnifies the effect of individual differences. Multiple datasets from different institutions should therefore be acquired for external validation of the current model, to improve its generalizability. Third, features were analyzed only from four structural MRI sequences; in the future, features from other imaging techniques (e.g., diffusion-weighted imaging) could be included. Finally, it would be worthwhile to compare the performance of machine learning models with physicians' diagnoses.

4 Conclusion

This study predicted glioma grade from radiomic features of the tumor subregions and compared the performance of the subregions with that of the whole tumor. The ET subregion performed better than the other tumor subregions and the whole tumor in predicting glioma grade, suggesting that radiomic features from the ET subregion could serve as clinical markers for glioma grade prediction. In addition, multiparametric prediction was more accurate than uniparametric prediction, and the ET subregion-based multiparametric model predicted glioma grade best (AUC = 0.8755). Combining multiple MRI sequences can therefore provide more comprehensive radiomic information and more accurate prediction. With further validation, such a model could help clinicians predict glioma grade and develop treatment plans promptly, which may improve outcomes for patients with glioma. Further work is required before these methods can be used for the non-invasive evaluation of glioma grade in clinical practice.