Introduction

Adult-type diffuse gliomas (ADGs) are the most common subtype of glioma [1]. Because of the large nuclear heterogeneity of tumor cells, the biological behavior of ADG features invasive growth, and clinically, the tumors are characterized by high disability-adjusted life years. Patient outcome depends on a number of genetic traits as well as the histological grade [2]. Additionally, hereditary traits may play a bigger role in the overall development of the disease. However, the 2021 World Health Organization (WHO) Classification of Tumors of the Central Nervous System (CNS) successfully combines the two, and the resulting WHO grade may be a better clinical biological indicator than the independent predictors (e.g., isocitrate dehydrogenase) used in earlier studies [3]. With the continuous exploration and discovery of tumor growth mechanisms and new genetic biomarkers, some new treatment methods, such as immunotherapy, have shown great potential [4]. Such advancements are expected to improve long-term survival and quality of life for the subset of patients who cannot be treated surgically, such as those with brainstem or multifocal lesions. Therefore, there is an urgent need for a reliable and noninvasive method to predict ADG grade to help develop precision medicine protocols.

MRI has long been the main method for the diagnosis of brain tumors. Moreover, with the development of various advanced imaging technologies, the clinical application of MRI technology is highly promising [5]. However, imaging biomarkers derived from these technologies have not been widely used in clinical practice because of their complex theoretical basis and the lack of effective prospective validation. Previous studies [6,7,8,9] have shown that advanced diffusion technologies can help to identify biological tumor markers, evaluate damage to fiber tracts, or display the microstructure of the brain. We believe that multiple diffusion technologies can provide information reflecting pathological features, which can help characterize tumor heterogeneity and complement existing clinical applications.

Radiomics has made rapid progress in its applications in the medical field [10], especially in supporting decisions to enable machine-assisted diagnosis and prognosis assessments [11]. Previous studies [12] have shown that radiomic-based methods can be used to analyze the genetic characteristics of gliomas with improved performance. However, these studies were usually limited to conventional MRI, did not involve advanced imaging modalities [13] such as mean apparent propagation diffusion-MRI (MAP-MRI), or had low reproducibility due to the lack of prospective validation. Thus, we hypothesized that advanced diffusion technologies can be used for ADG grade prediction. To our knowledge, no similar study has been published.

In this study, we aimed to develop and validate a new integrated model for predicting ADG grade, the adult-type diffuse gliomas grade integrated prediction model (ADGGIP), by combining radiomic, clinical, and imaging morphological features. Additionally, the advanced diffusion technologies were compared with the simple diffusion technology.

Materials and methods

We conducted a prospective study in accordance with the Declaration of Helsinki. The ethics committee of our hospital approved the study protocol (No. 2022038), and all participants signed informed consent forms prior to enrollment in the cohort.

Participants and clinical data

In this study, we prospectively and consecutively enrolled participants who visited our hospital from June 2018 to February 2022. All participants were suspected of having ADG on the basis of clinical symptoms or previous imaging reports. All participants then underwent preoperative conventional MRI (T2, T1, T2-FLAIR, and contrast-enhanced T1) and diffusion imaging (DWI and diffusion spectrum magnetic resonance imaging [DSI]), and surgery was performed within 3 months after the scan to obtain sufficient pathological tissue for the diagnosis of ADG (in accordance with the 2021 WHO Classification of Tumors of the CNS). Most participants received only general symptomatic care, such as decreasing cranial pressure, between the MRI scan and surgery. For detailed scanning equipment, scanning parameters and pathological diagnosis information, see pages 1–2 in the Appendix and Supplementary Table S1. Certain participants were excluded based on the exclusion criteria (Fig. 1). Finally, 103 participants with ADG (mean age, 52 years; range, 21–77; 54 [52%] male) were enrolled. Because this was a single-center study, the time-validation method was used to validate the model [14], and participants were assigned to the training and prospective validation cohorts according to enrollment time. Seventy-two participants enrolled between June 2018 and May 2021 were assigned to the training cohort, including 38 low-grade glioma (LGG, CNS WHO Grades 2–3) participants and 34 high-grade glioma (HGG, CNS WHO Grade 4) participants. Thirty-one participants enrolled between June 2021 and February 2022 were assigned to the prospective validation cohort, including 8 participants with LGG and 23 participants with HGG.
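
The time-based cohort assignment described above can be illustrated with a minimal sketch. The cutoff date (end of May 2021) follows the text; the function name `assign_cohort` and the example enrollment dates are hypothetical and are not study data.

```python
from datetime import date

# Last day of the training enrollment window (May 2021, per the text).
TRAIN_END = date(2021, 5, 31)

def assign_cohort(enrolled_on: date) -> str:
    """Assign a participant to a cohort purely by enrollment date."""
    return "training" if enrolled_on <= TRAIN_END else "prospective_validation"

# Illustrative dates only (hypothetical, not taken from the study).
examples = [date(2018, 6, 12), date(2021, 5, 30), date(2021, 6, 1), date(2022, 2, 14)]
cohorts = [assign_cohort(d) for d in examples]
print(cohorts)
```

Splitting by time rather than at random means the later cohort is never seen during model development, which is what lets it act as a prospective check in a single-center design.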

Fig. 1

Recruitment pathway for study participants. Based on the date of enrollment, participants were divided into the training and prospective validation cohorts, in which the ADGGIP was developed and validated, respectively. DSI, diffusion spectrum magnetic resonance imaging

Model development and validation

We developed various models and compared their performance and stability, with the aim of comparing the abilities of the advanced diffusion technologies and the simple technology to identify ADG grade and of establishing a clinically applicable and more complete prediction model.

After image preprocessing and region of interest selection (Appendix, p. 3–4), feature extraction and model building were performed using FeAture Explorer (FAE v0.5.2, https://github.com/salan668/FAE) [15]. A total of 2782 radiomic features were extracted from the MRI data of 3 advanced technologies (including diffusion kurtosis imaging [DKI], neurite orientation dispersion and density imaging [NODDI], and MAP-MRI), 1 simple technology (diffusion tensor imaging [DTI]) and B0 map (Appendix p. 4–5, Supplementary Figure S1 and Table S2). In addition, sex and age were extracted as clinical features, and 9 morphological features (necrosis, cystic, calcification, hemorrhage, tumor enhancement pattern, location, side, clarity of the solid tumor boundary, and edema) were extracted from the imaging reports (Appendix p. 5–6). Then, multiple pipeline combinations were considered during model development, including 3 feature normalization methods (mean, min–max and Z score normalization), 2 data dimensionality reduction methods (principal component analysis and Pearson correlation coefficients) (Supplementary Figure S2 and S3), 4 feature selection methods (analysis of variance, recursive feature elimination, Kruskal‒Wallis, and Relief), and 10 classifiers (linear [logistic regression, logistic regression via least absolute shrinkage and selection operator, linear discriminant analysis, and support vector machine] and nonlinear [autoencoder, decision tree, random forest, ada-boost, Gaussian process, and naïve Bayes]), for a total of 240 pipelines (Appendix p. 7–9). Model evaluation was performed using an internal validation cohort (by leave-one-out cross-validation) and an independent validation cohort. The flowchart of this study is shown in Fig. 2.
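
The pipeline count follows from combining every option at each step: 3 normalization methods × 2 dimensionality reduction methods × 4 feature selection methods × 10 classifiers = 240. A minimal sketch of that enumeration is given below. The string labels are shorthand chosen for illustration and are not FeAture Explorer identifiers.

```python
from itertools import product

# Options at each pipeline step, as listed in the text (labels are illustrative).
normalizers = ["mean", "min-max", "z-score"]
reducers = ["PCA", "PCC"]
selectors = ["ANOVA", "RFE", "KW", "Relief"]
classifiers = ["LR", "Lasso", "LDA", "SVM", "AE", "DT", "RF", "AdaBoost", "GP", "NB"]

# Every combination of one option per step is one candidate pipeline.
pipelines = list(product(normalizers, reducers, selectors, classifiers))
print(len(pipelines))  # 3 * 2 * 4 * 10 = 240
```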

Fig. 2

Workflow of the study. Information from preprocessed multilayer diffusion models, raw conventional MRI scans, and clinical features of the study cohort were collected and analyzed to summarize the underlying feature matrices that could be used to build the machine learning models. Model construction was performed using FeAture Explorer (v0.5.2), in which a variety of modeling approaches were tried. The mean receiver operating characteristic (ROC) curves, decision analysis curves and calibration curves were used to construct an integrated diagnostic model (ADGGIP). DTI, diffusion tensor imaging; DKI, diffusion kurtosis imaging; NODDI, neurite orientation dispersion and density imaging; MAP-MRI, mean apparent propagation diffusion-MRI; ROI, region of interest; GLCM, gray level co-occurrence matrix; GLRLM, gray level run length matrix; GLSZM, gray level size zone matrix; GLDM, gray level dependence matrix; NGTDM, neighborhood gray tone difference matrix; PCC, Pearson correlation coefficient; PCA, principal component analysis; ANOVA, analysis of variance; KW, Kruskal-Wallis; RFE, recursive feature elimination; SVM, support vector machine; AE, auto-encoder; LDA, linear discriminant analysis; RF, random forest; Lasso, logistic regression via least absolute shrinkage and selection operator; LR, logistic regression; Ab, ada-boost; DT, decision tree; GP, Gaussian process; NB, naïve Bayes; ADGGIP, adult-type diffuse gliomas grade integrated prediction model

Finally, a total of 9 diagnostic models were established: 5 single-modality prediction models, each based on a single diffusion technology (B0, DTI, DKI, NODDI, and MAP-MRI models); 1 fusion prediction model incorporating all diffusion technologies (radiomics MRI [rMRI]); 1 prediction model with clinical and imaging morphological features (ClinicRad); and 2 multimodal prediction models incorporating clinical factors, radiologists’ interpretations, and either DTI or rMRI data (CliRadDTI and ADGGIP, respectively) to predict ADG grade (Supplementary Figure S4). For the two multimodal models, the single-modality model with the highest diagnostic performance in the prospective validation cohort and the more theoretically relevant prediction model incorporating multiple diffusion features were selected.

Statistical analysis

The performance of the model in predicting ADG grade was evaluated with the receiver operating characteristic curve. The 95% confidence intervals of the area under the curve (AUC) were generated by bootstrap with 1000 samples. We used the DeLong test, net reclassification improvement and integrated discrimination improvement to compare the performances of different models. Deviations between the model and the real results were visualized by calibration curves and quantified by the Brier score. Decision curve analysis was used to compare the net benefits of different models at different threshold probabilities to increase the possibility of practical application in clinical practice.
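
As an illustration of two of the quantities above, the following sketch computes an AUC, a percentile bootstrap 95% confidence interval with 1000 resamples, and the Brier score in plain Python. It is a sketch under stated assumptions, not the study code: the percentile method, the fixed seed, the function names, and the toy labels and probabilities are all assumptions made for illustration (HGG is coded 1, LGG 0).

```python
import random

def auc(y_true, y_score):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method; resamples cases with replacement)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:  # AUC is undefined unless both classes are present
            continue
        stats.append(auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# Toy data (hypothetical): 1 = HGG, 0 = LGG.
y = [0, 0, 0, 0, 1, 1, 1, 1]
p = [0.1, 0.3, 0.4, 0.6, 0.35, 0.7, 0.8, 0.9]

point_auc = auc(y, p)
ci_low, ci_high = bootstrap_auc_ci(y, p)
brier = brier_score(y, p)
print(point_auc, (ci_low, ci_high), brier)
```

A lower Brier score indicates predicted probabilities that sit closer to the observed outcomes, which is why it complements the calibration curve.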

Quantitative data are expressed as the mean ± standard deviation. Student’s t test was used to compare age, and the χ2 test, Fisher’s exact test, or Mann‒Whitney U test was used to compare categorical variables. All statistical tests were two-sided, and p < 0.05 was considered statistically significant. All statistical analyses were performed using SPSS (version 24.0); R (version 4.1.2) with the irr, pROC, and PredictABEL packages installed; or Python (version 3.9.12) and Scikit-Learn (version 0.24.2). Sample size and power calculations are shown in the additional materials (Appendix p. 9–10).

Results

Participant characteristics

The baseline characteristics of all participants are summarized in Table 1 and Supplementary Table S3. There were no significant differences in baseline characteristics between the training and prospective validation cohorts, except for WHO grade (p = 0.012), tumor location (0.011), and side of the tumor (0.046). We attribute this to the increasing incidence of glioblastoma and to new WHO criteria that have elevated some histological grades based on genetic characteristics. However, the imbalanced classification still reflects the severity of gliomas (2/3 of ADG are HGG) in the real world.

Table 1 Participant characteristics

In the training cohort, only sex (p = 0.637), cystic changes (0.592), tumor boundary (0.979), and side of the tumor (0.083) showed no significant differences between the LGG and HGG groups. In the prospective validation cohort, there were no significant differences in baseline characteristics between the LGG and HGG groups except for necrosis (p = 0.027), tumor location (0.034), and enhancement mode (0.034).

Feature selection and pipeline

ADGGIP was established using a support vector machine. Unlike the single-modality radiomics model, the multimodal model incorporates clinical and imaging morphological features. ADGGIP was composed of four features, including one radiomics feature and three clinical features, among which the radiomics feature was the predicted probability output by the rMRI model, and the three clinical features were calcification, tumor location, and edema; the contributions of the four features were 1.41, 0.94, 0.82, and 0.41, respectively. The radiomics feature had the highest correlation with tumor grade, and edema had the lowest (Supplementary Figure S5). rMRI was composed of four radiomics features, all of which were first-order features. The energy and median values of the DTI fractional anisotropy and MAP axial non-Gaussianity are included, respectively. The feature distribution is shown in Supplementary Figure S6. The detailed constituent factors and pipelines of all models are shown in Table 2.

Table 2 Selected features for model construction

Briefly, ADGGIP was constructed as an integrated prediction model by integrating radiomic, clinical, and imaging morphological features of the training cohort and was considered the optimal model in this study. The ADGGIP provided a probabilistic forecast of ADG grade (ranging from 0 to 1), and the probability of LGG was inversely related to the output value. The output was then dichotomized into a binary prediction, either LGG or HGG, using the threshold that maximized the Youden index.
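
The thresholding step can be sketched as follows: the Youden index is J = sensitivity + specificity − 1, and the cutoff is the candidate threshold with the largest J. This is a minimal illustration, not the study code. It assumes HGG is coded 1 and that outputs at or above the threshold are called HGG; the function name and toy data are hypothetical.

```python
def youden_threshold(y_true, y_prob):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    Assumes 1 = HGG, 0 = LGG, and that scores >= threshold are predicted HGG.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(y_prob)):  # each observed score is a candidate cutoff
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # keep the lowest threshold among ties
            best_t, best_j = t, j
    return best_t, best_j

# Toy data (hypothetical): 1 = HGG, 0 = LGG.
y = [0, 0, 0, 0, 1, 1, 1, 1]
p = [0.1, 0.3, 0.4, 0.6, 0.35, 0.7, 0.8, 0.9]

threshold, j_stat = youden_threshold(y, p)
labels = ["HGG" if prob >= threshold else "LGG" for prob in p]
print(threshold, j_stat, labels)
```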

Development, performance, and validation of prediction models

ADGGIP showed the strongest ability to discriminate tumor grades in the training cohort (AUC 0.958 [95% CI 0.907–0.992]) and internal validation cohort (0.942 [95% CI 0.885–0.982]) (Table 3 and Fig. 3). ADGGIP also showed superior performance in predicting tumor grade in the prospective validation cohort. Among 8 participants with LGG predicted by ADGGIP, 7 (87.5%) were pathologically confirmed to have LGG. In addition, among 23 participants with HGG predicted by ADGGIP, 19 (82.6%) were confirmed to have HGG by pathology (Supplementary Figure S7). Overall, ADGGIP achieved a favorable AUC of 0.880 (95% CI 0.685–1) in the prospective validation cohort. Performance was slightly lower in the prospective validation cohort, possibly due to the small sample size (n = 31). In addition, ADGGIP also had the highest precision-recall AUC, which ranged from 0.942 to 0.963. The AUCs of the single-modality models and the ClinicRad model were slightly lower than that of ADGGIP (AUC range: 0.721–0.860). Among the diagnostic models based on a single diffusion technology, the DTI model (simple model) showed the highest diagnostic performance (AUC range: 0.821–0.851) in all cohorts (Supplementary Table S4). In the training cohort, each predictor contained in the rMRI, ClinicRad, CliRadDTI, and ADGGIP models was used to independently predict tumor grade, with AUC values ranging from 0.502 to 0.821 (Supplementary Figure S5).

Table 3 Prediction performance of ADGGIP compared with other integrated prediction models
Fig. 3

ROC curves (a) and PR curves (b) of ADGGIP and other integrated prediction models in the training and validation cohorts. ClinicRad, model incorporating clinical factors and interpretations from radiologists; rMRI, radiomics MRI; CliRadDTI, model incorporating clinical factors, radiologist interpretations and DTI data; ADGGIP, adult-type diffuse gliomas grade integrated prediction model

The DeLong test, integrated discrimination improvement, and net reclassification improvement were used to compare the diagnostic efficacy of the various models. Since net reclassification improvement requires a manually preset cutoff point and the DeLong test has low sensitivity in small samples, integrated discrimination improvement was used as the reference standard when the results of the three methods disagreed. The results demonstrated that ADGGIP was superior to all other models (p < 0.05; except for the DTI model in the prospective validation cohort), while the DTI model was the best of the five single-modality models. The rMRI model performed worse than the DTI model (integrated discrimination improvement < 0) (Supplementary Tables S5-7).
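
For readers unfamiliar with integrated discrimination improvement (IDI), the standard definition is the change in discrimination slope between two models, where the discrimination slope is the mean predicted probability in events minus the mean predicted probability in non-events. The sketch below shows only this point estimate. The study itself reports using the PredictABEL package in R; the function names and toy values here are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def discrimination_slope(y_true, y_prob):
    """Mean predicted probability in events minus that in non-events."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    return mean(events) - mean(non_events)

def idi(y_true, p_reference, p_new):
    """IDI > 0 means the new model separates the two groups better than the reference."""
    return discrimination_slope(y_true, p_new) - discrimination_slope(y_true, p_reference)

# Toy data (hypothetical): 1 = HGG, 0 = LGG.
y = [0, 0, 1, 1]
p_reference = [0.4, 0.5, 0.5, 0.6]
p_new = [0.2, 0.3, 0.7, 0.8]

idi_value = idi(y, p_reference, p_new)
print(idi_value)
```

Because IDI uses the predicted probabilities directly, it needs no preset cutoff, which is the property the text relies on when preferring it over net reclassification improvement.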

The calibration curve showed that with ADGGIP, the data generated in the training cohort had the highest consistency between the predicted value and the observed value compared with all other models (Fig. 4 and Supplementary Figure S8). In further quantitative analysis, ADGGIP had the smallest Brier score (0.084), which also verified the results of the calibration curve (Supplementary Table S8), indicating that ADGGIP composed of multimodal features was more consistent with the real situation, thus improving the prediction performance.

Fig. 4

Decision curve analysis (a) and calibration curve (b) of ADGGIP versus other integrated prediction models in the training cohort. Decision curves show that ADGGIP provides a greater net benefit in predicting adult-type diffuse gliomas grade than the intervention-for-all strategy, the no-intervention strategy, and the other single models when the threshold is above 5% (a). ADGGIP predictions were closer to the real situation and had the lowest Brier score (b). ClinicRad, model incorporating clinical factors and interpretations from radiologists; rMRI, radiomics MRI; CliRadDTI, model incorporating clinical factors, radiologist interpretations and DTI data; ADGGIP, adult-type diffuse gliomas grade integrated prediction model

Clinical value

Decision curve analysis was performed for the nine diagnostic models, and the findings are shown in Fig. 4 and Supplementary Figure S8. At threshold probabilities greater than 5%, ADGGIP provided a greater net benefit than the other eight models in predicting the grade of ADG compared with the case in which no predictive model was used (i.e., all or none).
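
The net benefit plotted in a decision curve has a simple closed form. At threshold probability pt, net benefit = TP/n − (FP/n) × pt/(1 − pt); the "all" strategy treats every case as positive, and the "none" strategy has a net benefit of 0. The sketch below illustrates this standard formula on toy data. The function names and values are hypothetical and this is not the study code.

```python
def net_benefit(y_true, y_prob, pt):
    """Net benefit of calling cases with probability >= pt positive."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= pt and y == 0)
    return tp / n - (fp / n) * pt / (1 - pt)

def net_benefit_treat_all(y_true, pt):
    """Net benefit when every case is called positive (the 'all' reference line)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)

# Toy data (hypothetical): 1 = HGG, 0 = LGG.
y = [0, 0, 0, 0, 1, 1, 1, 1]
p = [0.1, 0.3, 0.4, 0.6, 0.35, 0.7, 0.8, 0.9]

nb_model = net_benefit(y, p, 0.5)
nb_all = net_benefit_treat_all(y, 0.5)
nb_none = 0.0  # calling no case positive yields no true or false positives
print(nb_model, nb_all, nb_none)
```

A decision curve repeats this calculation over a range of pt values, and a model is useful at a given threshold only if its net benefit exceeds both reference lines.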

Discussion

In this study, we developed and validated a radiomics-based model that can assess ADG grade before treatment by combining quantitative diffusion radiomics features and clinical and radiographic morphological features. The integrated model for predicting ADG grade (ADGGIP) accurately predicted the ADG grade, had the best diagnostic efficacy and stability compared to other combination models, and showed good net benefits in decision curve analysis. In addition, first-order features may be more strongly correlated biomarkers for grade prediction.

A previous study [16] identified the potential application of preoperative DWI and DTI in combination with machine learning for identifying multiple genetic traits, with the aim of improving the treatment of patients with glioma, particularly primary glioblastoma. The authors compared multiple prediction models and concluded that a comprehensive model based on radiomics and deep learning could be used to predict genetic biomarkers. However, the application of deep learning reduces the interpretability of the model, and without additional verification, there is a potential risk of overfitting. In a parallel study, several scholars carried out similar work and established four models capable of providing information correlating to different single genetic traits [13]. However, the classification criteria applied to the cohort did not fully conform to the clinical situation [17], leading to unclear clinical applicability. In this study, compared with previous studies, more accurate glioma subgroup classification methods were applied, and the models based on a single diffusion technology showed better performance (AUC range: 0.721–0.851) than DWI (AUC range: 0.631–0.725). This suggests that an advanced diffusion technology may be a better clinical choice. The poor correlation between ADC and histopathological features can be considered one explanation for this phenomenon [18].

The excellent predictive performance of ADGGIP may be due to the comprehensive incorporation of tumor macrostructural and microstructural features rather than being limited to heterogeneous radiomics features. Gao et al. [7] applied four diffusion technologies to study the main genetic information of glioma. They found that single-modality models showed similar diagnostic performance to one another, and the diagnostic models that integrated multiple diffusion modality features did not show better application value. However, they did not carry out additional clinical or higher-order feature extraction. This was improved in our study, considerably enhancing the therapeutic application potential. Even so, their integrated prediction model still showed the contribution of DTI and MAP-MRI, which was similarly reflected in our results. In addition, in our study, multiple pipelines were used to select the optimal model rather than being limited to a certain method or model, in line with the “No Free Lunch” theorem [19]. Finally, the calibration curves and Brier scores of the models confirmed that the higher diagnostic performance of ADGGIP was more consistent with the real situation.

As advanced non-Gaussian diffusion technologies, the MAP-MRI, NODDI, and DKI should theoretically reflect the real situation of water molecular diffusion more accurately than simple Gaussian diffusion technology (DTI) and better characterize the complexity and inhomogeneity of the tissue microenvironment [20]. However, the DTI prediction model is more accurate than other single-modality prediction models. This finding means that the advantages of the integrated diffusion model may be currently only theoretical, and point-to-point studies are necessary to analyze the internal association between the predictive integrated diffusion model and pathological features. Interestingly, ADGGIP showed the highest diagnostic efficacy and stability when combined with clinical and morphologic features and was superior to the best single-modality model (DTI) combined with clinical and morphologic features. That result indicates the importance of multimodal features, and even omitting one type of feature can degrade prediction performance [21].

Some scholars believe that first-order features have more substantial associations with tumor characteristics than higher-order features [18], and energy often shows the strongest radio-tissue association. In our study, the energy of fractional anisotropy, as an independent factor, was incorporated into both the DTI and rMRI models, which supports this view. Energy was distributed differently between the LGG and HGG groups, which may be explained by residual white matter fiber bundles [6, 22] and represents a more invasive pathological feature [23].

Repeatability and reproducibility are necessary for machine learning model building [10, 24]. Some scholars have found that simultaneous multi-slice technology affects the extraction of diffusion features [25]. The field strength also has some effect [26], and the higher the correlation of features with the actual situation, the smaller the effect of field strength [18]. During data collection, we did not use simultaneous multi-slice technology to reduce the scanning time, so as to obtain more accurate diffusion information. Nevertheless, the advanced DSI-based preprocessing technique, which involved fitting four diffusion models from a single scanned sequence, ensured the viability of clinical application. Simultaneous multi-slice technology can be used to significantly reduce the scan time in the future, although it may affect the calculation of diffusion parameters [25]. Naturally, it is likely that some centers will not be able to comply with some of the model’s requirements, such as DSI scanning. DTI might be an alternative at this stage, as the diagnostic efficacy of a comprehensive predictive model based on DTI is still quite good.

Despite the promising results, our study has some limitations. First, the comprehensive diagnostic model was based on data from a single center. However, comprehensive and effective statistical analysis methods ensure the accuracy and stability of the model. Second, manual delineation of tumor regions of interest is the current gold standard, but it takes considerable time and effort. In the future, fully automated tumor segmentation based on machine learning, such as neural networks, will be incorporated into practical applications. Third, there were not enough prognostic data to support the usability of the model, which needs to be further explored and validated in future studies.

In conclusion, we present ADGGIP, a noninvasive and accurate radiomic model that combines radiomic, clinical, and imaging morphological features to facilitate the preoperative assessment of WHO grade in patients with adult-type diffuse gliomas. The performance and stability of ADGGIP highlight the potential of advanced diffusion models for precision therapy in patients with adult-type diffuse gliomas.