Model incorporating multiple diffusion MRI features: development and validation of a radiomics-based model to predict adult-type diffuse gliomas grade

Objectives To develop and validate a radiomics-based model (ADGGIP) for predicting adult-type diffuse gliomas (ADG) grade by combining multiple diffusion modalities and clinical and imaging morphologic features. Methods In this prospective study, we recruited 103 participants diagnosed with ADG and collected their preoperative conventional MRI and multiple diffusion imaging (diffusion tensor imaging, diffusion kurtosis imaging, neurite orientation dispersion and density imaging, and mean apparent propagator diffusion-MRI) data in our hospital, as well as clinical information. Radiomic features of the diffusion images and clinical information and morphological data from the radiological reports were extracted, and multiple pipelines were used to construct the optimal model. Model validation was performed through a time-independent validation cohort. ROC curves were used to evaluate model performance. The clinical benefit was determined by decision curve analysis. Results From June 2018 to May 2021, 72 participants were recruited for the training cohort. Between June 2021 and February 2022, 31 participants were enrolled in the prospective validation cohort. In the training cohort (AUC 0.958), internal validation cohort (0.942), and prospective validation cohort (0.880), ADGGIP had good accuracy in predicting ADG grade. ADGGIP was also significantly better than the single-modality prediction model (AUC 0.860) and clinical imaging morphology model (0.841) (all p < .01) in the prospective validation cohort. When the threshold probability was greater than 5%, ADGGIP provided the greatest net benefit. Conclusion ADGGIP, which is based on advanced diffusion modalities, can predict the grade of ADG with high accuracy and robustness and can help improve clinical decision-making. Clinical relevance statement Integrated multi-modal predictive modeling is beneficial for early detection and treatment planning of adult-type diffuse gliomas, as well as for investigating the genuine clinical significance of biomarkers. Key Points • Integrated model exhibits the highest performance and stability. • When the threshold is greater than 5%, the integrated model has the greatest net benefit. • The advanced diffusion models do not demonstrate better performance than the simple technology. Supplementary information The online version contains supplementary material available at 10.1007/s00330-023-09861-0.

gliomas before treatment.ADGGIP was constructed by machine learning, extracting key factors from multiple diffusion modalities (diffusion tensor imaging [DTI], diffusion kurtosis imaging [DKI], neurite orientation dispersion and density imaging [NODDI] and mean apparent propagator [MAP]-MRI), clinical features, and imaging features.ADGGIP was able to effectively differentiate the pathological grades of adult-type diffuse gliomas and significantly outperformed prediction models constructed from a single factor.The predictive performance of ADGGIP was validated with both internal and time-based external datasets.Time-based external datasets were applied for validation, highlighting the robustness of ADGGIP.
Implications of all the available evidence ADGGIP can help effectively predict the pretreatment grade of adult-type diffuse gliomas.The 2021 edition of the WHO Classification of Tumors of the Central Nervous System facilitates the development of personalized treatment regimens, especially in a period of transition where targeted and immunotherapy modalities have become prominent and important.In future clinical practice, the use of ADGGIP could promote treatment options for patients considered suitable for drug therapy and expand the survival benefits of surgical treatment for patients with highly malignant tumors.Figures     Table S1.Anatomical and diffusion MRI parameters Table S2.Radiomic features extracted by FeAture Explorer Table S3.Adult-type diffuse gliomas classification proportions among cohorts Table S4.Prediction performance of single-modality models All tissue samples were prepared as paraffin blocks and analyzed at our institution's pathology department using the latest methodology consistent with the 2021 World Health Organization (WHO) guidelines on histopathology and immunohistochemistry [2].Genetic characteristics were detected by a one-step method (multiplex PCR amplification combined with next-generation sequencing [NGS]) [3,4].NGS has the characteristics of being unbiased and having wide ) were included (Supplementary Figure S1) [7][8][9][10].The cMRI scans were bias corrected using the N4ITK MRT Bias correction module in 3D-Slicer [11].Diffusion parameter maps were not bias corrected or intensity normalized because diffusion parameters are a quantitative measure [12] and may maintain a relatively consistent trend without bias correction [13].Coregistration of cMRI and diffusion parameter maps was performed using the General Registration module (based on ANTs) in 3D-Slicer.

IV. Region of Interest Segmentation
Semiautomatic selection of regions of interest (ROIs) was performed by two radiologists (Peng Wang and Zhiyue Hao, with 3 and 2 years of neuroimaging experience, respectively) using 3D Slicer under the supervision of another senior physician (Yang Gao, with 27 years of neuroimaging experience).The three radiologists were aware of the tumor diagnosis but were blinded to the clinical and pathological details.The ROIs were outlined on B0 maps with reference to the cMRI.The area of tumor accumulation was selected as the ROI.The areas of the solid tumor and peritumoral edema were usually radiologically described as the area surrounded by abnormal/high signals on T2-FLAIR.Areas located around edema that were suspected to be invaded, such as those with slightly high signal or other abnormal signal patterns found on T2WI, were usually also included in the ROI.
Finally, the mask generated by the 3D-ROIs was assigned to each parameter map for feature extraction.
To ensure the repeatability of manual segmentation and feature extraction, the above steps were repeated after two weeks.For the extracted radiomics features, intraclass correlation coefficients (ICCs) were used to assess intra-and interobserver agreement.In this study, the ICC values for the radiological features extracted all reached 0.60 or higher.We did not set a higher threshold because poor reproducibility of radiomic features does not necessarily translate into poor disease discrimination.That is, the specific values of the features may have changed significantly, but their relative order may not have changed [14].were blinded to the clinical and pathological details.In addition, judgments were broken down into independent tasks, evaluating one well-defined trait at a time while blinded to the other evaluations.When there was disagreement, a senior physician (Qiong Wu, with 15 years of neuroimaging experience) was consulted, and a final decision was then made.This work was completed within two weeks.
Both the solid tumor and intratumor edema were evaluated using the imaging features.The solid tumor components evaluated included the presence of necrosis, cystic regions, calcification, and hemorrhage, as well as the tumor's enhancement pattern, location, and side and clarity of the solid tumor boundary [17].The minimum length (from the solid tumor to the adjacent white matter) was evaluated in the peritumoral edema region.The classification standards applied were as follows: • Necrosis: An area of the tumor body, either patchy or irregular in shape, with a signal intensity similar to but still different from that of cerebrospinal fluid (CSF), with an enhanced edge around the necrosis area.
• Cyst or cysts: Unlike necrosis, the signal was equivalent to the CSF signal, and marginal enhancement was not significant or absent.
• Calcification and hemorrhage: Determined on the basis of T1-weighted imaging, T2-weighted imaging, computed tomography (CT) and susceptibility weighted imaging (SWI) together (additional CT or SWI scans were performed in approximately 65% of the study subjects, a condition that would increase the uncertainty of the results).
• Tumor enhancement patterns: According to the shape and degree of enhancement, the contrast agent patterns were divided into patchy (enhancement degree: slight), annular or central (enhancement degree: obvious), and no enhancement.
• Location and side of tumor: The determination was made according to the central location of the tumor.In the case of multiple lesions, the center of the lesion with the largest volume was analyzed.
• Boundary clarity: On T2-weighted imaging, if more than 50% of the tumor margin area could not be effectively distinguished, the boundary was judged to be blurred.
Edema extent: On T2-weighted imaging, the edema extent was determined in the largest aspect of the tumor entity.When the solid boundary of the tumor was clear (e.g., the tumor showed annular enhancement at the edge), the area of T2 hyperintensity around the tumor was considered edema.When the solid tumor and edema could not be distinguished, T2 hyperintense areas that were closer to adjacent brain tissue and showed hyperintensity on apparent diffusion coefficient (ADC) maps (areas of cytotoxic edema may be more characterized by unrestricted water molecules than solid tumors) were defined as edema.Finally, the minimum extent of edema (perpendicular to the edge of the tumor) was recorded, and whether it exceeded 1.5 cm was determined.The minimum extent was chosen because it was likely to be more representative of the invasion of tumor cells, and 1.5 cm was derived from our clinical work experience.S2 and S3).
• PCA can map potentially correlated high-dimensional features to linearly uncorrelated low-dimensional features and the mapped low-dimensional data.Each feature of the mapped low-dimensional data is linearly independent.
• The PCCs are calculated two by two, and all features are traversed.When the coefficient was greater than a certain threshold (0.85 in this study), one of them was removed randomly, avoiding high similarity with reduced-dimensional features.
After feature dimensionality reduction, four feature selection methods were selected, including analysis of variance (ANOVA), recursive feature elimination (RFE), Kruskal-Wallis (KW) and relief.
• ANOVA: Through statistical analysis of multiple variables, the weight of the F value was calculated.After sorting the values from the largest to the smallest, the most relevant features for the model were determined.
• RFE: The main idea of recursive feature elimination is to repeatedly build a model (such as a support vector machine [SVM] or regression model), pick the best (or worst) features (which can be selected based on coefficients), set the selected features aside, and then repeat the process on the remaining features until all the features have been traversed.The order in which features are eliminated in this process is the ranking of features.
• KW: Kruskal-Wallis is a nonparametric test of three or more sets of data.It is used to test the original hypothesis for consistency of the overall functional distribution and its alternative hypothesis, the hypothesis that there is a difference between at least two samples.
• Relief: The correlation between features and categories is based on the ability of features to distinguish close samples.

X. Sample Size and Power Calculations
The study consecutively enrolled approximately 70 subjects.Regarding AUC (H1), 70 subjects were needed for the study to detect an AUC of 0.80 for ADGGIP at ~85% power and alpha = 0.05 (two-sided).The sample size calculation was based on the following assumptions: 1) the AUC in the null hypothesis was 0.60, and 2) the positive cases accounted for 66% of the population.
The sample size and power calculations were performed in PASS 2021 software.

Figure S7. Prediction profiles of ADGGIP
All individual participants were identified as 'LGG' or 'HGG' by ADGGIP and assigned into four groups according to their true pathological response in all datasets.Using the optimal risk threshold, the sensitivity and specificity were determined.

Figure S1.
Figure S1.Four diffusion models and their parameter diagrams Figure S1.Four diffusion models and their parameter diagrams

Figure S2 .
Figure S2.Heatmaps of feature correlation Figure S3.Heatmaps of feature contributions Figure S4.Flow chart of nine prediction models

Figure S5 .
Figure S5.Importance (a) of multimodal variables and performance (b) of the machine learning models that ADGGIP was based on Figure S6.Two radiomics signatures obtained from ADGGIP Figure S7.Prediction profiles of ADGGIP Figure S8.Decision curve analysis (a) and calibration curves (b) of single-modality predictionmodels in the training cohorts Part III.Supplementary Tables coverage, high sensitivity and high speed.The pathologist (Lixin Weng) who analyzed the images had 23 years of work experience and was blinded to the clinical information and imaging results.The subjectivity of pathologists may result in misclassification of histological grades, but the 2021 WHO Classification of Tumors of the Central Nervous System makes the actual grading more dependent on molecular typing, effectively reducing the incidence of misclassification.NGS steps: A Maxwell® RSC FFPE Plus DNA Purification Kit (Promega AX4920) was used to extract DNA from formalin-fixed paraffin-embedded (FFPE) tissues.DNA was quantified using the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific; Q32851).A six-item glioma detection kit (Genetron Health Technology, Co., Ltd.) was used to build the DNA library, Agencourt AMPure XP (Beckman Coulter; A63880) magnetic beads were used to purify the library, and an Agilent TapeStation 2200 quantitative kit and Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific; Q32851)were used for fragment quality control and quantification.Sequencing was performed using the Ion Torrent next-generation high-throughput sequencing platform.Template preparation and the on-machine sequencing process were performed according to the instructions provided with the general sequencing reaction kit (Genetron Health Technology, Co., Ltd.).The fully automated sample addition system GENETRON Chef produced by Genetron Health Technology, Co., Ltd. was used for template preparation, and the gene sequencer GENETRON S5 produced by Genetron Health Technology, Co., Ltd. was used to sequence the samples.After sequencing was completed, built-in software BaseCaller (GENETRON S5: v5.8.10) provided with the sequencer was used for base identification and base information statistics, and the built-in software TMAP (GENETRON S5: v5.8.10) was used to complete the sequencing results and align the resulting sequences to the human reference genome hg19 (GRCh37).Next, the built-in plug-in of the sequencervariantCaller (GENETRON S5: v5.8.0.19) and the ghall plug-in developed by Genetron Health Technology, Co., Ltd. were used to perform mutation analysis and annotate theoffline data as well as generate automated reports.III.Preprocessing Diffusion parameters were calculated using NeuDiLab, a software developed inhouse with Python based on the open-source tool DIPY (Diffusion Imaging in Python, https://dipy.org)[5].The software is equipped with FSL-based brain extraction, eddy current and head motion correction, and smoothing functions [6].Final quantitative parametric maps for 25 features from the 4 diffusion models and B0 maps were obtained.Three advanced models (diffusional kurtosis imaging [DKI], neurite orientation dispersion and density imaging [NODDI] and mean apparent propagation diffusion magnetic resonance imaging [MAP-MRI]) and 1 simple model (diffusion tensor imaging [DTI]

IX.
Modeling and Validation Strategy We analyzed ten machine learning classifiers to determine the best model.Based on the clinical interpretability of the output, we classified these classifiers into two categories: linear (logistic regression [LR], logistic regression via least absolute shrinkage and selection operator [LR-Lasso], linear discriminant analysis [LDA], and support vector machine [SVM]) and nonlinear (autoencoder [AE], decision trees [DT], random forest [RF], ada-boost [Ab], Gaussian process [GP] and native Bayes [NB]).Explanations of each model can be found on the official website of scikitlearn (https://scikit-learn.org/stable/).Internal and external validations were carried out.Internal validation was performed using leave-one-out cross-validation.Each time, only one sample was left for testing, and the other samples were used for construction.For K samples, the building and learning process needs to be repeated K times and tested K times.This process achieves the highest sample utilization rate, and the result is the closest to the expected value from training the whole test set.The model was then reconstructed if the internal validation set did not yield satisfactory results after the number of predictors was increased or decreased.That is, the number of features selection procedure was performed inside the cross-validation.Prospective validation was performed using a time-independent test set.The diagnostic performance of the model was evaluated with ROC curves and Brier scores (calculated as the mean square error of the probability prediction relative to the test sample, ranging from 0 to 1, with higher scores indicating worse prediction results and worse calibration) in the training set and the internal validation set, while the performance in the prospective validation set was unknown.The above work was repeated until all the pipelines were run.Finally, the established model was tested in the prospective validation set.The selection of the comprehensive model weighed the diagnostic ability and stability of the models.

Figure S3 .
Figure S3.Heatmaps of feature contributions Principal component coefficients among individual features and principal components for DKI (a), NODDI (b) and MAP-MRI (c) were calculated by principal component analysis.Overfitting was prevented by the use of principal component analysis, which decreased the amount of features.DKI = diffusion kurtosis imaging, NODDI = neurite orientation dispersion and density imaging, MAP-MRI = mean apparent propagation diffusion magnetic resonance imaging, RFE = recursive feature elimination, GLCM = gray level cooccurrence matrix, GLRLM = gray level run length matrix, GLSZM = gray level size zone matrix, GLDM = gray level dependence matrix, NGTDM = neighborhood gray-tone difference matrix.

Figure S4 .
Figure S4.Flow chart of nine prediction models Note: The DTI model was the single-modality model with the highest diagnostic performance in the prospective validation cohort, and rMRI was the more theoretically relevant prediction model incorporating multiple diffusion features.B0 = diffusion b0 parameter diagram, DTI = diffusion tensor imaging, DKI = diffusion kurtosis imaging, NODDI = neurite orientation dispersion and density imaging, MAP-MRI = mean apparent propagation diffusion magnetic resonance imaging.cMRI = conventional MRI, ClinicRad = model incorporating clinical factors and interpretations from radiologists, rMRI = radiomics MRI, CliRadDTI = model incorporating clinical factors, radiologist interpretations and DTI data, ADGGIP = Adult-type Diffuse Gliomas Grade Integrated Prediction model

Table S5 .
Delong test for ROC curve improvements in multiple cohorts

Table S6 .
IDI test for prediction improvements in multiple cohorts

Table S7 .
NRI test for prediction improvements in multiple cohorts

Table S8 .
Brier score for ADGGIP in achieving prediction improvements compared with other models in the training cohort VII.Balancing Data and Normalization/Standardization FAE provides random upsampling, random downsampling, and SMOTE methods to achieve data balancing (the data balancing operation is performed only for the training set, the test set data are not processed, and the original training set data are stored).We chose the random upsampling method to balance the training set.
VIII.Feature Dimensionality Reduction and SelectionBecause the number of features is much larger than the number of samples in some cases, the data were processed by reducing the number of features to subsequently perform efficient feature selection and observing model building.Both principal component analysis (PCA) and Pearson correlation coefficients (PCCs) were used to reduce the number of features (Supplementary Figures

Table S2 .
Radiomic features extracted by FeAture Explorer For the original image, 7 feature types were extracted, and a total of 107 features were obtained.
a : The total number of features in a distinct group.b : Included morphological features, first-order histogram features and second-order features.

Table S3 .
Adult-type diffuse gliomas classification proportions among cohorts Data are the mean ± SD or n/N (%), where N is the total number of study participants with available data.P values were calculated with the chi-square test, Fisher's test, Student's t test and Mann-Whitney U test.Clinical data (age and sex) were obtained from the medical record system or in person.CNS WHO Grade: Central Nervous System World Health Organization Grade.IDH = isocitrate dehydrogenase, 1p/19q = synchronous deletion of the short arm of chromosome 1 andlong arm of chromosome 19, LGG = lowgrade glioma, HGG = high-grade glioma.
a :

Table S5 .
Delong test for ROC curve improvements in multiple cohorts incorporating clinical factors and interpretations from radiologists, CliRadDTI = model incorporating clinical factors, radiologist interpretations and DTI data, ADGGIP = Adult-type Diffuse Gliomas Grade Integrated Prediction model.Eur Radiol (2023) Wang P, Xie S, Wu Q et al.

Table S6 .
IDI test for prediction improvements in multiple cohorts Eur Radiol (2023) Wang P, Xie S, Wu Q et al.

Table S7 .
NRI test for prediction improvements in multiple cohorts

Table S8 .
Brier score for ADGGIP in achieving prediction improvements compared with other models in the training cohort