Identifying key factors for predicting O6-Methylguanine-DNA methyltransferase status in adult patients with diffuse glioma: a multimodal analysis of demographics, radiomics, and MRI by variable Vision Transformer

Purpose This study aimed to perform multimodal analysis using a variable vision transformer (vViT) to predict O6-methylguanine-DNA methyltransferase (MGMT) promoter status among adult patients with diffuse glioma using demographics (sex and age), radiomic features, and MRI. Methods The training and test datasets contained 122 patients with 1,570 images and 30 patients with 484 images, respectively. The radiomic features were extracted from enhancing tumors (ET), necrotic tumor cores (NCR), and peritumoral edematous/infiltrated tissues (ED) using contrast-enhanced T1-weighted images (CE-T1WI) and T2-weighted images (T2WI). The vViT had 9 sectors: 1 demographic sector, 6 radiomic sectors (CE-T1WI ET, CE-T1WI NCR, CE-T1WI ED, T2WI ET, T2WI NCR, and T2WI ED), and 2 image sectors (CE-T1WI and T2WI). Accuracy and the area under the receiver operating characteristic curve (AUC-ROC) were calculated for the test dataset. The performance of vViT was compared with that of AlexNet, GoogleNet, VGG16, and ResNet using the McNemar and DeLong tests. Permutation importance (PI) analysis with the Mann–Whitney U test was performed. Results The accuracy was 0.833 (95% confidence interval [95%CI]: 0.714–0.877) and the AUC-ROC was 0.840 (95%CI: 0.650–0.995) in the patient-based analysis. The vViT had higher accuracy than VGG16 and ResNet and higher AUC-ROC than GoogleNet (p<0.05). The ED radiomic features extracted from T2WI demonstrated the highest importance (PI=0.239, 95%CI: 0.237–0.240) among all sectors (p<0.0001). Conclusion The vViT is a competent deep learning model for predicting MGMT status. The ED radiomic features of T2WI made the most dominant contribution. Supplementary Information The online version contains supplementary material available at 10.1007/s00234-024-03329-8.


Introduction
Glioma is one of the most common primary tumors of the central nervous system (CNS) [1,2]. The 2021 World Health Organization (WHO) classification of CNS tumors recommends grading CNS tumors by adding molecular parameters to histological features [3,4] because certain molecular markers provide prognostic information [3]. Over the past decade, the methylation status of the O6-methylguanine-DNA methyltransferase (MGMT) promoter has been associated with overall survival and has shown diagnostic value [5–7]. Among patients with glioblastoma who received chemotherapy, those with a methylated MGMT promoter survived longer than those with an unmethylated MGMT promoter [8,9]. MGMT promoter methylation was strongly associated with superior progression-free and survival rates at 12 months [10]. Determining MGMT promoter methylation requires a biopsy and histological examination. When a biopsy cannot be performed for reasons such as tumor size, tumor location, patient comorbidity, or patient condition, the properties of the brain tumor are evaluated by radiological imaging [10,11]. Recent studies have aimed to predict MGMT promoter methylation by analyzing radiomic features or images using machine learning algorithms, including convolutional neural networks (CNNs) [12–21]. However, the most dominant factors among patient characteristics, radiomic features, and magnetic resonance imaging (MRI) for predicting MGMT promoter methylation remain unclear. This is partly because machine and deep learning models are limited in their ability to analyze these factors simultaneously in one model [22]. Identifying the dominant factor for predicting MGMT promoter methylation is important in the context of the global development of multimodal artificial intelligence solutions [23,24].
A previous study proposed a Vision Transformer (ViT)-inspired model, named variable ViT (vViT), that analyzes multiple sequences of different lengths [22,25–27]. The vViT handles multimodal factors (patient characteristics, radiomic features, and MRI) simultaneously, calculating the prediction accuracy for each factor and then integrating the predictions into the overall performance. One strength of vViT is its ability to quantitatively identify the most dominant factor by calculating the prediction accuracy for each factor within a single model [27]. This strength arises because the vViT analyzes input factors separately. However, few studies have applied vViT to predicting MGMT promoter methylation among adult patients with diffuse glioma. This study aimed to investigate the performance of vViT and to identify the dominant factor among patient characteristics, radiomic features, and MRI in predicting MGMT promoter methylation among adult patients with diffuse glioma.

Data collection
This cross-sectional study obtained all data from the University of California San Francisco Preoperative Diffuse Glioma MRI (UCSF-PDGM) dataset following the Cancer Imaging Archive data usage policy and restrictions [28,29]. The UCSF institutional review board approved data collection with a waiver of consent, and this study was performed retrospectively following the relevant guidelines and regulations. The UCSF-PDGM dataset consisted of 501 adult patients with histopathologically confirmed diffuse glioma (following the 2021 WHO classification) who underwent preoperative MRI, initial tumor resection, and tumor genetic testing at a single medical center from 2015 to 2021 [4]. In the UCSF-PDGM dataset, the MGMT promoter methylation of tumors was tested by immunohistochemical staining, Sanger sequencing, or next-generation sequencing [28].

Radiomic feature extraction and image processing
Image data in the UCSF-PDGM dataset first underwent automated segmentation using an ensemble model consisting of brain tumor segmentation challenge algorithms [28,30]. The segmentation was then approved by a board-certified neuroradiologist with more than 15 years of experience [28,30]. Segmentation included three major tumor compartments: enhancing tumor (ET), necrotic tumor core (NCR), and peritumoral edematous/infiltrated tissue (ED). We extracted 105 radiomic features (Appendix 1) from each compartment of contrast material-enhanced T1-weighted images (CE-T1WI) and T2-weighted images (T2WI) using the PyRadiomics package [31]. Images in which the annotation contained fewer than 256 pixels were excluded from the extraction of radiomic features. After the radiomic features were extracted, each image was cropped to the minimum rectangle that contained the tumor. The cropped image was resized to 128 × 128 pixels using the Python Pillow package with the LANCZOS option.
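The cropping and resizing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `tumor_bounding_box` and `crop_and_resize`, the representation of the annotation as a 2-D array of zero and non-zero values, and the reading of "pixels" as non-zero annotation pixels are all hypothetical. Only `Image.crop` and `Image.resize` with Lanczos resampling are taken from Pillow.

```python
MIN_MASK_PIXELS = 256  # slices with a smaller annotation were excluded in the study


def tumor_bounding_box(mask):
    """Return (left, upper, right, lower) of the non-zero region, or None.

    `mask` is a 2-D sequence of rows (nested lists or a NumPy array).
    The right and lower bounds are exclusive, matching Pillow's crop box.
    None is returned when fewer than MIN_MASK_PIXELS pixels are annotated.
    """
    area = sum(1 for row in mask for value in row if value)
    if area < MIN_MASK_PIXELS:
        return None
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def crop_and_resize(image, mask, size=128):
    """Crop a PIL image to the tumor bounding box and resize it to size x size."""
    from PIL import Image  # imported lazily so the box helper has no dependencies

    box = tumor_bounding_box(mask)
    if box is None:
        return None
    # Pillow >= 9.1 exposes the filter as Image.Resampling.LANCZOS;
    # older releases only provide Image.LANCZOS.
    resample = getattr(Image, "Resampling", Image).LANCZOS
    return image.crop(box).resize((size, size), resample=resample)
```

Because the bounding box is rarely square, resizing to 128 × 128 changes the aspect ratio of the tumor region; whether the original pipeline compensated for this is not stated in the text.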

Datasets construction and vViT setting
Patients were excluded from model construction if they (i) had no age or sex information, (ii) had undetermined tumor MGMT promoter methylation, (iii) had radiomic features of ET, NCR, or ED that could not be extracted from CE-T1WI and T2WI images, or (iv) had a number of images that deviated substantially from the mean number of images, as judged by the standard deviation. Additionally, (v) random selection was conducted to equalize the number of MGMT-methylated images with that of MGMT-unmethylated images. Criterion (v) was imposed to perform the image-based analysis in a balanced setting and to avoid overestimation of performance. After these criteria were applied to the UCSF-PDGM dataset, 152 patients with 2,054 images (1,027 MGMT-methylated and 1,027 MGMT-unmethylated) remained (Appendix 2). These images were randomly divided into a training dataset (122 patients with 1,570 images [785 MGMT-methylated and 785 MGMT-unmethylated]) and a test dataset (30 patients with 484 images [242 MGMT-methylated and 242 MGMT-unmethylated]). All patients in the training and test datasets had grade 4 glioma. We performed an analysis of variance (ANOVA) to select the radiomic features associated with MGMT promoter methylation. The 64 radiomic features with the highest F-values were selected from each of ET, NCR, and ED in the training dataset. The selected features are shown in Appendix 3 with their F-values and p-values.
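The feature selection step can be illustrated with a short sketch. It assumes a one-way ANOVA with two groups (MGMT-methylated versus unmethylated) computed independently for each feature; the function names, the dictionary layout of the features, and the toy feature names in the usage note are hypothetical and are not taken from the study's code.

```python
def anova_f_value(group_a, group_b):
    """One-way ANOVA F statistic for a single feature split into two classes."""
    groups = [group_a, group_b]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by class membership versus variation inside classes.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)


def select_top_features(features, labels, k=64):
    """Rank features by F-value and return the names of the top k.

    `features` maps a feature name to a list of values (one per sample);
    `labels` holds 1 for MGMT-methylated and 0 for unmethylated samples.
    """
    scores = {}
    for name, values in features.items():
        methylated = [v for v, y in zip(values, labels) if y == 1]
        unmethylated = [v for v, y in zip(values, labels) if y == 0]
        scores[name] = anova_f_value(methylated, unmethylated)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]
```

In this sketch, calling `select_top_features(features, labels, k=64)` once per compartment would mirror the selection of 64 of the 105 extracted features; as in the study, the ranking should be computed on the training dataset only so that the test dataset does not influence the selection.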
Figure 1 shows an architectural overview of the vViT constructed in the present study. The constructed vViT had nine sectors, including the class token sector: demographics, CE-T1WI ET, CE-T1WI NCR, CE-T1WI ED, T2WI ET, T2WI NCR, and T2WI ED radiomic sectors, and CE-T1WI image and T2WI image sectors. All data were converted to 1-dimensional arrays before being input into vViT. The prediction of MGMT methylation can be derived from each sector individually, so the performance of each sector can be calculated. The predictions of the sectors were integrated by voting: among the predictions from the nine sectors, the majority prediction was regarded as the total model output. PyTorch version 1.7.1 was used as the deep learning framework to implement vViT. Binary cross entropy was optimized by the Adam optimizer (β1=0.9, β2=0.999, ε=1.0×10⁻⁸, weight decay=0, and AMSGrad=False). A detailed explanation of vViT and of terms including binary cross entropy and the Adam optimizer is given in Appendix 4.
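The voting step that integrates the sector predictions can be sketched as below. The 0.5 threshold used to binarize each sector's output and the function names are assumptions made for illustration; the text states only that the majority prediction among the nine sectors was taken as the total model output.

```python
def binarize(sector_probabilities, threshold=0.5):
    """Turn per-sector probabilities into 0/1 predictions (threshold is assumed)."""
    return [int(p >= threshold) for p in sector_probabilities]


def majority_vote(sector_predictions):
    """Return 1 if more than half of the binary sector predictions are 1, else 0."""
    positives = sum(sector_predictions)
    return int(positives * 2 > len(sector_predictions))
```

With an odd number of sectors such as nine, a tie cannot occur, so the majority is always well defined.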

Statistical analysis
Patient characteristics and calculated values are shown as means with 95% confidence intervals (95%CI) or as numbers (n) and proportions (%). We calculated classification accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F-score, the area under the receiver operating characteristic curve (AUC-ROC), logarithmic loss, and Cohen's κ coefficient as metrics to evaluate the performance of vViT. Because vViT was implemented to calculate metrics for each image (image-based analysis), we aggregated its outputs into a prediction for each patient (patient-based analysis). This aggregation used voting for binary variables and the mean for continuous variables. The performance of the CE-T1WI and T2WI image sectors in vViT was compared with that of AlexNet [32], GoogleNet [33], visual geometry group (VGG) 16 [34], and ResNet [35]. These CNN models were tested after 1000 epochs of training on CE-T1WI and T2WI images separately. The McNemar test and the DeLong test were performed to compare contingency tables and AUC-ROC, respectively. Permutation importance analysis was performed to evaluate the contribution of each sector to the output. The following three calculation steps were performed to evaluate permutation importance.
(i) Permutation was performed in the respective sector following the previously reported method [36]. This implementation assigned the data of one patient to another patient. (ii) After permutation, the accuracy was calculated using the trained vViT. (iii) The difference between the original accuracy and the accuracy calculated using the permutated dataset was then saved. We defined this difference as the permutation importance.
Procedures (i), (ii), and (iii) were repeated one hundred times for each sector. Figure 2 shows the procedures for calculating the permutation importance of the demographic sector as an example. We compared the differences in accuracy between sectors using the Mann-Whitney U test.
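Steps (i) to (iii) can be sketched in a few lines. This is an illustrative outline under stated assumptions rather than the study's implementation: each sample is represented as a dictionary mapping a sector name to its data, `predict` stands for any trained model that returns a 0/1 label, and all names and the fixed random seed are hypothetical.

```python
import random


def accuracy(predict, samples, labels):
    """Fraction of samples for which the model output equals the label."""
    correct = sum(predict(s) == y for s, y in zip(samples, labels))
    return correct / len(labels)


def permutation_importance(predict, samples, labels, sector, n_repeats=100, seed=0):
    """Return one accuracy drop per repetition for the chosen sector.

    In each repetition the data of `sector` are reassigned across samples
    while every other sector stays in place, and the accuracy of the already
    trained model is subtracted from the original accuracy.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, samples, labels)
    differences = []
    for _ in range(n_repeats):
        shuffled = [s[sector] for s in samples]
        rng.shuffle(shuffled)
        permuted = [{**s, sector: value} for s, value in zip(samples, shuffled)]
        differences.append(baseline - accuracy(predict, permuted, labels))
    return differences
```

The list returned for each sector corresponds to the 100 saved differences; two such lists could then be compared with a rank-based test such as the Mann-Whitney U test. A sector that the model ignores yields differences of exactly zero, whereas a sector that drives the prediction yields positive differences.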
All analyses were performed using Python, version 3.8.2 (Python Software Foundation, http://www.python.org). Statistical significance was assessed with the 95%CI and defined as a p-value of <0.05.
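The aggregation from image-based to patient-based outputs described in this section (voting for binary variables, mean for continuous variables) can be sketched as follows. The function name, the argument layout, and the rule that a tied vote resolves to 0 are assumptions for illustration; the text does not state how ties were handled.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_patient(patient_ids, image_labels, image_probs):
    """Collapse image-level outputs to one prediction per patient.

    Binary labels are combined by majority vote (ties resolve to 0, which is
    an assumption) and probabilities are combined by their mean.
    """
    labels = defaultdict(list)
    probs = defaultdict(list)
    for pid, label, prob in zip(patient_ids, image_labels, image_probs):
        labels[pid].append(label)
        probs[pid].append(prob)
    return {
        pid: {
            "label": int(sum(labels[pid]) * 2 > len(labels[pid])),
            "prob": mean(probs[pid]),
        }
        for pid in labels
    }
```

Label-based metrics such as accuracy would then use the voted label, while threshold-free metrics such as AUC-ROC and logarithmic loss would use the averaged probability.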

Results
Table 1 shows the characteristics of the patients in the training and test datasets.

Patient-based analysis
Accuracy, sensitivity, specificity, PPV, NPV, F-score, AUC-ROC, logarithmic loss, and Cohen's κ coefficient of the total model output for the test dataset were 0.833 (95%CI: 0.714-0.877), 0.600 (95%CI: 0.395-0.749), 0.950 (95%CI: 0.788-0.967), 0.857 (95%CI: 0.535-0.927), 0.826 (95%CI: 0.682-0.877), 0.706 (95%CI: 0.543-0.869), 0.840 (95%CI: 0.650-0.995), 0.613 (95%CI: 0.561-0.665), and 0.595 (95%CI: 0.540-0.649), respectively. Figure 3c shows the ROC of the total model output. Table 2b shows the statistics of the total model output and each sector. Figure 3d shows the results of the permutation feature importance of each sector for the test dataset. Table 4 shows the results of the Mann-Whitney U test for each combination of sectors, wherein the cells colored light gray in the lower triangle area represent the combinations of sectors that achieved a p-value of <0.0001 by the Mann-Whitney U test.

Fig. 2 Procedures of permutation importance analysis. Demographics (age and sex) were permutated as an example. The following three steps were performed. (i) The original accuracy was calculated using the original dataset and the trained variable vision transformer (vViT). (ii) Permutation was then performed, and accuracy was calculated using a permutated dataset and the trained vViT. (iii) The difference between the original accuracy and the accuracy calculated using a permutated dataset was calculated. We defined the difference as the permutation importance. These processes were repeated 100 times by changing the permutation pattern.
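The relationship among the patient-based metrics reported in this section can be made concrete with a short calculation. The confusion-matrix counts used in the usage note (6 true positives, 4 false negatives, 19 true negatives, 1 false positive) are hypothetical: they are not reported in the text and were chosen only because they sum to the 30 test patients and reproduce the reported point estimates. The function name is likewise an assumption.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    f_score = 2 * ppv * sensitivity / (ppv + sensitivity)
    # Cohen's kappa corrects the observed agreement for agreement expected by chance.
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_score": f_score,
        "kappa": kappa,
    }
```

Calling `binary_metrics(6, 4, 19, 1)` gives, after rounding to three decimals, an accuracy of 0.833, sensitivity of 0.600, specificity of 0.950, PPV of 0.857, NPV of 0.826, F-score of 0.706, and κ of 0.595, which matches the reported point estimates. AUC-ROC and logarithmic loss depend on the predicted probabilities and cannot be derived from these counts.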

Discussion
The inconsistency may be caused by differences in study population characteristics and the classification method. Hence, the reproducibility of each method should be continuously confirmed. In a recent study, Xu et al. achieved an accuracy of 0.952 by ViT using CE-T1WI and T2WI [38].
Although the performance was overestimated due to the imbalance of the dataset, the result indicates that the multimodal approach by Transformer is promising in predicting MGMT status. In the present study, we showed that the multimodal analysis reached higher performance than CNNs in predicting MGMT promoter methylation, although vViT did not achieve state-of-the-art performance. The development of effective multimodal fusion approaches is becoming increasingly important to capture the features of complex diseases [39]. Predicting MGMT promoter methylation among adult patients with diffuse glioma is no exception. Whether radiomic features or MRI itself is dominant in predicting MGMT promoter methylation remains controversial. Radiomic features tend to be analyzed using machine learning algorithms [12,14–16], while MRI tends to be analyzed using CNNs [17–21]. The present study compared the permutation importance of radiomic features and MRI by vViT. By evaluating the importance of each sector, our vViT overcomes a difficulty of CNNs: although a conventional CNN can handle multiple factors, convolving the input values makes it difficult to estimate the importance of each input factor in making predictions. We found that the radiomic features of ED had the highest importance in both the image-based and patient-based analyses. A previous study indicated that the heterogeneity of the edema region may hold key information on MGMT promoter methylation [2]. As Yang et al. mentioned in that study, radiomics may be a promising technique to evaluate the heterogeneity of the edema region. Other studies have argued that the edema region reflects the aggressiveness of glioma [40,41]. The result of the present study is consistent with these previous studies.

Abbreviations: CE-T1WI contrast-enhanced T1-weighted image, T2WI T2-weighted image, ET enhancing tumor, NCR necrotic tumor core, ED peritumoral edematous/infiltrated tissue
The present study has some limitations. First, we used a retrospectively collected dataset from a single center. The generalizability of the present findings should be validated using another dataset. Errors such as data duplication or misidentification, if present, may not have been identified. Another open dataset of patients with glioma exists [42]. However, predicting MGMT promoter methylation is a challenging task, and reported performance remains inconsistent. This inconsistency can be explained by differences among datasets or image preprocessing methods as well as by the performance of the deep learning model. Another dataset collected under different criteria and imaging conditions may be inappropriate for validation [43]. The results of this study alone do not establish that vViT is applicable in a clinical setting. A comparison of vViT with radiologists, or the diagnostic improvement achieved by radiologists using vViT, should be examined in a future study. Second, the patient-based analysis had unequal numbers of patients. The training and test datasets were constructed to contain equal numbers of MGMT-methylated and MGMT-unmethylated images while inputting as many images as possible into the vViT. This equalization avoided overestimation of performance in the image-based analysis. However, it led to inequality in patient numbers and selection bias, which may have overestimated the performance of the patient-based analysis. Because the performance of the image-based analysis was partly comparable with previously reported MGMT prediction, this effect may be limited. Third, the radiomic and image sectors were imbalanced. The vViT had six radiomic and two image sectors in predicting MGMT promoter methylation. The imbalance between the number of radiomic and image sectors in vViT may make radiomic features appear dominant. Changing the implementation settings for split-sequence and linear projection may be a solution. Performance may improve when the parameters of vViT and the number of radiomic features are changed. As far as we investigated, the best performance was obtained when 64 radiomic features were used. Fourth, the interpretability of individual demographic and radiomic features was insufficient. We were not able to determine the contribution of each radiomic feature included in the T2WI ED radiomic sector. This makes it difficult to clarify how the edema region was evaluated by radiomics. However, the dominant radiomic features can be inferred from the F-values shown in Appendix 3. In addition, the biological meaning of each feature can be checked in the documentation [31].
In conclusion, vViT can be a competent model for predicting MGMT promoter methylation among adult patients with diffuse glioma compared with conventional CNN models. The input factors can be ranked by combining vViT with permutation feature importance. The most dominant factor among demographics, radiomic features, and MRI in predicting MGMT promoter methylation was the radiomic features derived from the edema region in T2WI for both image- and patient-based analysis. The radiomic features derived from CE-T1WI and T2WI had statistically higher importance than the CE-T1WI and T2WI images themselves in predicting MGMT promoter methylation. The present study demonstrates that radiomic features have higher permutation importance in predicting MGMT promoter methylation than MRI itself.

Fig. 3 The receiver operating characteristic curve and results of permutation importance analysis for the test dataset. The curve of receiver-operating characteristics (ROC) of image-based analysis and patient-based analysis of the test dataset is shown in (a) and (c), respectively. The gray zone in each figure represents the 95% confidence interval (95%CI). The box plots of the difference between the original accuracy and accuracy calculated using the permutated dataset to respective sectors for image-base analysis and patient-base

Table 1
Patient characteristics
a World Health Organization Classification of Central Nervous System Tumors, 5th edition

Table 4
The results of the Mann-Whitney U test for the difference between the original accuracy and the accuracy calculated using a permutated dataset. The cells colored in dark and light gray represent the combinations of sectors that gave p<0.0001 by the Mann-