Abstract
Objective
To assess the effectiveness of the vViT model for predicting postoperative renal function decline by leveraging clinical data, medical images, and image-derived features; and to identify the most dominant factor influencing this prediction.
Materials and Methods
We developed two models, eGFR10 and eGFR20, to identify patients with a postoperative reduction in eGFR of more than 10 and more than 20 mL/min/1.73 m², respectively, among renal cell carcinoma patients. The eGFR10 model was trained on 75 patients and tested on 27, while the eGFR20 model was trained on 77 patients and tested on 24. The vViT model inputs included a class token, patient characteristics (age, sex, BMI), comorbidities (peripheral vascular disease, diabetes, liver disease), habits (smoking, alcohol), surgical details (ischemia time, blood loss, type and procedure of surgery, approach, operative time), radiomics, and tumor and kidney imaging. We used permutation feature importance to evaluate each sector's contribution. The performance of vViT was compared with CNN models, including VGG16, ResNet50, and DenseNet121, using the McNemar and DeLong tests.
Results
The eGFR10 model achieved an accuracy of 0.741 and an AUC-ROC of 0.692, while the eGFR20 model attained an accuracy of 0.792 and an AUC-ROC of 0.812. The surgical and radiomics sectors were the most influential in both models. The vViT achieved higher point estimates of accuracy and AUC-ROC than VGG16, ResNet50, and DenseNet121. Specifically, for the eGFR10 model, the vViT did not have a statistically different AUC-ROC compared to VGG16 (p = 1.0), ResNet50 (p = 0.70), or DenseNet121 (p = 0.87). For the eGFR20 model, the vViT did not have a statistically different AUC-ROC compared to VGG16 (p = 0.72), ResNet50 (p = 0.88), or DenseNet121 (p = 0.64).
Conclusion
The vViT model, a transformer-based approach for multimodal data, shows promise for preoperative CT-based prediction of eGFR status in patients with renal cell carcinoma.
Introduction
Deterioration in renal function is one of the most common complications following partial nephrectomy (PN) or radical nephrectomy (RN). Preoperative estimation of postoperative renal function plays an important role in decision-making for patients with renal tumors [1,2,3]. This estimation provides clinicians with important insights when evaluating whether surgical treatment is beneficial for a patient [4]. Multiple prediction models have been proposed to estimate the impact of surgical treatment on renal function. Shimada et al. proposed a formula to accurately predict eGFR at 1 year after RN in each sex using a stepwise regression model [5]. They identified preoperative eGFR and age as significant predictors of deteriorating renal function after RN in both males and females, whereas tumor size and body mass index (BMI) were significant predictors only in males [5]. These results indicate that prediction of postoperative eGFR should take into account various patient characteristics in addition to image-derived features such as tumor size. Deep learning is another promising technique for estimating postoperative eGFR. Kuo et al. proposed a deep learning approach for automatically determining estimated glomerular filtration rate (eGFR) and chronic kidney disease (CKD) status using ultrasound images. The overall classification accuracy for CKD status using their proposed model was 85.6%, which is higher than that of experienced nephrologists (60.3%–80.1%) [6]. These results suggest that medical images contain information relevant to eGFR, but there is limited research on how clinical information, medical images, and image-derived features can be simultaneously analyzed to predict deterioration in renal function. When only patient characteristics are used as input, image-derived features are not considered.
Conversely, when the input relies solely on images, important patient characteristics are not incorporated in the prediction of renal impairment. Therefore, an accurate preoperative estimation of postoperative renal function remains an unmet clinical need.
To overcome these challenges, and inspired by ViT, the variable Vision Transformer (vViT) was proposed as a model capable of analyzing multiple sequences of different dimensions [7,8,9, 22, 23]. Originally introduced for natural language tasks, transformer architectures have been shown to be well suited for combining imaging and non-imaging data [10]. These architectures feature a self-attention mechanism that models long-range dependencies, making them better than conventional models at capturing characteristic features of multimodal data such as contextual information and medical images.
The vViT can separately analyze input factors such as clinical information, medical images, and image-derived features. Combining vViT with feature importance analysis enables determination of the dominant factor among numerous input factors; i.e., for identifying the dominant factor in predicting deterioration in renal function among clinical information, medical images, and image-derived features.
The aim of this study is to investigate the performance of vViT in predicting deterioration in postoperative renal function by utilizing clinical information, medical images, and image-derived features. Additionally, the study seeks to identify the most dominant factor contributing to prediction of deterioration in postoperative renal function.
Materials and Methods
Data collection and Selection Criteria
Data Collection
All data used in this cross-sectional study were obtained from the training set of the 2019 Kidney and Kidney Tumor Segmentation Challenge (C4KC-KiTS), following The Cancer Imaging Archive data usage policy and restrictions [11]. We included all 210 available patients with demographics, clinical data, and contrast-enhanced computed tomography (CT) images from the dataset, the details of which are described elsewhere [11]. The retrospective data collection was approved by the institutional review board of the University of Minnesota [11].
Selection Criteria
The following exclusion criteria were applied in constructing the models: (i) incomplete data for any of the characteristics (age, sex, and BMI), disease history (peripheral vascular disease history, diabetes mellitus history, and liver disease history), habit information (smoking history and alcohol intake), or surgical information (ischemia time, estimated blood loss, surgical type, surgical procedure, surgical approach, and operative time); (ii) incomplete data for preoperative or postoperative eGFR; (iii) radiomic features could not be extracted from the arterial-phase contrast-enhanced CT images, or the tumor annotation image contained fewer than 256 pixels; and (iv) a number of images per patient beyond the mean ± standard deviation. In addition to these criteria, random selection (v) was performed to equalize the number of images classified in each category.
Model Architecture
Figure 1a shows an overview of the vViT model architecture. The vViT receives a one-dimensional input sequence that is split into sub-sequences of predefined lengths. After division, the smaller split sequences are input into the Transformer encoder. The Transformer encoder comprises alternating layers of multiheaded self-attention and multilayer perceptron (MLP) blocks. Layer normalization is applied before every block, and residual connections are established after each block. Our implementation of the MLP involved two layers with a Gaussian Error Linear Unit (GELU) non-linearity. Specifically, the patch dimension, head dimension, number of heads, MLP dimension, and depth were set to 32, 64, 2, 64, and 8, respectively. The classification head is realized through an MLP with four hidden layers and the softmax function. Each component of the Transformer and the classification head is termed a sector. Consequently, vViT can accommodate an arbitrary number of sectors that receive sequences of varying lengths. We added a class token to the input. The probability tensor, containing the output from each sector, was integrated by voting into the total model output: the respective probabilities of eGFR deterioration across all sectors were averaged, and the total model output was determined from these averaged probabilities. In vViT, the output of each sector is derived from the probability tensor. Each sector was named according to the property of its input, such as the class token sector, demographics sector, comorbidity sector, habit sector, radiomic sector, tumor image sector, or kidney and tumor sector. PyTorch version 1.7.1 was used as the deep learning framework to implement vViT. Binary cross-entropy was optimized by the Adam optimizer (β1 = 0.9, β2 = 0.999, ε = 1.0 × 10⁻⁸, weight decay = 0, AMSGrad = False). The number of epochs was set to 200 for each model, and the parameters with the highest accuracy were saved.
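The sector mechanism described above can be illustrated with a minimal, self-contained sketch. This is not the authors' implementation: the module names, toy sector lengths, and reduced depth are illustrative assumptions; only the patch dimension (32), number of heads (2), MLP dimension (64), GELU activation, pre-block layer normalization, and probability averaging follow the text.

```python
# Sketch of the vViT idea: each "sector" (a sub-sequence of the 1-D input)
# is embedded and passed through its own Transformer encoder and
# classification head; the per-sector class probabilities are averaged
# into the final prediction (the "voting" step in the text).
import torch
import torch.nn as nn

class SectorEncoder(nn.Module):
    def __init__(self, dim=32, heads=2, depth=2, n_classes=2):
        super().__init__()
        self.embed = nn.Linear(1, dim)          # scalar features -> patch dim
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=64,
            activation="gelu", norm_first=True, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, n_classes))

    def forward(self, x):                        # x: (batch, seq_len)
        z = self.embed(x.unsqueeze(-1))          # (batch, seq_len, dim)
        z = self.encoder(z).mean(dim=1)          # pool over the sequence
        return self.head(z).softmax(dim=-1)      # per-sector probabilities

class ToyVViT(nn.Module):
    """Splits a 1-D input into sectors of different lengths and votes."""
    def __init__(self, sector_lens, n_classes=2):
        super().__init__()
        self.sector_lens = sector_lens
        self.sectors = nn.ModuleList(
            SectorEncoder(n_classes=n_classes) for _ in sector_lens)

    def forward(self, x):                        # x: (batch, sum(sector_lens))
        probs, start = [], 0
        for length, enc in zip(self.sector_lens, self.sectors):
            probs.append(enc(x[:, start:start + length]))
            start += length
        return torch.stack(probs).mean(dim=0)    # averaging = voting step

# Toy sector lengths standing in for, e.g., demographics, comorbidity,
# habit, and surgical sectors (3 + 3 + 2 + 6 = 14 scalar inputs).
model = ToyVViT(sector_lens=[3, 3, 2, 6])
out = model(torch.randn(4, 14))                  # 4 patients -> (4, 2) probabilities
```

Because every sector emits a valid probability distribution, their average is also a distribution, so the voting step needs no renormalization.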
Radiomic Feature Extraction and Image Preprocessing
We extracted 105 radiomic features (Supplement 1) using the PyRadiomics package [12], and then derived two image types: tumor, and kidney with tumor. Each tumor image was cropped by the minimum rectangle containing the tumor. Similarly, each kidney-with-tumor image was cropped by the minimum rectangle containing both the kidney and the tumor. The cropped image was expanded to a 128 × 128 image by the Python Pillow package with the LANCZOS option.
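The cropping and resizing step just described can be sketched as follows. The synthetic arrays below stand in for a real CT slice and its tumor annotation, and the function name is a hypothetical label; only the minimum-bounding-rectangle crop, the 128 × 128 target size, and the LANCZOS filter come from the text.

```python
# Crop a slice to the minimum bounding rectangle of a binary mask, then
# upscale to 128x128 with Pillow's LANCZOS resampling filter.
import numpy as np
from PIL import Image

def crop_and_resize(slice_2d, mask_2d, size=(128, 128)):
    ys, xs = np.nonzero(mask_2d)                          # pixels inside the mask
    box = (xs.min(), ys.min(), xs.max() + 1, ys.max() + 1)  # (left, top, right, bottom)
    img = Image.fromarray(slice_2d).crop(box)
    return img.resize(size, Image.LANCZOS)

ct = np.random.randint(0, 255, (512, 512), dtype=np.uint8)  # fake CT slice
mask = np.zeros((512, 512), dtype=np.uint8)
mask[200:260, 300:340] = 1                                   # fake tumor annotation
patch = crop_and_resize(ct, mask)                            # 128x128 PIL image
```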
Constructed Models
We constructed two models: a vViT to predict patients whose eGFR decreased by more than 10 mL/min/1.73 m² after surgical treatment (eGFR10 model), and a vViT to predict patients whose eGFR decreased by more than 20 mL/min/1.73 m² after surgical treatment (eGFR20 model). We imposed the same criteria on both models as described in the selection criteria. In patients who underwent RN, there was no ischemia time for the residual kidney, and ischemia time was therefore set to 0.
eGFR10 Model
The eGFR10 model was designed to classify patients whose eGFR reduction was over 10 (ΔeGFR > 10) versus other patients (ΔeGFR ≤ 10) across the following eight sectors: class token sector, characteristics sector (age, sex, and BMI), comorbidity sector (peripheral vascular disease, diabetes mellitus, and liver disease), habit sector (smoking history and alcohol intake), surgical sector (ischemia time, estimated blood loss, surgical type, surgical procedure, surgical approach, and operative time), radiomic sector, tumor image sector, and kidney and tumor sector. Figure 1b shows a summary of the model constructed in this study. Thirteen patients were excluded based on criterion (i) (due to missing ischemia time: 10, estimated blood loss: 1, and operative time: 2). Eighty-five patients were excluded by criterion (ii) (due to missing preoperative eGFR: 54 and postoperative eGFR: 31). Three patients were excluded by criterion (iii) due to insufficient pixel count, and one after applying criterion (iv), leaving 109 patients. Following random selection (v), seven patients were removed, resulting in a final total of 102 patients with 1,682 images (ΔeGFR > 10, n = 841; ΔeGFR ≤ 10, n = 841). These patients were then divided, on a per-patient basis so that all images from an individual case remained within the same set, into a training dataset (75 patients with 1,340 images [ΔeGFR > 10, n = 670; ΔeGFR ≤ 10, n = 670]) and a test dataset (27 patients with 342 images [ΔeGFR > 10, n = 171; ΔeGFR ≤ 10, n = 171]). The selection processes are shown in Supplementary Fig. 1a. An analysis of variance was conducted to identify the radiomic features associated with eGFR decrease. The top 32 radiomic features by F-value were selected in descending order from the training dataset (Supplementary Table 1a). Figure 2 shows the radiomic feature selection process.
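The ANOVA-based feature selection just described can be sketched in a few lines. The random matrix below stands in for the 105 radiomic features of the training set, and the deliberately shifted column is an illustrative assumption; only the ranking by one-way ANOVA F-value and the top-32 cutoff come from the text.

```python
# Rank features by the one-way ANOVA F-value between the two outcome
# groups and keep the k features with the largest F-values (k = 32).
import numpy as np
from scipy.stats import f_oneway

def top_k_by_anova(X, y, k):
    """Return indices of the k features with the largest F-values."""
    f_vals = np.array([f_oneway(X[y == 0, j], X[y == 1, j]).statistic
                       for j in range(X.shape[1])])
    return np.argsort(f_vals)[::-1][:k]          # descending order of F-value

rng = np.random.default_rng(0)
X = rng.normal(size=(75, 105))                   # 75 patients x 105 radiomic features
y = rng.integers(0, 2, size=75)                  # ΔeGFR > 10 vs. ΔeGFR <= 10
X[:, 5] += 3 * y                                 # make feature 5 clearly discriminative
selected = top_k_by_anova(X, y, k=32)            # feature 5 lands in the top 32
```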
eGFR20 Model
The eGFR20 model was designed to classify patients whose eGFR reduction was over 20 (ΔeGFR > 20) versus other patients (ΔeGFR ≤ 20) across the same eight sectors as in the eGFR10 model. Thirteen patients were excluded based on criterion (i) (due to missing ischemia time: 10, estimated blood loss: 1, and operative time: 2). Eighty-five patients were excluded by criterion (ii) (due to missing preoperative eGFR: 54 and postoperative eGFR: 31). One patient was excluded by criterion (iii) due to insufficient pixel count, and two patients were excluded after applying criterion (iv), leaving 109 patients. Following random selection (v), one patient was removed, resulting in a final total of 101 patients with 1,014 images (ΔeGFR > 20, n = 507; ΔeGFR ≤ 20, n = 507). These patients were divided into a training dataset (77 patients with 778 images [ΔeGFR > 20, n = 389; ΔeGFR ≤ 20, n = 389]) and a test dataset (24 patients with 236 images [ΔeGFR > 20, n = 118; ΔeGFR ≤ 20, n = 118]). As in the eGFR10 model, the dataset was divided into training and test groups on a per-patient basis so that all images from an individual case remained within the same set. The selection processes are shown in Supplementary Fig. 1b. An analysis of variance was conducted to select radiomic features associated with eGFR decrease. The top 32 radiomic features by F-value were selected in descending order from the training dataset (Supplementary Table 1b). Both the eGFR10 and eGFR20 models had the same architecture.
Statistical Analysis
The characteristics of the patients and the calculated values are presented as the mean and 95% confidence interval (95%CI) or as the number (n) and ratio (%). Differences between the training and test datasets were tested using the Mann–Whitney U test for continuous variables and the chi-square test for categorical variables. Because vViT was implemented to classify images from patients in whom eGFR decreased versus other patients (image-based analysis), we aggregated the output of vViT to provide a prediction for each patient (patient-based analysis). In this aggregation, voting and the mean were used for binary and continuous variables, respectively. We calculated classification accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F-score, the area under the curve of the receiver-operating characteristic (AUC-ROC), logarithmic loss, and kappa score to evaluate the performance of vViT. To evaluate the contribution of each sector to the output, permutation feature importance was calculated by the following three steps.
(i) Permutation was performed on the respective sector according to the method reported previously [13]. By this implementation, the data of one patient were assigned to another patient.
(ii) After permutation was performed, accuracy was calculated using the trained vViT.
(iii) The difference between the original accuracy and the accuracy calculated using the permutated dataset was saved.
Procedures (i), (ii), and (iii) were repeated 100 times for each sector. Figure 3 shows the procedure for calculating the difference in accuracy, using the characteristics sector as an example. We compared the difference in accuracy between sectors using the Mann–Whitney U test. The performance of the vViT model was compared with convolutional neural network (CNN) models: Visual Geometry Group (VGG) 16 [19], Residual Network (ResNet) 50 [20], and Dense Convolutional Network (DenseNet) 121 [21]. These models were tested after 1,000 epochs of training using the tumor images and the kidney and tumor images. We used the same training and test datasets as for vViT. The McNemar test and DeLong test were performed to compare the contingency tables and AUC-ROC values, respectively. All analyses were performed using Python, version 3.8.2 (Python Software Foundation, http://www.python.org). Statistical significance was evaluated by the 95%CI and p-value < 0.05.
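The three-step permutation loop above can be sketched as follows. The linear thresholding "model" is a stand-in for the trained vViT, and the function and variable names are illustrative assumptions; only the permute–rescore–difference logic and the 100 repeats come from the text.

```python
# Permutation feature importance (steps i-iii): shuffle one sector's
# columns across patients, re-score the trained model, and record the
# drop in accuracy; repeat and average.
import numpy as np

def permutation_importance(predict, X, y, cols, n_repeats=100, seed=0):
    rng = np.random.default_rng(seed)
    base_acc = np.mean(predict(X) == y)                      # original accuracy
    drops = []
    for _ in range(n_repeats):
        Xp = X.copy()
        Xp[:, cols] = Xp[rng.permutation(len(X))][:, cols]   # step (i): permute sector
        drops.append(base_acc - np.mean(predict(Xp) == y))   # steps (ii)-(iii)
    return float(np.mean(drops))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)                  # outcome driven by column 0 only
predict = lambda X: (X[:, 0] > 0).astype(int)  # surrogate "trained" model
imp_informative = permutation_importance(predict, X, y, cols=[0])  # large drop
imp_noise = permutation_importance(predict, X, y, cols=[1])        # no drop
```

As in the paper, a sector whose permutation causes a larger accuracy drop is ranked as more important; permuting a column the model ignores leaves accuracy unchanged.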
Results
Table 1 shows the characteristics of patients in the eGFR10 and eGFR20 models.
eGFR10 Model
Accuracy, sensitivity, specificity, PPV, NPV, F-score, AUC-ROC, logarithmic loss, and kappa score for the test dataset were 0.741 (95%CI; 0.615–0.806), 0.538 (0.375–0.684), 0.929 (0.716–0.956), 0.875 (0.574–0.933), 0.684 (0.532–0.774), 0.667 (0.489–0.844), 0.692 (0.489–0.861), 0.555 (0.504–0.607), and 0.474 (0.392–0.555), respectively. Sectors ranked in descending order of permutation feature importance are as follows: surgical (difference between the original accuracy and the accuracy calculated using the permuted dataset = 0.249, 95%CI; 0.248–0.251), radiomics (0.234, 0.233–0.236), comorbidity (0.113, 0.111–0.114), demographics (0.0858, 0.0841–0.0875), habit (0.0336, 0.0319–0.0352), kidney and tumor (–0.0211, –0.0228 to –0.0194), and tumor (–0.0250, –0.0267 to –0.0234) sectors. Figure 4a and b show the ROC and the permutation feature importance of each sector for the test dataset, respectively. When using tumor images, vViT did not have a statistically different contingency table from VGG16 (p-value = 0.34) or ResNet50 (1.0) but had a statistically different contingency table from DenseNet121 (0.0010). There was no statistical difference in AUC-ROC compared with VGG16 (1.0), ResNet50 (0.70), or DenseNet121 (0.87). When using kidney and tumor images, the vViT did not have a statistically different contingency table from VGG16 (0.057) but had a statistically different contingency table from ResNet50 (0.00030) and DenseNet121 (0.021). There was no statistical difference in AUC-ROC compared with VGG16 (0.73), ResNet50 (0.91), or DenseNet121 (0.81). Tables 2a and 3a list the statistical data for each sector and the results of the comparison with the CNN models, respectively. The results of the image-based analysis are shown in Supplement 2a, Supplementary Fig. 2a, b, Supplementary Tables 2a, and 3a.
eGFR20 Model
Accuracy, sensitivity, specificity, PPV, NPV, F-score, AUC-ROC, logarithmic loss, and kappa score for the test dataset were 0.792 (95%CI; 0.653–0.850), 0.875 (0.574–0.933), 0.750 (0.572–0.831), 0.636 (0.434–0.768), 0.923 (0.700–0.954), 0.737 (0.561–0.913), 0.812 (0.609–0.972), 1.29 (1.22–1.36), and 0.571 (0.492–0.650), respectively. Sectors ranked in descending order of permutation feature importance are as follows: surgical (difference between the original accuracy and the accuracy calculated using the permuted dataset = 0.302, 95%CI; 0.299–0.305), radiomics (0.288, 0.285–0.291), kidney and tumor (–0.0513, –0.0541 to –0.0485), habit (–0.0664, –0.0692 to –0.0635), comorbidity (–0.0710, –0.0738 to –0.0682), demographics (–0.0741, –0.0769 to –0.0712), and tumor (–0.0757, –0.0785 to –0.0729) sectors. Figure 4c and d show the ROC and the permutation feature importance of each sector for the test dataset, respectively. When using tumor images, vViT did not have a statistically different contingency table from VGG16 (p-value = 0.18), ResNet50 (0.79), or DenseNet121 (0.58). There was no statistical difference in AUC-ROC compared with VGG16 (0.72), ResNet50 (0.88), or DenseNet121 (0.64). When using kidney and tumor images, vViT did not have a statistically different contingency table from VGG16 (p-value = 0.34), ResNet50 (0.092), or DenseNet121 (0.39). There was no statistical difference in AUC-ROC compared with VGG16 (0.34), ResNet50 (0.94), or DenseNet121 (0.11). Tables 2b and 3b list the statistical data for each sector and the results of the comparison with the CNN models, respectively. The results of the image-based analysis are shown in Supplement 2b, Supplementary Fig. 2c, d, Supplementary Tables 2a, and 3b.
Discussion
This study aimed to evaluate the effectiveness of vViT in predicting postoperative renal function decline using clinical information, medical images, and image-derived features. We used preoperative information for prediction and attempted to identify the primary factor influencing the deterioration. To the best of our knowledge, this is the first application of vViT for multimodal analysis of postoperative renal function, particularly for identifying the dominant factor among demographics, radiomic features, and CT imaging. In the eGFR10 model test dataset, vViT showed an accuracy of 0.741 and an AUC-ROC of 0.692. For the eGFR20 model, accuracy was 0.792 and AUC-ROC was 0.812. Since the eGFR10 and eGFR20 models had the same architecture, the difference in performance likely reflects the difficulty of the two tasks. Although vViT achieved performance comparable to the CNN models, it did not achieve state-of-the-art performance. Kuo et al. achieved higher accuracy (0.86) and AUC-ROC (0.90) in their study of CKD prediction using deep learning and ultrasound images [6]. Notably, their accuracy surpassed that of experienced nephrologists (0.603–0.801) [6]. Although ultrasound is convenient and non-invasive, its results are operator-dependent [14], whereas CT provides more consistent and reproducible outcomes. Direct comparison with the present study is challenging because they used a different definition of CKD (eGFR < 60 mL/min/1.73 m²).
The eGFR10 model's two most important sectors were the surgical (permutation feature importance = 0.249, 95%CI: 0.248–0.251) and radiomic (0.234, 0.233–0.236) sectors. Similarly, the eGFR20 model's two most important sectors were the surgical (0.302, 0.299–0.305) and radiomic (0.288, 0.285–0.291) sectors. In clinical practice especially, surgical information (ischemia time, estimated blood loss, surgical type, surgical procedure, surgical approach, and operative time) should be considered. A previous study showed that surgical factors such as ischemia time and type of surgery had an impact on predicting a decline in eGFR [15]. In cases where surgical intervention is complex or extensive, surgical factors may influence the prediction of future decline in kidney function. Radiomics analysis, in which quantitative features are extracted from medical images, provides comprehensive data on kidney and tumor characteristics [16]. Radiomic features provide detailed insight into the pathology of the kidney and associated tumors [16]. In a previous study, Shimada et al. identified preoperative eGFR and age as significant predictors of deteriorating renal function after RN in both males and females, whereas tumor size and BMI were significant predictors only in males [5]. The study of Ishiyama et al. showed that BMI and diabetes mellitus are associated with upstaging of kidney cancer [17]. Although the demographics sector contributed less than the surgical and radiomic sectors, it still played a role in prediction. In the present study, we input two types of images: tumor only, and kidney and tumor. The properties of a kidney tumor should be discussed based on its association with normal tissue as well as the tumor itself. Radiomics cannot analyze the association between tumor and normal tissue because radiomic features are calculated only within the region specified by a mask image. We therefore included images of the kidney and tumor in addition to the tumor images.
Our analyses revealed that CT images had a relatively lower contribution than surgical information and radiomic features. Combining a CNN with vViT may be one solution, and other methods for analyzing the association between tumor and normal tissue should be investigated. The rationale behind the importance ranking is that a sector with higher permutation importance contributes more to the prediction; permutation feature importance analysis assumes that random permutation of the input data decreases the performance of a deep learning model. The development of effective multimodal fusion approaches is becoming increasingly important to capture features of complex diseases [18]. Predicting deterioration in postoperative renal function among adult patients is no exception.
The performance of the vViT was compared with various CNN models using both the tumor images and the kidney and tumor images. Our analyses revealed that, in the eGFR10 model, vViT had the potential to outperform ResNet50 and DenseNet121. In addition, the AUC-ROC of vViT was not statistically different from that of the CNN models for either the eGFR10 or the eGFR20 model.
The present study has some limitations. First, our study utilized a retrospective data collection method from a single center, which, after specific exclusions, might have introduced selection bias. We acknowledge the importance of external validation using diverse datasets to enhance the generalizability of our findings. However, the collection of additional datasets is challenging due to constraints. We recognize that our findings cannot be broadly generalized without further validation. Future research should focus on validating the generalizability of our results using additional datasets and exploring comparisons with radiologists or diagnostic improvements by radiologists using our model. Second, an inherent variability may be present due to potential differences in the timing of blood sampling among patients, which could have influenced the eGFR measurements. Following PN or RN, patients are at risk of deterioration of renal function due to the invasive nature of the surgical procedures. Some patients recover from the deterioration in function, but others do not. It is also necessary to consider the timing of blood sampling, which should be consistent among patients, e.g., the day after surgery and at the 3-month follow-up. Rigorous follow-up would clarify the time course of the deterioration of eGFR. In the present study, however, we analyzed patients at follow-up after discharge and the effect of the time of blood sampling may have been limited. Third, our analysis compared different sectors, but further studies using traditional statistical methods are required to pinpoint the most influential factors within each sector. Although we combined inputs from all sectors, a more in-depth analysis could offer clearer insights into their complex interactions. Focusing on radiomics and the CT image itself, we were able to compare these in one model. A medical image has full information, and it is doubtful that a deep-learning model will learn the appropriate area. 
On the other hand, radiomics is a statistic of a region of interest, and information in other areas is lost. We believe that vViT may be a solution for multimodal medical image analysis. Finally, we were unable to provide straightforward visualizations that effectively illustrate the model's decision-making process. While the importance of visualizing AI results to enhance explainability is well recognized, visualization techniques vary across model architectures. Grad-CAM, a popular method for visualizing CNN models, does not directly apply to transformer-based models due to fundamental differences in architecture. In transformer-based models like ViT, attention maps are typically used to interpret the model's focus areas; however, these techniques cannot be directly applied to vViT. Moreover, our study used segmented tumor images, which are designed to specify regions of interest. This design choice further complicates the visualization process, as the model is directed to pre-defined areas rather than learning these areas autonomously. This limitation highlights the need for further research into developing robust visualization techniques for transformer-based models, particularly in medical imaging. Future work should focus on overcoming these challenges to improve the interpretability and explainability of such advanced AI models. These limitations should be considered in future research (Table 3).
Conclusion
In conclusion, vViT is an effective model for predicting postoperative renal function decline in adults. When combined with permutation feature importance, the surgical and radiomics sectors were the most influential factors in both the eGFR10 and eGFR20 models. The high permutation importance of these factors underscores their critical roles in this prediction.
Data Availability and Source Code Sharing
The data used in this study can be obtained at https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=61081171. We are willing to distribute the source code of vViT to those interested; please contact the corresponding author.
Abbreviations
- 95% CI: Confidence interval
- C4KC-KiTS: 2019 Kidney and Kidney Tumor Segmentation Challenge
- AUC-ROC: Area under the curve of receiver-operating characteristics
- BMI: Body mass index
- CKD: Chronic kidney disease
- CNN: Convolutional neural network
- CT: Computed tomography
- DenseNet121: Dense Convolutional Network 121
- GELU: Gaussian error linear unit
- eGFR: Estimated glomerular filtration rate
- MLP: Multilayer perceptron
- NPV: Negative predictive value
- PPV: Positive predictive value
- RCC: Renal cell carcinoma
- ResNet50: Residual Networks 50
- VGG16: Visual geometry group 16
- vViT: Variable vision transformer
References
Chandrasekar T, Boorjian SA, Capitanio U, Gershman B, Mir MC, Kutikov A. Collaborative Review: Factors Influencing Treatment Decisions for Patients with a Localized Solid Renal Mass. European Urology. https://doi.org/10.1016/j.eururo.2021.01.021.
Mir MC, Derweesh I, Porpiglia F, Zargar H, Mottrie A, Autorino R. Partial Nephrectomy Versus Radical Nephrectomy for Clinical T1b and T2 Renal Tumors: A Systematic Review and Meta-analysis of Comparative Studies. European Urology. https://doi.org/10.1016/j.eururo.2016.08.060.
Kim SP, Campbell SC, Gill I, Lane BR, Van Poppel H, Smaldone MC, Volpe A, Kutikov A. Collaborative Review of Risk Benefit Trade-offs Between Partial and Radical Nephrectomy in the Management of Anatomically Complex Renal Masses. European Urology. https://doi.org/10.1016/j.eururo.2016.11.038.
Van Poppel H, Da Pozzo L, Albrecht W et al. A Prospective, Randomised EORTC Intergroup Phase 3 Study Comparing the Oncologic Outcome of Elective Nephron-Sparing Surgery and Radical Nephrectomy for Low-Stage Renal Cell Carcinoma. European Urology. https://doi.org/10.1016/j.eururo.2010.12.013.
Shimada S, Saito H, Kawasaki Y et al. Clinical predictors of the estimated glomerular filtration rate 1 year after radical nephrectomy in Japanese patients. Investig Clin Urol. 2017 Jul;58(4):228-234. https://doi.org/10.4111/icu.2017.58.4.228.
Kuo CC, Chang CM, Liu KT et al. Automation of the kidney function prediction and classification through ultrasound-based kidney imaging using deep learning. NPJ Digit Med 2:29. https://doi.org/10.1038/s41746-019-0104-2, April 26, 2019.
Usuzaki T. Splitting expands the application range of Vision Transformer – variable Vision Transformer (vViT). arXiv. 2023, https://doi.org/10.48550/ARXIV.2211.03992.
Dosovitskiy A, Beyer L, Kolesnikov A et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv, 2021. https://doi.org/10.48550/ARXIV.2010.11929.
Usuzaki T, Takahashi K, Inamori R et al. Grading diffuse glioma based on 2021 WHO grade using self-attention-base deep learning architecture: variable Vision Transformer (vViT). Biomedical Signal Processing and Control. 2024, https://doi.org/10.1016/j.bspc.2024.106001.
Fahad S, Salman K, Syed W et al. Transformers in medical imaging: A survey. Medical Image Analysis. 2023, https://doi.org/10.1016/j.media.2023.102802.
Heller N, Sathianathen N, Kalapara A et al. The KiTS19 Challenge Data: 300 Kidney Tumor Cases with Clinical Context, CT Semantic Segmentations, and Surgical Outcomes. 2019. https://doi.org/10.48550/ARXIV.1904.00445.
van Griethuysen JJM, Fedorov A, Parmar C et al. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res 2017;77:e104-e107.
Usuzaki T, Ishikuro M, Murakami K et al. How can we evaluate whether an association is truly inter-generational? J Hypertens 2020;38:1866-1868
Sargent MA, Wilson BPM. Observer variability in the sonographic measurement of renal length in childhood. Clin Radiol. 1992. https://doi.org/10.1016/S0009-9260(05)80382-4.
Liu X, Jin D, Zhang Y, Zhang S. Limited non-linear impact of warm ischemia time on renal functional decline after partial nephrectomy: a propensity score-matched study. Int Urol Nephrol. 2023;55(7):1699-1708. https://doi.org/10.1007/s11255-023-03630-0.
Suarez-Ibarrola R, Basulto-Martinez M, Heinze A, Gratzke C, Miernik A. Radiomics Applications in Renal Tumor Assessment: A Comprehensive Review of the Literature. Cancers (Basel). 2020;12(6):1387. https://doi.org/10.3390/cancers12061387.
Ishiyama R, Omae K, Kondo T et al. Predictive factors and oncological outcomes of pathological T3a upstaging in patients with clinical T1 renal cell carcinoma undergoing partial nephrectomy. Jpn J Clin Oncol. 2023:hyad142. https://doi.org/10.1093/jjco/hyad142.
Steyaert S, Pizurica M, Nagaraj D et al. Multimodal data fusion for cancer biomarker discovery with deep learning. Nat Mach Intell. 2023;5:351-362. https://doi.org/10.1038/s42256-023-00633-5.
Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv, 2014. https://doi.org/10.48550/arXiv.1409.1556.
He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. arXiv, 2015. https://doi.org/10.48550/arXiv.1512.03385.
Huang G, Liu Z, van der Maaten L, Weinberger KQ. Densely Connected Convolutional Networks. arXiv, 2016. https://doi.org/10.48550/arXiv.1608.06993.
Usuzaki T, Inamori R, Shizukuishi T, Morishita Y, Takagi H, Ishikuro M, Obara T, Takase K. Predicting isocitrate dehydrogenase status among adult patients with diffuse glioma using patient characteristics, radiomic features, and magnetic resonance imaging: Multi-modal analysis by variable vision transformer. Magnetic Resonance Imaging. 2024:266-276. https://doi.org/10.1016/j.mri.2024.05.012.
Usuzaki T, Takahashi K, Inamori R, Morishita Y, Shizukuishi T, Takagi H, Ishikuro M, Obara T, Takase K. Identifying key factors for predicting O6-Methylguanine-DNA methyltransferase status in adult patients with diffuse glioma: a multimodal analysis of demographics, radiomics, and MRI by variable Vision Transformer. Neuroradiology. 2024;66(5):761-773. https://doi.org/10.1007/s00234-024-03329-8.
Acknowledgements
The authors acknowledge the National Cancer Institute of the National Institutes of Health and Climb 4 Kidney Cancer (C4KC) for providing student scholarships, which played a critical role in the creation of the freely and publicly available training set of the 2019 Kidney and Kidney Tumor Segmentation Challenge (C4KC-KiTS) database used in this study. We used the C4KC-KiTS datasets in accordance with the TCIA Data Usage Policy and Restrictions. Data used in this research were obtained from TCIA, sponsored by the Cancer Imaging Program, DCTD/NCI/NIH, https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=61081171. We are also grateful to Yasuyo Usuzaki for providing the high-quality figure in this paper.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Author information
Authors and Affiliations
Contributions
Takuma Usuzaki: Conceptualization, Investigation, Writing—Original Draft, Project administration. Ryusei Inamori: Conceptualization, Investigation, Writing—Original Draft, Project administration. Eichi Takaya: Writing—Review & Editing. Noriyasu Homma: Writing—Review & Editing. Mami Ishikuro: Writing—Review & Editing. Taku Obara: Writing—Review & Editing. Kei Takase: Writing—Review & Editing.
Corresponding author
Ethics declarations
Ethics Approval
All procedures performed in this study involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed Consent
The institutional review board approved this retrospective study and waived the requirement for obtaining informed consent from the patients.
Research Involving Human Participants and/or Animals
All procedures performed in this study involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Competing Interests
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Usuzaki, T., Inamori, R., Ishikuro, M. et al. Predicting eGFR Status After Radical Nephrectomy or Partial Nephrectomy for Renal Cell Carcinoma on CT Using a Self-attention-based Model: Variable Vision Transformer (vViT). J Imaging Inform Med (2024). https://doi.org/10.1007/s10278-024-01180-0
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s10278-024-01180-0