Large-Kernel Attention for 3D Medical Image Segmentation

Automated segmentation of multiple organs and tumors from 3D medical images such as magnetic resonance imaging (MRI) and computed tomography (CT) scans using deep learning methods can aid in diagnosing and treating cancer. However, organs often overlap and are complexly connected, characterized by extensive anatomical variation and low contrast. In addition, the diversity of tumor shape, location, and appearance, coupled with the dominance of background voxels, makes accurate 3D medical image segmentation difficult. In this paper, a novel 3D large-kernel (LK) attention module is proposed to address these problems and achieve accurate multi-organ and tumor segmentation. The proposed LK attention module combines the advantages of biologically inspired self-attention and convolution, including local contextual information, long-range dependencies, and channel adaptation. The module also decomposes the LK convolution to optimize the computational cost and can be easily incorporated into CNNs such as U-Net. Comprehensive ablation experiments demonstrated the feasibility of convolutional decomposition and explored the most efficient and effective network design. Among them, the best Mid-type 3D LK attention-based U-Net was evaluated on the CT-ORG and BraTS 2020 datasets, achieving state-of-the-art segmentation performance compared with leading CNN- and Transformer-based methods for medical image segmentation. The performance improvement due to the proposed 3D LK attention module was statistically validated.


Introduction
Malignant tumors and other organ illnesses have long been a problem for humans, seriously endangering their lives and general well-being. Worldwide, millions of people die from cancer each year, making it the leading cause of mortality [1]. Nevertheless, early identification and therapy are still the most effective means of enhancing cancer survival. Identifying the location of organs and lesions is a crucial step in the diagnostic process and plays a vital role in treating diseases. In general, locating organs and lesions from medical images such as Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) is a segmentation task. Clinicians can determine the location, size, and subtype of a tumor through the precise segmentation of tumors. This benefits not only the diagnostic process but also the planning of radiation therapy or surgery. On the other hand, accurate organ segmentation can help clinicians select personalized treatment strategies for various patients, enabling the practice of precision medicine and individualized care, which can lessen the patient's financial and psychological burdens. Additionally, the segmentation of longitudinal MRI images can be utilized to track tumor development or shrinkage as well as the response and recovery of diseased organs to therapeutic interventions. Therefore, the research and implementation of medical image segmentation are of major significance.
Segmentation of organs and lesions is typically performed manually by experienced radiologists in current clinical practice. Observing medical images to differentiate human organs, tissues, and lesions is a challenging and time-consuming endeavor. Additionally, because manual labeling results rely heavily on the radiologist's expertise and subjective judgment, they are rarely reproducible and might even involve human bias. Consequently, these problems contribute to the low practicability of manual segmentation. Automated or computer-aided segmentation approaches can solve these issues by requiring less labor and producing objective, reproducible results for later disease diagnosis and management. As a result, automated medical image segmentation has been thoroughly researched and has emerged as the mainstream approach.
With the increase in GPU computing power and the rapid advancement of deep learning technology in recent years, the field of image semantic segmentation has also grown rapidly. Natural image segmentation approaches based on fully convolutional neural networks (FCN) [2] have matured over time. In the meantime, medical image segmentation remains a formidable challenge, as medical images are characterized by uneven grayscale, significant contrast variation, and substantial noise. Since U-Net [3] was published, medical image semantic segmentation has also undergone tremendous development, and numerous convolutional neural network variants of FCN and U-Net are rapidly being applied.
However, the existing technology for the automatic segmentation of medical images lacks sufficient intelligence and precision. Organs are challenging to differentiate because of their overlap and intricate connections. Additionally, the wide variations in anatomical structure and the low contrast make segmentation tasks ambiguous. Moreover, for lesion segmentation, tumors can arise at any organ location and exhibit a wide range of size, shape, and appearance [4]. In many cases, the tumor volume is also rather small relative to the entire scan, resulting in the dominance of background noise [5]. All of these issues lower segmentation accuracy. In clinical practice, even minute inaccuracies in medical image segmentation might result in misdiagnosis. Therefore, segmentation models based on deep learning have significant room for development in this discipline.
Long-range self-attention can be used to enable the network to learn only the truly crucial information [6], such as organ boundaries or tumor-related features. It is a mechanism for adaptive selection based on the input's features. Different self-attention techniques have been used in medical image segmentation [7][8][9] and have obtained superior performance compared to traditional FCNs because of their efficiency in capturing long-range dependencies. Despite these recent attempts [7][8][9], self-attention has several shortcomings for medical image segmentation, since it was designed for Natural Language Processing (NLP). First, it analyzes images as one-dimensional sequences, ignoring the structural details required for obtaining morphological features in medical images. Second, because processing 3D scans such as MRI or CT is computationally prohibitive under self-attention's quadratic complexity, most self-attention research is 2D-based. Third, it disregards the necessity of channel adaptation for attention mechanisms. For image semantic segmentation tasks, different channels usually represent features of different objects, so adaptation in channel maps is important for attention mechanisms to build dependencies within channels [8,[10][11][12][13][14].
In order to address these issues, this paper introduces a novel large-kernel (LK) attention module for enhancing medical image segmentation. The LK attention module combines the advantages of self-attention and convolution, such as long-range dependence, spatial adaptation, and local contextual information, while avoiding their disadvantages, such as the disregard of channel adaptation and high computational complexity. This paper builds on our previous work on MRI brain tumor segmentation at the Medical Image Understanding and Analysis Conference (MIUA) [15]. On this basis, we optimized the LK attention module and conducted comprehensive ablation experiments to demonstrate its feasibility and explore more efficient design and deployment strategies. We also further investigated whether LK attention could improve the performance of CT multi-organ segmentation to expand the application scope and adaptability of LK attention in medical imaging and segmentation tasks. The following highlights the key contributions of this paper:
• A novel LK attention module utilizing decomposed LK convolutions was proposed, which combines the advantages of convolution and self-attention while avoiding their disadvantages.
• A U-Net architecture that efficiently incorporates LK attention was proposed for the segmentation of 3D medical images. By adaptively amplifying the weights of key features while reducing the weights of noisy voxels, the LK attention-based U-Net can accurately identify the locations of various organs and tumor subregions.
The rest of the article is structured as follows: Section 2 briefly reviews related work. Section 3 details our segmentation method, including the LK attention module and network architecture. Section 4 illustrates the experimental setup, and results and discussion are presented in Section 5. The conclusion is given in the final Section 6.

Related Work
In this section, we will briefly review the recent work related to multi-organ segmentation and tumor segmentation, including some applications of self-attention.

Multi-organ Segmentation
Multi-organ segmentation, which comprehensively classifies voxels into multiple organ classes rather than just organs or other tissues, gives a broader viewpoint on the task of organ segmentation. This involves identifying which organ type a particular voxel belongs to, in addition to determining if it belongs to an organ [16]. Due to the increased data volume and image complexity, the automatic segmentation of multiple organs in 3D medical images is a tedious challenge.
A method for segmenting 3D CT images using majority voting was proposed in [17] based on the FCN. In [18], a neural network dubbed 3D DSN avoids unnecessary computation and overfitting via volume-to-volume learning, making it suited to cardiac and hepatic anatomy. H.R. Roth et al. [19] presented a coarse-to-fine method for multi-organ segmentation that included two stages: the 3D FCN in the first stage extracts candidate regions coarsely, whereas the second 3D FCN focuses on potential organ region boundaries in a cascaded way, hence minimizing the number of voxels to be processed. Similar research was conducted by [20] employing cascaded 3D FCNs for dual-energy CT. [21] presented a 3D-U-JAPA-net based on transfer learning, whereas [22] created a semi-supervised network to fully exploit unlabeled data. To save GPU memory, [23] suggested combining 2D and 3D models, performing segmentation using 2D convolutions and extracting spatial information from 3D models.
Although FCNs have been shown to be very successful, learning long-range spatial relationships is challenging due to the locality of convolutional layers. The UNETR architecture was proposed in [24], inspired by transformers used in NLP. The transformer acting as an encoder enables U-Net to gather global information and long-range spatial relationships, leading to superior segmentation results.
Tumor Segmentation

Since 2014, deep learning algorithms have been extensively researched for tumor segmentation in the BraTS challenge [4,6,[28][29][30][31][32][33][34][35]. Myronenko [32] won the BraTS 2018 competition by training an asymmetrical U-Net with a broader encoder and an additional variational decoder branch that provided further regularization. A two-stage cascaded asymmetrical U-Net comparable to Myronenko's [32] was proposed by Jiang et al. [31]: the first stage generated a coarse prediction, whereas the second stage utilized a larger network to refine the outcome. In order to automatically adapt the traditional U-Net to a particular dataset with just minor alterations, Isensee et al. [30] adopted a self-configuring framework called nnU-Net. Wang et al. [34] suggested a modality-pairing learning method that uses layer connections on parallel branches to extract the complicated interactions and rich information between various MRI modalities. Jia et al. [6] created the Hybrid High-resolution and Non-local Feature Network (H2NF-Net), which used parallel multi-scale convolutional blocks to exploit multi-scale features and preserve high-resolution feature representations simultaneously. The self-attention mechanism implemented in that study permits the aggregation of local information across spatial locations and the acquisition of long-range dependence.

Method
Our method is detailed in this section, including the new LK attention module and the modified U-Net based on the LK attention module for 3D medical image segmentation. Numerous studies have demonstrated that the integration of diverse attention mechanisms has the potential to enhance segmentation performance. The attention map reflects the relative significance across the feature space, which necessarily involves the capture of correlations between various locations. Self-attention can be used to discover long-range dependence, but it has several disadvantages, as stated in the previous section. Applying large-kernel convolution to establish long-distance relationships and generate the attention map is an alternative method [10][11][12][13][14]36]. Nevertheless, this strategy substantially increases the computational cost.

LK Attention
To address these limitations and maximize the benefits of self-attention and large-kernel (LK) convolution, we developed an LK attention module (shown in Figure 1). Assuming K is the kernel size, a K × K × K LK convolution was decomposed into a (2d − 1) × (2d − 1) × (2d − 1) depth-wise (DW) convolution, a ⌈K/d⌉ × ⌈K/d⌉ × ⌈K/d⌉ depth-wise dilated (DWD) convolution with dilation of d, and a 1 × 1 × 1 convolution. For an input with dimensions of H × W × D × C, the number of parameters (N_PRM) and the number of floating-point operations (FLOPs) for the original LK convolution and its decomposition can be calculated as follows:

N_PRM,O = C (C K³ + 1),
FLOPs_O = C (C K³ + 1) × H × W × D,
N_PRM,D = C ((2d − 1)³ + ⌈K/d⌉³ + C + 3),
FLOPs_D = C ((2d − 1)³ + ⌈K/d⌉³ + C + 3) × H × W × D,

where the subscripts O and D represent the original LK convolution and the decomposed LK convolution, respectively. To determine the optimal d such that N_PRM,D is minimal for a particular kernel size K, we set the first derivative of N_PRM,D with respect to d (treating d as continuous and ignoring the ceiling) to zero:

dN_PRM,D/dd = C (6(2d − 1)² − 3K³/d⁴) = 0,

which simplifies to 2d⁴(2d − 1)² = K³. For K = 21, solving this equation numerically yielded an optimal approximation of d ≈ 3.4159. As shown in Table 1, the number of parameters can be significantly lowered with a dilation rate of 3. We can also observe that as the number of channels increases, the decomposition becomes more efficient.
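As a quick check of the complexity analysis above, the parameter counts for a standard 21 × 21 × 21 convolution and its decomposition with dilation d = 3 can be computed directly (a sketch following the formulas above, with biases included; the exact entries of Table 1 are not reproduced here):

```python
from math import ceil

def lk_param_counts(k=21, c=512, d=3):
    """Parameter counts (biases included) for a C-channel standard
    K x K x K convolution vs. the DW + DWD + 1x1x1 decomposition."""
    n_original = c * (c * k**3 + 1)
    n_decomposed = c * ((2 * d - 1)**3 + ceil(k / d)**3 + c + 3)
    return n_original, n_decomposed

n_o, n_d = lk_param_counts()
print(n_o, n_d)            # 2427716096 503296
print(f"{n_d / n_o:.3%}")  # 0.021%
```

For C = 512 the decomposition keeps well under 0.1% of the original parameters, and the ratio shrinks further as C grows, consistent with the observation above.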
The entire LK attention module is formulated as follows:

A = σ_sigmoid(Conv_1×1×1(Conv_DWD(Conv_DW(σ_lReLU(GN(F)))))),
Output = A ⊗ F + F,

where F is the input feature map, A denotes the attention map, and GN is the group normalization. σ_lReLU and σ_sigmoid denote the leaky ReLU activation function and the sigmoid activation function, respectively. The output of the LK attention module is formed by multiplying the input feature map and the attention map element by element and summing the result with the input. Using the LK attention module, we can extract long-range correlations within a feature space and generate the attention map with minimal computational complexity and parameters.
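The formulation above can be sketched in PyTorch as follows (a minimal illustration, not the authors' implementation; the class name, group count, and channel width are our assumptions):

```python
import torch
import torch.nn as nn

class LKAttention3D(nn.Module):
    """Sketch of the 3D LK attention module: GN -> lReLU -> decomposed
    LK convolution -> sigmoid attention map, then elementwise
    multiply-and-sum with the input feature map."""
    def __init__(self, channels, k=21, d=3, gn_groups=8):
        super().__init__()
        self.norm = nn.GroupNorm(gn_groups, channels)
        self.act = nn.LeakyReLU(0.01)
        dw = 2 * d - 1            # 5x5x5 depth-wise convolution
        kd = -(-k // d)           # ceil(K/d) = 7, depth-wise dilated conv
        self.dw = nn.Conv3d(channels, channels, dw, padding=dw // 2,
                            groups=channels)
        self.dwd = nn.Conv3d(channels, channels, kd, dilation=d,
                             padding=(kd // 2) * d, groups=channels)
        self.pw = nn.Conv3d(channels, channels, 1)  # 1x1x1 channel mixing

    def forward(self, x):
        a = torch.sigmoid(self.pw(self.dwd(self.dw(self.act(self.norm(x))))))
        return a * x + x          # attention-weighted residual output

x = torch.randn(1, 16, 8, 8, 8)
y = LKAttention3D(16)(x)
print(y.shape)  # torch.Size([1, 16, 8, 8, 8])
```

The paddings are chosen so the attention map keeps the input's spatial dimensions, and `groups=channels` makes the first two convolutions depth-wise as in the decomposition.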

LK Attention-based U-Net
Fig. 2 The network architecture of our proposed LK attention-based U-Net.
The U-Net [3] has served as a basis for numerous studies on medical image processing. Its capacity to capture fine object features utilizing the skip architecture is particularly advantageous for precise segmentation. As shown in Figure 2, the 3D LK attention-based U-Net architecture is based on the U-Net and comprises an encoding path for feature extraction and a decoding path for inference, with skip connections.

Encoder
The encoder is composed of convolution blocks at six scales. Each block contains two convolution layers with a 3 × 3 × 3 kernel, GN, and lReLU (with a slope of 0.01). The input data of I channels is convolved by 32 kernels to generate the initial 32 feature maps, where the channel number I corresponds to the number of imaging modalities. Between two scales, a stride-2 3 × 3 × 3 convolution is used to downsample the feature map by 2 and increase the number of channels up to a maximum of 512. The deepest feature map is 1/32 of the original size.
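A single encoder layer as described can be sketched in PyTorch (the GN group count is our assumption; a stride-2 call serves as the between-scale downsampling):

```python
import torch
import torch.nn as nn

def conv_layer(c_in, c_out, stride=1):
    """One 3x3x3 convolution layer with GN and lReLU (slope 0.01).
    stride=2 halves the spatial resolution between scales."""
    return nn.Sequential(
        nn.Conv3d(c_in, c_out, kernel_size=3, stride=stride, padding=1),
        nn.GroupNorm(8, c_out),
        nn.LeakyReLU(0.01),
    )

x = torch.randn(1, 32, 16, 16, 16)
down = conv_layer(32, 64, stride=2)(x)  # next scale: channels doubled
print(down.shape)  # torch.Size([1, 64, 8, 8, 8])
```

Stacking five such downsampling steps with channels 32 → 64 → 128 → 256 → 512 → 512 gives the six scales and the 1/32 deepest resolution described above.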

LK Attention-Based Decoder
The architecture of the decoder mirrors that of the encoder, using 4 × 4 × 4 transposed convolutions for upsampling. The LK attention module can be applied to each upsampled feature map to form a fully applied (Full) network, as in our previous paper. The details of the LK attention modules for the Full network are shown in Table 2. At the last layer, a 1 × 1 × 1 convolution compresses the number of channels to O, the number of segmentation classes, followed by a softmax/sigmoid to generate probability maps for the different organs or tumor regions. Additional softmax/sigmoid outputs were added at all scales except the two lowest levels for deep supervision and to boost gradient propagation.

Data Acquisition
The CT-ORG [37] dataset consists of 140 CT images with six organ classes: liver, lungs, bladder, kidneys, bones, and brain. Of the total 140 image volumes, 131 were dedicated CTs, and 9 were CT components collected during PET-CT examinations. Each image was acquired from a different patient. Most images displayed benign or malignant liver lesions; some showed metastases from breast, colon, bone, and lung cancers. The images were collected from a variety of sources, including low-dose, high-dose, contrast, and non-contrast CT, with dedicated CTs ranging from 0.56 mm to 1 mm in axial resolution. With the aid of ITK-SNAP and morphological segmentation, manual segmentation was conducted on the test dataset (21 cases). Some images were received from the Liver Tumor Segmentation Challenge (LiTS) [38]. The BraTS 2020 dataset was collected using various clinical protocols and scanners from different institutions. The ground truth (GT) labels are annotated by one to four raters and approved by specialists; they include the GD-enhancing tumor (ET), peritumoral edema (ED), and the necrotic and non-enhancing tumor core (NCR + NET). The segmentation results are evaluated on three subregions of the tumor: the GD-enhancing tumor (ET), the tumor core (TC = ET + NCR + NET), and the whole tumor (WT = ET + NCR + NET + ED). The image modalities T1, T1ce, T2, and T2-FLAIR are co-registered to the same template with an image size of 240 × 240 × 155. Afterward, they are interpolated to the same resolution (1 mm³) and skull-stripped. Annotations are only available for the training set (369 cases). The evaluation on the independent validation set (125 cases) should be conducted on the official online platform (CBICA's IPP). Details of the two datasets are summarized in Table 3.

Pre-processing and Data Augmentation
For the CT-ORG dataset, our network takes an image volume of 128 × 128 × 256 as input. To reduce GPU memory usage, all image volumes were resampled to 3 mm³. Resampling uses Gaussian smoothing to avoid aliasing artifacts, followed by resolution interpolation. All image volumes of the BraTS 2020 dataset are cropped to 160 × 192 × 128 to reduce computational waste on background voxels. All input volumes are then pre-processed by intensity normalization.
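The cropping and normalization steps above can be sketched as follows (a minimal illustration; the text does not state whether normalization is computed per volume or per modality, so per-volume z-scoring is assumed here):

```python
import numpy as np

def znorm(volume):
    """Intensity normalization sketch: per-volume z-scoring."""
    v = volume.astype(np.float32)
    return (v - v.mean()) / (v.std() + 1e-8)

def center_crop(volume, shape):
    """Crop a 3D volume to `shape` around its center, e.g.
    160 x 192 x 128 for BraTS 2020."""
    starts = [(s - t) // 2 for s, t in zip(volume.shape, shape)]
    slices = tuple(slice(a, a + t) for a, t in zip(starts, shape))
    return volume[slices]

vol = np.random.rand(240, 240, 155)           # BraTS-sized dummy volume
out = center_crop(znorm(vol), (160, 192, 128))
print(out.shape)  # (160, 192, 128)
```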
Various data augmentation techniques have been applied to artificially increase dataset size and minimize the risk of overfitting.All augmentations are applied on-the-fly throughout the training to expand the training dataset indefinitely.Furthermore, to increase the variability of the generated data, all augmentations are applied randomly based on preset probabilities, and most parameters are also drawn randomly (see Table 4 for details).
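The on-the-fly, probability-gated augmentation described above can be sketched as follows (the probabilities and noise scale here are illustrative placeholders, not the values from Table 4):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(volume, p_flip=0.5, p_noise=0.15):
    """Apply each augmentation with a preset probability; parameters
    are drawn randomly, as in the on-the-fly scheme described."""
    if rng.random() < p_flip:
        axis = int(rng.integers(0, 3))  # random spatial axis to flip
        volume = np.flip(volume, axis=axis)
    if rng.random() < p_noise:
        volume = volume + rng.normal(0.0, 0.1, size=volume.shape)
    return volume

vol = np.random.rand(32, 32, 32)
aug = augment(vol)
print(aug.shape)  # (32, 32, 32)
```

Because the randomness is re-drawn at every call, each epoch sees a differently transformed copy of the same volume, which effectively expands the training set indefinitely.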

Training and Optimization
The LK attention-based U-Net is trained separately on the CT-ORG and BraTS 2020 training datasets. For the CT-ORG training set (119 cases), the network parameters are optimized for a weighted soft Dice loss, where the weight for each segmentation class is one minus the ratio of foreground voxels to background voxels. For the BraTS 2020 training set (369 cases), binary cross-entropy (BCE) and soft Dice losses are utilized. The adaptive moment estimator (Adam) optimizer was applied to optimize the parameters of the network. Each training process had 200 epochs with a batch size of 1 and an initial learning rate of 0.0003. All experiments were implemented with PyTorch 1.10 on an NVIDIA GeForce RTX 3090 GPU with 24 GB of VRAM.
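The weighted soft Dice loss used for CT-ORG can be sketched as follows (a numpy illustration of the loss definition; the smoothing constant is our addition):

```python
import numpy as np

def weighted_soft_dice_loss(probs, target, weights, eps=1e-6):
    """probs, target: (C, H, W, D) arrays of class probabilities and
    one-hot labels; weights: per-class weights, e.g. one minus the
    foreground-to-background voxel ratio as described for CT-ORG."""
    inter = (probs * target).sum(axis=(1, 2, 3))
    denom = probs.sum(axis=(1, 2, 3)) + target.sum(axis=(1, 2, 3))
    dice = (2 * inter + eps) / (denom + eps)        # per-class soft Dice
    return float((weights * (1 - dice)).sum() / weights.sum())

t = np.zeros((2, 4, 4, 4))
t[0, :2] = 1
t[1, 2:] = 1
print(weighted_soft_dice_loss(t, t, np.array([0.9, 0.8])))  # 0.0
```

A perfect prediction yields zero loss; higher weights push the optimizer toward classes with fewer foreground voxels.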

Evaluation Metrics
The segmentation results were evaluated using the Dice score and the 95th-percentile Hausdorff distance (HD95), which are defined as:

Dice(X, Y) = 2|X ∩ Y| / (|X| + |Y|),
HD95(X, Y) = P_95 { max_{x∈X} min_{y∈Y} ||x − y||, max_{y∈Y} min_{x∈X} ||x − y|| },

where X and Y are the sets of GT and prediction voxels, and P represents the percentile. HD95 indicates the 95th percentile of maximum distances between two boundaries, whereas the Dice score measures the spatial overlap between the segmentation result and the GT annotation. The final performance of the LK attention-based U-Net was evaluated using independent test sets from CT-ORG (21 cases) and BraTS 2020 (125 cases), respectively. The brain class was excluded from evaluation because it was present in only 8 of 119 training images, so the model had difficulty learning to differentiate it.
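The two metrics can be implemented directly from their definitions (a brute-force sketch over binary masks and boundary point sets; production pipelines typically use optimized surface-distance libraries):

```python
import numpy as np

def dice_score(pred, gt):
    """Spatial overlap between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def hd95(x_pts, y_pts):
    """95th-percentile Hausdorff distance between two point sets of
    shape (N, 3) and (M, 3), computed brute force from the pairwise
    distance matrix."""
    d = np.linalg.norm(x_pts[:, None, :] - y_pts[None, :, :], axis=-1)
    return max(np.percentile(d.min(axis=1), 95),   # pred -> nearest GT
               np.percentile(d.min(axis=0), 95))   # GT -> nearest pred

a = np.zeros((8, 8, 8), dtype=bool)
a[2:6, 2:6, 2:6] = True
print(dice_score(a, a))        # 1.0
pts = np.argwhere(a)
print(hd95(pts, pts))          # 0.0
```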

Results and Discussion
This section will first experimentally demonstrate the effectiveness of our LK attention module design and then quantitatively analyze the segmentation results. The limitations of the proposed method will also be discussed in the last subsection.

Qualitative Analysis of Ablation Experiments
For the ablation study, the CT-ORG test dataset was used for evaluation, and the network without any attention module was adopted as the base model.We first verify the effectiveness of LK convolutional decomposition and then look for efficient ways to compute the attention map through different model variants.
We conducted ablation experiments by adding different single attention modules to the base network. By comparing the attention module using the original LK convolution with the attention module using the decomposed LK convolution, the decomposition of the LK convolution was proven to be effective and efficient. The comparative results in Table 5 show that the segmentation results of the two attention modules were very close at both the deepest and shallowest levels. For the bottleneck LK attention module, the segmentation with the decomposed LK convolution performed slightly worse than with the original (an average difference of 0.09 in Dice score), while the segmentation performance of the decomposed LK convolution at the highest level was even better. The changes in Dice score were verified by paired t-tests on the test set, giving p-values of 0.094 and 0.122, respectively. In addition, we can also see that the decomposition of the LK convolution significantly reduced the number of added parameters, to about 0.5% and 0.2% of the original, respectively.
The LK attention module can be applied to each upsampled feature map. However, the additional computational cost of a fully applied (Full) network is high, and the efficiency of its design deserves to be analyzed. Therefore, we explored many variants of attention modules with different sizes and positions, as shown in Table 6. Applying decomposed LK attention modules with different kernel sizes at the same location (160 × 192 × 128) indicated that larger kernel coverage leads to better segmentation performance, where kernel coverage refers to the ratio of the kernel size to the feature space size. This is reasonable because convolutions with larger kernels capture correlations across longer distances more effectively. Meanwhile, decomposed LK convolutions with the same kernel size (6, 6, 6) at different locations showed that the LK attention module worked best in the middle of the decoder. We can see that when the fixed-kernel-size LK attention module was applied at larger scales, its segmentation performance initially increased but then started to decrease slightly due to the significant reduction of kernel coverage at high levels. Therefore, to balance the effects of kernel size and position, we applied the largest LK attention module in the middle, which achieved the highest Dice score. The network structure utilizing LK attention in the middle of the decoder (Mid) is thus the most effective and efficient, with the number of added parameters being nearly one-sixth of that of the Full network.

Quantitative Analysis of Segmentation
The segmentation performance of the proposed methods was evaluated and compared with state-of-the-art methods, including CBAM [14], using the independent CT-ORG test set (21 cases) and the BraTS 2020 validation set (125 cases); the results are shown in Tables 7 and 8.
Quantitative results show that the proposed Mid-type network outperformed the other model architectures and all state-of-the-art methods in segmenting all organs and tumor subregions. For multi-organ segmentation, the proposed method achieved the highest Dice score and the lowest HD95 in all organs, especially the lungs. This might be attributed to the fact that the LK attention module emphasizes the features of the correct organ, thereby reducing distracting and false predictions. For the Dice score, the Mid network was only slightly inferior to the Base network in the segmentation of the bladder. We found that adding any attention mechanism caused a decrease in Dice for bladder segmentation. This may be due to the uneven distribution of attention to fine organs, resulting in a greater concentration of computing power on other organs. For brain tumor segmentation, the proposed method was only slightly behind other methods in the HD95 of TC, by a tiny 0.05 margin. On the other hand, the Mid network performed very well on the HD95 of ET, again because the LK attention module adds feature weights to the correct tumor subregions. Representative segmentation results are also compared visually in Figures 3 and 4, which further demonstrates the effectiveness of the LK attention module.
Comparing the results of the Base and Mid networks, the performance improvement due to the presence of the LK attention module can be seen. Lungs, ET, and TC had more significant improvements in segmentation performance, as shown in Tables 9 and 10. To verify the metrics' increase, we performed a paired t-test, and the p-values are shown in Table 6. The improvements brought by the LK attention module on all segmentation targets were statistically validated, except for the bladder and ET. The LK attention module caused a slight decrease in the accuracy of bladder segmentation. On the other hand, since BraTS 2020 sets a penalty of Dice = 0 and HD95 = 373.13 for false positives of ET, the paired t-test cannot verify changes in ET. This statistic validates the effectiveness of the adaptive feature selection of the LK attention module, as visualized in Figure 5.
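The paired t-test used for this validation can be reproduced as follows (a sketch with synthetic per-case Dice scores, since the paper does not list the per-case values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic per-case Dice scores for the 21 CT-ORG test cases
base_dice = rng.normal(0.90, 0.03, 21)             # Base network
mid_dice = base_dice + rng.normal(0.02, 0.01, 21)  # Mid network

# Paired t-test: the same test cases are scored by both networks,
# so the per-case differences are the quantity being tested.
t_stat, p_value = stats.ttest_rel(mid_dice, base_dice)
print(p_value)
```

A paired test is the right choice here because the two networks are evaluated on the identical set of cases, which removes the between-case variance from the comparison.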
Furthermore, high-performance deep learning models usually produce results that are incomprehensible to humans. While these models can be more efficient than humans, it is not easy to extract intuitive explanations that justify their findings or to derive additional clinical insights from these computational "black boxes" [39]. Given the importance of explainability in the clinical domain, our proposed LK attention module demonstrated that deep learning models can identify appropriate regions in medical images without overemphasizing unimportant findings. The local explanation furnished directly by the LK attention map (in Figure 5) suggests that there is medical reasoning behind the focused parts of the CT scan, which could facilitate clinicians' decision-making.

Limitations
Our method still has some limitations. First, as shown in Figure 3, the segmentation results' resolution was lower than that of the GT due to resampling. In future work, the resolution of the segmentation mask can be improved by resampling the image to a higher resolution with sliding windows. Moreover, in the second example of Figure 4, the TC was not accurately segmented, which might be due to the blurring of the T2 modality. This demonstrates the importance of data integrity for the accurate segmentation of medical images. This can be addressed by more diverse data acquisition and data augmentation, or by training generative networks to synthesize clear images.

Conclusion
This paper introduced LK attention for 3D medical image segmentation, which can be easily incorporated into any FCN such as U-Net. The LK attention module combines the advantages of convolution and self-attention, exploits local contextual information, long-range dependencies, and spatial and channel adaptation, and uses convolutional decomposition to eliminate the disadvantage of high computational cost. Ablation experiments on the CT-ORG dataset first verified the feasibility of the decomposition of LK convolutions and then explored the most efficient deployment design of the LK attention module. The quantitative results of the ablation study indicated that incorporating the LK attention module in the middle of the decoder achieved optimal performance. The Mid-type LK attention-based U-Net achieved state-of-the-art performance on both multi-organ and tumor segmentation. Segmentation results on the CT-ORG and BraTS 2020 datasets showed that the LK attention module improved predictions for all organs and tumor subregions except the bladder, especially for the lungs, ET, and TC. In addition, the LK attention module was proven to be effective in adaptively selecting important features and suppressing noise, which provides local explanations of the model's predictions. However, some challenges remain. First, the addition of attention scattered computing power away from some fine targets such as the bladder; thus, the LK attention module can be further customized for multi-target segmentation. Second, for large medical images, better sampling or training strategies can be used to further improve the resolution of the segmentation results. Furthermore, since low image quality can significantly reduce segmentation accuracy, more comprehensive data augmentation strategies and larger training datasets can be considered, or a generative network can be used to synthesize high-quality images.

Fig. 1
Fig. 1 LK attention module. The decomposed LK convolution is applied on the feature map after group normalization (GN) and leaky ReLU (lReLU). The attention map is obtained by sigmoid activation, which is then multiplied and summed elementwise with the original feature map to generate the module output. The figure shows a representative decomposition of a 21 × 21 × 21 convolution into a 5 × 5 × 5 depth-wise (DW) convolution, a 7 × 7 × 7 depth-wise dilated (DWD) convolution with dilation of 3, and a 1 × 1 × 1 convolution. The position of the kernel is indicated by colored voxels, and the yellow voxels show the kernel's centers. (The figure only illustrates a corner of the feature space of the decomposed LK convolution and disregards the zero-padding.)

Fig. 4
Fig. 4 Representative visual results of proposed methods for BraTS 2020. From left to right: four MRI modalities, ground truth (GT), and predictions. The labels are enhancing tumor (yellow), edema (green), and necrotic and non-enhancing tumor (red).

Fig. 5 A
Fig. 5 A representative visual effect of the LK attention module. (a) The CT scan input. (b) The upsampled feature map at the middle scale of the decoder. (c) The attention map. (d) The feature map after multiplying with the attention map. (e) The GT labels.

Table 1
Complexity analysis: comparison of the number of parameters N_PRM for a 21 × 21 × 21 convolution.
The subscripts O and D denote the original convolution and the proposed decomposed convolution, respectively. C: number of channels.

Table 2
Details of LK attention modules in the Full LK attention-based U-Net

Table 3
Details of datasets.

Table 4
Details of data augmentation strategies.

Table 5
Quantitative results to compare the decomposed (D) 3D LK convolution with the original (O) 3D LK convolution.

Table 6
Quantitative results to compare 3D LK attention modules of different convolutional kernel sizes at different locations in the network.

Table 7
Quantitative results of proposed methods compared to state-of-the-art methods for CT-ORG.(Bold numbers are the best results)

Table 8
Quantitative results of proposed methods compared to state-of-the-art methods for BraTS 2020.(Bold numbers are the best results)

Table 9
Improvement in quantitative results due to the LK attention module for CT-ORG.

Table 10
Improvement in quantitative results due to the LK attention module for BraTS 2020.