Abstract
With the introduction of Transformers, different attention-based models have been proposed for image segmentation with promising results. Although self-attention allows capturing of long-range dependencies, it suffers from a quadratic complexity in the image size especially in 3D. To avoid the out-of-memory error during training, input size reduction is usually required for 3D segmentation, but the accuracy can be suboptimal when the trained models are applied on the original image size. To address this limitation, inspired by the Fourier neural operator (FNO), we introduce the HartleyMHA model which is robust to training image resolution with efficient self-attention. FNO is a deep learning framework for learning mappings between functions in partial differential equations, which has the appealing properties of zero-shot super-resolution and global receptive field. We modify the FNO by using the Hartley transform with shared parameters to reduce the model size by orders of magnitude, and this allows us to further apply self-attention in the frequency domain for more expressive high-order feature combination with improved efficiency. When tested on the BraTS’19 dataset, it achieved superior robustness to training image resolution than other tested models with less than 1% of their model parameters.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
In Python, this means \(\hat{U}[N_x, :, :] = \hat{U}[0, : , :]\), etc.
- 2.
Note that \(k_\textrm{max} = (k_{\textrm{max},x}, k_{\textrm{max},y}, k_{\textrm{max},z})\) corresponds to a frequency domain of size \(2k_{\textrm{max},x} \times 2k_{\textrm{max},y} \times 2k_{\textrm{max},z}\) to cover both positive and negative frequency terms.
- 3.
References
Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. arXiv:1607.06450 (2016)
Bakas, S., et al.: Advancing the cancer genome atlas glioma MRI collections with expert segmentation labels and radiomic features. Sci. Data 4(170117), 1–13 (2017)
Bakas, S., et al.: Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv:1811.02629 (2018)
Bracewell, R.N.: Discrete Hartley transform. J. Opt. Soc. Am. 73(12), 1832–1835 (1983)
Cao, H., et al.: Swin-Unet: Unet-like pure transformer for medical image segmentation. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds.) ECCV 2022. LNCS, vol. 13803, pp. 205–218. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-25066-8_9
Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. In: International Conference on Learning Representations (2021)
Gao, Y., Zhou, M., Metaxas, D.N.: UTNet: a hybrid transformer architecture for medical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 61–71 (2021)
Hartley, R.V.L.: A more symmetrical Fourier analysis applied to transmission problems. Proc. IRE 30(3), 144–150 (1942)
Hatamizadeh, A., et al.: UNETR: transformers for 3D medical image segmentation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 574–584 (2022)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Hesamian, M.H., Jia, W., He, X., Kennedy, P.: Deep learning techniques for medical image segmentation: achievements and challenges. J. Digital Imaging 32(4), 582–596 (2019)
Kingma, D.P., Ba, J.L.: Adam: A method for stochastic optimization. arXiv:1412.6980 (2014)
Klambauer, G., Unterthiner, T., Mayr, A., Hochreiter, S.: Self-normalizing neural networks. In: Advances in Neural Information Processing Systems, pp. 972–981 (2017)
Lee, C.Y., Xie, S., Gallagher, P.W., Zhang, Z., Tu, Z.: Deeply-supervised nets. In: International Conference on Artificial Intelligence and Statistics, pp. 562–570 (2015)
Lee-Thorp, J., Ainslie, J., Eckstein, I., Ontanon, S.: FNet: Mixing tokens with Fourier transforms. arXiv:2105.03824 (2021)
Li, Z., et al.: Neural operator: Graph kernel network for partial differential equations. arXiv:2003.03485 (2020)
Li, Z., et al.: Fourier neural operator for parametric partial differential equations. In: International Conference on Learning Representations (2021)
Liu, X., Song, L., Liu, S., Zhang, Y.: A review of deep-learning-based medical image segmentation methods. Sustainability 13(3), 1224 (2021)
Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 10012–10022 (2021)
Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts. In: International Conference on Learning Representations (2017)
Lu, J., et al.: SOFT: softmax-free transformer with linear complexity. In: Advances in Neural Information Processing Systems, vol. 34, pp. 21297–21309 (2021)
Menze, B.H., et al.: The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med. Imaging 34(10), 1993–2024 (2015)
Tolstikhin, I.O., et al.: MLP-Mixer: an all-MLP architecture for vision. In: Advances in Neural Information Processing Systems, vol. 34, pp. 24261–24272 (2021)
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems (2017)
Wong, K.C.L., Moradi, M.: 3D segmentation with fully trainable Gabor kernels and Pearson’s correlation coefficient. In: Machine Learning in Medical Imaging, pp. 53–61 (2022)
Wong, K.C.L., Moradi, M., Tang, H., Syeda-Mahmood, T.: 3D segmentation with exponential logarithmic loss for highly unbalanced object sizes. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11072, pp. 612–619. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00931-1_70
Xie, Y., Zhang, J., Shen, C., Xia, Y.: CoTr: efficiently bridging CNN and transformer for 3D medical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 171–180 (2021)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wong, K.C.L., Wang, H., Syeda-Mahmood, T. (2023). HartleyMHA: Self-attention in Frequency Domain for Resolution-Robust and Parameter-Efficient 3D Image Segmentation. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14223. Springer, Cham. https://doi.org/10.1007/978-3-031-43901-8_35
Download citation
DOI: https://doi.org/10.1007/978-3-031-43901-8_35
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43900-1
Online ISBN: 978-3-031-43901-8
eBook Packages: Computer ScienceComputer Science (R0)