Towards Optimal Patch Size in Vision Transformers for Tumor Segmentation

  • Conference paper
Multiscale Multimodal Medical Imaging (MMMI 2022)

Abstract

Detection of tumors in metastatic colorectal cancer (mCRC) plays an essential role in the early diagnosis and treatment of liver cancer. Deep learning models built on fully convolutional neural networks (FCNNs) have become the dominant approach for segmenting 3D computed tomography (CT) scans. However, because their convolution layers have a limited kernel size, FCNNs cannot capture long-range dependencies and global context. Vision transformers were introduced to overcome this locality of FCNN receptive fields. Although transformers can capture long-range features, their segmentation performance degrades when tumor sizes vary, because the model is sensitive to the input patch size. Finding an optimal patch size improves the performance of vision transformer-based models on segmentation tasks, but it is a time-consuming and challenging procedure. This paper proposes a technique for selecting the optimal input patch size of a vision transformer for multi-resolution images, based on the average volume of the metastatic lesions. We further validate the proposed framework with a transfer-learning technique, showing that the highest Dice similarity coefficient (DSC) is obtained by pre-training on data with a larger tumor volume using the suggested patch size and then training on data with a smaller tumor volume. We evaluate this idea experimentally by pre-training our model on a public multi-resolution dataset. The model showed consistent and improved results when applied to our private multi-resolution mCRC dataset, which has a smaller average tumor volume. This study lays the groundwork for optimizing semantic segmentation of small objects using vision transformers. The implementation source code is available at: https://github.com/Ramtin-Mojtahedi/OVTPS.
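
The two quantities the abstract relies on can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function names, the toy masks, and the 96 × 96 × 96 volume are assumptions chosen for illustration. The first function computes the standard Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), on flattened binary masks. The second counts the non-overlapping cubic patches (tokens) that a vision transformer would receive for a given patch size, which shows why a small change in patch size changes the sequence length sharply.

```python
from math import prod
from typing import Iterable, Sequence


def dice_coefficient(pred: Iterable[int], target: Iterable[int]) -> float:
    """Dice similarity coefficient for two flattened binary masks."""
    pred = list(pred)
    target = list(target)
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Foreground voxels in each mask, summed.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def num_patches(volume_shape: Sequence[int], patch_size: int) -> int:
    """Number of non-overlapping cubic patches in a volume (hypothetical helper)."""
    if any(dim % patch_size for dim in volume_shape):
        raise ValueError("each dimension must be divisible by patch_size")
    return prod(dim // patch_size for dim in volume_shape)


if __name__ == "__main__":
    # Toy masks: 2 shared foreground voxels, 3 foreground voxels in each mask.
    print(dice_coefficient([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1]))
    # Illustrative volume size; not a value reported in the paper.
    for size in (8, 16, 32):
        print(size, num_patches((96, 96, 96), size))
```

Under these assumptions, halving the patch size multiplies the token count by eight in 3D, so smaller patches resolve small lesions more finely at a much higher attention cost.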

Acknowledgement

This work was funded in part by National Institutes of Health R01CA233888.

Corresponding author

Correspondence to Amber L. Simpson.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Mojtahedi, R., Hamghalam, M., Do, R.K.G., Simpson, A.L. (2022). Towards Optimal Patch Size in Vision Transformers for Tumor Segmentation. In: Li, X., Lv, J., Huo, Y., Dong, B., Leahy, R.M., Li, Q. (eds) Multiscale Multimodal Medical Imaging. MMMI 2022. Lecture Notes in Computer Science, vol 13594. Springer, Cham. https://doi.org/10.1007/978-3-031-18814-5_11

  • DOI: https://doi.org/10.1007/978-3-031-18814-5_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-18813-8

  • Online ISBN: 978-3-031-18814-5

  • eBook Packages: Computer Science (R0)
