
AViT: Adapting Vision Transformers for Small Skin Lesion Segmentation Datasets

  • Conference paper in: Medical Image Computing and Computer Assisted Intervention – MICCAI 2023 Workshops (MICCAI 2023)

Abstract

Skin lesion segmentation (SLS) plays an important role in skin lesion analysis. Vision transformers (ViTs) are a promising solution for SLS, but they require more training data than convolutional neural networks (CNNs) due to their parameter-heavy structure and lack of some inductive biases. To alleviate this issue, current approaches fine-tune pre-trained ViT backbones on SLS datasets, aiming to leverage the knowledge learned from a larger set of natural images and thereby reduce the amount of skin training data needed. However, fully fine-tuning all parameters of large backbones is computationally expensive and memory intensive. In this paper, we propose AViT, a novel efficient strategy to mitigate ViTs' data-hunger by transferring any pre-trained ViT to the SLS task. Specifically, we integrate lightweight modules (adapters) within the transformer layers, which modulate the feature representation of a ViT without updating its pre-trained weights. In addition, we employ a shallow CNN as a prompt generator to create a prompt embedding from the input image, which captures fine-grained information and the CNN's inductive biases to guide segmentation on small datasets. Our quantitative experiments on 4 skin lesion datasets demonstrate that AViT achieves competitive, and at times superior, performance to state-of-the-art (SOTA) methods with significantly fewer trainable parameters. Our code is available at https://github.com/siyi-wind/AViT.
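The adapter idea described in the abstract can be illustrated with a minimal sketch. This assumes the standard bottleneck adapter design (down-projection, nonlinearity, up-projection, residual connection); the embedding dimension, bottleneck width, and zero initialization below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU nonlinearity
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class Adapter:
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual add.

    Only these two small matrices would be trained; the ViT backbone's
    pre-trained weights stay frozen and untouched.
    """
    def __init__(self, dim, bottleneck, seed=0):
        rng = np.random.default_rng(seed)
        self.w_down = rng.normal(scale=0.02, size=(dim, bottleneck))
        # zero-initialized up-projection: the adapter starts as an identity map,
        # so inserting it does not perturb the pre-trained features at step 0
        self.w_up = np.zeros((bottleneck, dim))

    def __call__(self, x):
        return x + gelu(x @ self.w_down) @ self.w_up

# hypothetical token tensor: (batch, tokens, embed_dim) as in a ViT-Base layer
tokens = np.random.default_rng(1).normal(size=(4, 197, 768))
adapter = Adapter(dim=768, bottleneck=64)
out = adapter(tokens)
print(out.shape)                    # (4, 197, 768): shape is preserved
print(np.allclose(out, tokens))    # True: zero-init adapter is the identity
```

The adapter adds only `2 * dim * bottleneck` parameters per insertion point (about 98K here, versus roughly 86M for a full ViT-Base), which is why freezing the backbone and training only the adapters cuts trainable parameters so sharply.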



Author information


Corresponding author

Correspondence to Siyi Du.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 210 KB)


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Du, S., Bayasi, N., Hamarneh, G., Garbi, R. (2023). AViT: Adapting Vision Transformers for Small Skin Lesion Segmentation Datasets. In: Celebi, M.E., et al. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2023 Workshops. MICCAI 2023. Lecture Notes in Computer Science, vol. 14393. Springer, Cham. https://doi.org/10.1007/978-3-031-47401-9_3

  • DOI: https://doi.org/10.1007/978-3-031-47401-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47400-2

  • Online ISBN: 978-3-031-47401-9

  • eBook Packages: Computer Science, Computer Science (R0)
