Abstract
Purpose
We investigate whether foundation models pretrained on diverse visual data can benefit surgical computer vision, using instrument and uterus segmentation in minimally invasive procedures as benchmarks. We propose multiple supervised, unsupervised and few-shot supervised adaptations of foundation models, including two novel adaptation methods.
Methods
We use the DINOv1, DINOv2, DINOv2-with-registers and SAM backbones, with the ART-Net surgical instrument segmentation and SurgAI3.8K uterus segmentation datasets. We investigate five approaches: unsupervised DINO, few-shot learning with a linear decoder, supervised learning with the proposed DINO-UNet adaptation, DPT with a DINO encoder, and unsupervised learning with the proposed SAM adaptation.
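To make the few-shot linear-decoder approach concrete, the following minimal sketch trains a single 1x1 convolution on frozen DINOv2 patch features. It is an illustration under stated assumptions (the public torch-hub DINOv2 entry point, binary masks, and a placeholder loader named few_shot_loader), not the authors' exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Frozen DINOv2 ViT-B/14 backbone (assumed torch-hub entry point).
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False

# Linear decoder: a 1x1 convolution mapping 768-d patch features to one logit.
head = nn.Conv2d(768, 1, kernel_size=1)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

def segment_logits(images):
    # images: (B, 3, H, W), with H and W multiples of the 14-pixel patch size.
    B, _, H, W = images.shape
    h, w = H // 14, W // 14
    with torch.no_grad():
        feats = backbone.forward_features(images)["x_norm_patchtokens"]  # (B, h*w, 768)
    feats = feats.permute(0, 2, 1).reshape(B, 768, h, w)
    logits = head(feats)  # (B, 1, h, w), one logit per patch
    return F.interpolate(logits, size=(H, W), mode="bilinear", align_corners=False)

# Hypothetical few-shot data: replace with a handful of labelled frames.
few_shot_loader = [(torch.randn(2, 3, 224, 224),
                    torch.randint(0, 2, (2, 1, 224, 224)).float())]

for epoch in range(100):
    for images, masks in few_shot_loader:
        loss = F.binary_cross_entropy_with_logits(segment_logits(images), masks)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Because only the 769 parameters of the head are trained, a few annotated frames can suffice, which is what makes this kind of adaptation attractive in data-scarce settings.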
Results
We evaluate 17 models for instrument segmentation and 7 models for uterus segmentation, and compare them to existing ad hoc models for the tasks at hand. We show that the linear decoder can be learned with few shots. The unsupervised and linear-decoder methods obtain slightly subpar results but could be useful in data-scarce settings. The unsupervised SAM model produces finer edges but has inconsistent outputs. However, DPT and DINO-UNet obtain strikingly good results, defining a new state of the art by outperforming the previous best by 5.6 and 4.1 pp for instrument segmentation and by 4.4 and 1.5 pp for uterus segmentation, respectively. Both methods achieve semantic and spatial precision, accurately segmenting intricate details.
Conclusion
Our results show the huge potential of using DINO and SAM for surgical computer vision, indicating a promising role for visual foundation models in medical image analysis, particularly in scenarios with limited or complex data.
References
Zhao WX, Zhou K, Li J, Tang T, Wang X, Hou Y, Min Y, Zhang B, Zhang J, Dong Z et al (2023) A survey of large language models. arXiv:2303.18223
Oquab M, Darcet T, Moutakanni T, Vo HV, Szafraniec M et al (2023) DINOv2: learning robust visual features without supervision. arXiv:2304.07193
Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L, Xiao T, Whitehead S, Berg AC, Lo W-Y et al (2023) Segment anything. In: ICCV
Zou X, Yang J, Zhang H, Li F, Li L, Wang J, Wang L, Gao J, Lee YJ (2023) Segment everything everywhere all at once. In: NeurIPS
Radford A, Kim JW, Hallacy C, Ramesh A, Goh G et al (2021) Learning transferable visual models from natural language supervision. In: ICML
Caron M, Touvron H, Misra I, Jégou H, Mairal J, Bojanowski P, Joulin A (2021) Emerging properties in self-supervised vision transformers. In: ICCV
Darcet T, Oquab M, Mairal J, Bojanowski P (2023) Vision transformers need registers. arXiv:2309.16588
Hasan MK, Calvet L, Rabbani N, Bartoli A (2021) Detection, segmentation, and 3D pose estimation of surgical tools using convolutional neural networks and algebraic geometry. Med Image Anal 70:101994
Zadeh SM, François T, Comptour A, Canis M, Bourdel N, Bartoli A (2023) SurgAI3.8K: a labeled dataset of gynecologic organs in laparoscopy with application to automatic augmented reality surgical guidance. J Minim Invasive Gynecol 30(5):397–405
Ranftl R, Bochkovskiy A, Koltun V (2021) Vision transformers for dense prediction. In: ICCV
Ramesh S, Srivastav V, Alapatt D, Yu T, Murali A, Sestini L, Nwoye CI, Hamoud I, Sharma S, Fleurentin A et al (2023) Dissecting self-supervised learning methods for surgical computer vision. Med Image Anal 88:102844
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
All procedures involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. This article does not contain any studies with animals performed by any of the authors. Informed consent was obtained from the patients included in the study.
Cite this article
Rabbani N, Bartoli A (2024) Can surgical computer vision benefit from large-scale visual foundation models? Int J CARS. https://doi.org/10.1007/s11548-024-03125-y