Self-distillation for Surgical Action Recognition

Yamlahi, Amine; Tran, Thuy Nuong; Godau, Patrick; Schellenberg, Melanie; Michael, Dominik; Smidt, Finn-Henri; Nölke, Jan-Hinrich; Adler, Tim J.; Tizabi, Minu Dietlinde; Nwoye, Chinedu Innocent; Padoy, Nicolas; Maier-Hein, Lena

doi:10.1007/978-3-031-43996-4_61

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14228)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

Abstract

Surgical scene understanding is a key prerequisite for context-aware decision support in the operating room. While deep learning-based approaches have already reached or even surpassed human performance in various fields, the task of surgical action recognition remains a major challenge. With this contribution, we are the first to investigate the concept of self-distillation as a means of addressing class imbalance and potential label ambiguity in surgical video analysis. Our proposed method is a heterogeneous ensemble of three models that use Swin Transformers as backbone and the concepts of self-distillation and multi-task learning as core design choices. According to ablation studies performed with the CholecT45 challenge data via cross-validation, the biggest performance boost is achieved by the usage of soft labels obtained by self-distillation. External validation of our method on an independent test set was achieved by providing a Docker container of our inference model to the challenge organizers. According to their analysis, our method outperforms all other solutions submitted to the latest challenge in the field. Our approach thus shows the potential of self-distillation for becoming an important tool in medical image analysis applications. Code available at https://github.com/IMSY-DKFZ/self-distilled-swin.
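
The idea of training on soft labels obtained by self-distillation, and of combining several models into an ensemble, can be illustrated with a minimal Python sketch. This is not the authors' implementation (that is in the repository linked above). The function names `soft_labels`, `bce`, and `ensemble_mean`, the sigmoid-based multi-label formulation, the example numbers, and the plain averaging of model outputs are all assumptions made for illustration and are not stated in the abstract.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def soft_labels(teacher_logits):
    """Convert teacher logits into per-class probabilities (soft targets).

    Assumption: a teacher trained on the hard labels provides these logits,
    and a student with the same architecture is then trained on the result.
    """
    return [sigmoid(z) for z in teacher_logits]


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; targets may be hard (0/1) or soft (in [0, 1])."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)


def ensemble_mean(per_model_probs):
    """Average class probabilities over models (assumed combination rule)."""
    n = len(per_model_probs)
    return [sum(col) / n for col in zip(*per_model_probs)]


# Hypothetical example with three classes.
teacher_logits = [2.0, -1.0, 0.0]
hard = [1.0, 0.0, 0.0]
soft = soft_labels(teacher_logits)

student_probs = [0.8, 0.3, 0.4]
print("loss vs. hard labels:", bce(student_probs, hard))
print("loss vs. soft labels:", bce(student_probs, soft))

# Averaging the outputs of three hypothetical models.
print("ensemble:", ensemble_mean([[0.2, 0.8], [0.4, 0.6], [0.6, 0.4]]))
```

In this sketch the third class has a teacher logit of 0, so its soft target is 0.5 rather than a hard 0. Soft targets of this kind let the student see the teacher's uncertainty, which is one plausible way they could help with ambiguous labels.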

Acknowledgements

This project was supported by a Twinning Grant of the German Cancer Research Center (DKFZ) and the Robert Bosch Center for Tumor Diseases (RBCT). Part of this work was funded by HELMHOLTZ IMAGING, a platform of the Helmholtz Information & Data Science Incubator, and the Helmholtz Association under the joint research school "HIDSS4Health - Helmholtz Information and Data Science School for Health", and by French state funds managed within the Plan Investissements d’Avenir by the ANR under references: National AI Chair AI4ORSafety [ANR-20-CHIA-0029-01], Labex CAMI [ANR-11-LABX-0004], DeepSurg [ANR-16-CE33-0009], IHU Strasbourg [ANR-10-IAHU-02], and by BPI France under references: project CONDOR, project 5G-OR. Evaluations of the model Docker container were performed on servers/HPC resources managed by CAMMA, IHU Strasbourg, Unistra Mésocentre, and GENCI-IDRIS [Grants 2021-AD011011638R1, 2021-AD011011638R2, 2021-AD011011638R3].

Author information

Authors and Affiliations

Division of Intelligent Medical Systems, German Cancer Research Center (DKFZ), Heidelberg, Germany
Amine Yamlahi, Thuy Nuong Tran, Patrick Godau, Melanie Schellenberg, Dominik Michael, Finn-Henri Smidt, Jan-Hinrich Nölke, Tim J. Adler, Minu Dietlinde Tizabi & Lena Maier-Hein
National Center for Tumor Diseases (NCT), NCT Heidelberg a Partnership between DKFZ and University Medical Center Heidelberg, Heidelberg, Germany
Amine Yamlahi, Patrick Godau, Melanie Schellenberg, Dominik Michael, Tim J. Adler & Lena Maier-Hein
HIDSS4Health - Helmholtz Information and Data Science School for Health, Karlsruhe/Heidelberg, Germany
Patrick Godau & Melanie Schellenberg
ICube Laboratory, University of Strasbourg, Strasbourg, France
Chinedu Innocent Nwoye & Nicolas Padoy
Faculty of Mathematics and Computer Science, Heidelberg University, Heidelberg, Germany
Thuy Nuong Tran, Patrick Godau, Melanie Schellenberg, Jan-Hinrich Nölke, Tim J. Adler & Lena Maier-Hein
Medical Faculty, Heidelberg University, Heidelberg, Germany
Lena Maier-Hein

Corresponding author

Correspondence to Amine Yamlahi.

Editor information

Editors and Affiliations

Icahn School of Medicine, Mount Sinai, NYC, NY, USA, Tel Aviv University, Tel Aviv, Israel
Hayit Greenspan
Emory University, Atlanta, GA, USA
Anant Madabhushi
Queen’s University, Kingston, ON, Canada
Parvin Mousavi
The University of British Columbia, Vancouver, BC, Canada
Septimiu Salcudean
Yale University, New Haven, CT, USA
James Duncan
IBM Research, San Jose, CA, USA
Tanveer Syeda-Mahmood
Johns Hopkins University, Baltimore, MD, USA
Russell Taylor

About this paper

Cite this paper

Yamlahi, A. et al. (2023). Self-distillation for Surgical Action Recognition. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14228. Springer, Cham. https://doi.org/10.1007/978-3-031-43996-4_61

DOI: https://doi.org/10.1007/978-3-031-43996-4_61
Published: 01 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43995-7
Online ISBN: 978-3-031-43996-4
eBook Packages: Computer Science, Computer Science (R0)
