OperA: Attention-Regularized Transformers for Surgical Phase Recognition

Czempiel, Tobias; Paschali, Magdalini; Ostler, Daniel; Kim, Seong Tae; Busam, Benjamin; Navab, Nassir

doi:10.1007/978-3-030-87202-1_58

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12904)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

Abstract

In this paper, we introduce OperA, a transformer-based model that accurately predicts surgical phases from long video sequences. A novel attention regularization loss encourages the model to focus on high-quality frames during training. Moreover, the attention weights are used to identify characteristic high-attention frames for each surgical phase, which could in turn serve for surgery summarization. OperA is thoroughly evaluated on two datasets of laparoscopic cholecystectomy videos, where it outperforms various state-of-the-art temporal refinement approaches.
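The abstract names two mechanisms without giving their formulas: a loss that steers attention toward high-quality frames, and the reuse of attention weights to pick representative frames for each phase. The Python sketch below illustrates both under clearly stated assumptions and is not the authors' implementation. Several choices are hypothetical and made only for illustration: the per-frame confidence score, the penalty form sum_t a_t * (1 - c_t), the column-mean aggregation of a T x T attention matrix, and all function names.

```python
import math
from typing import Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D sequence of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_received(attn_matrix: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical aggregation: mean attention each frame (key) receives over all queries.

    attn_matrix[i][j] is the weight query frame i assigns to key frame j.
    If every row sums to 1, the returned per-frame scores also sum to 1.
    """
    n_queries = len(attn_matrix)
    n_keys = len(attn_matrix[0])
    return [sum(row[j] for row in attn_matrix) / n_queries for j in range(n_keys)]


def attention_regularization(attn: Sequence[float], confidence: Sequence[float]) -> float:
    """Hypothetical penalty on attention mass placed on low-confidence frames.

    attn: per-frame attention weights (assumed to sum to 1).
    confidence: per-frame quality score in [0, 1]. Using e.g. a frame-level
    classifier's max softmax probability here is an assumption, not the paper's definition.
    Returns sum_t a_t * (1 - c_t): 0 when all attention sits on fully confident
    frames, growing as attention shifts to low-confidence ones.
    """
    if len(attn) != len(confidence):
        raise ValueError("attn and confidence must have the same length")
    return sum(a * (1.0 - c) for a, c in zip(attn, confidence))


def top_frames_per_phase(
    attn: Sequence[float], phases: Sequence[int], k: int = 1
) -> Dict[int, List[int]]:
    """Return the indices of the k highest-attention frames for each phase label."""
    by_phase: Dict[int, List[int]] = {}
    for idx, phase in enumerate(phases):
        by_phase.setdefault(phase, []).append(idx)
    return {
        phase: sorted(idxs, key=lambda i: attn[i], reverse=True)[:k]
        for phase, idxs in by_phase.items()
    }


if __name__ == "__main__":
    # Toy example: six frames, two phases (labels are made up for illustration).
    attn = softmax([0.1, 2.0, 0.5, 1.5, 3.0, 0.2])
    conf = [0.9, 0.95, 0.3, 0.8, 0.99, 0.4]
    phases = [0, 0, 0, 1, 1, 1]
    print(attention_regularization(attn, conf))
    print(top_frames_per_phase(attn, phases, k=1))  # → {0: [1], 1: [4]}
```

Averaging attention over queries is only one way to score frames. Using a single layer or head, or applying a causal mask for online prediction, would produce different rankings, and the paper may aggregate attention differently.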

Acknowledgements

Our research is partly funded by the DFG research unit PLAFOKON (FKZ 620/33-2) and BMBF research project ARTEKMED (FKZ 16SV8088) in collaboration with the Minimal-invasive Interdisciplinary Intervention Group.

Author information

Authors and Affiliations

Computer Aided Medical Procedures, Technische Universität München, Munich, Germany
Tobias Czempiel, Magdalini Paschali, Benjamin Busam & Nassir Navab
MITI, Klinikum Rechts der Isar, Technische Universität München, Munich, Germany
Daniel Ostler
Department of Computer Science and Engineering, Kyung Hee University, Yongin-si, South Korea
Seong Tae Kim
Computer Aided Medical Procedures, Johns Hopkins University, Baltimore, USA
Nassir Navab

Corresponding author

Correspondence to Tobias Czempiel.

Editor information

Editors and Affiliations

Erasmus MC - University Medical Center Rotterdam, Rotterdam, The Netherlands
Marleen de Bruijne
University of Basel, Allschwil, Switzerland
Philippe C. Cattin
Inria Nancy Grand Est, Villers-lès-Nancy, France
Stéphane Cotin
ICube, Université de Strasbourg, CNRS, Strasbourg, France
Nicolas Padoy
National Center for Tumor Diseases (NCT/UCC), Dresden, Germany
Stefanie Speidel
Tencent Jarvis Lab, Shenzhen, China
Yefeng Zheng
ICube, Université de Strasbourg, CNRS, Strasbourg, France
Caroline Essert

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 5770 KB)

About this paper

Cite this paper

Czempiel, T., Paschali, M., Ostler, D., Kim, S.T., Busam, B., Navab, N. (2021). OperA: Attention-Regularized Transformers for Surgical Phase Recognition. In: de Bruijne, M., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2021. MICCAI 2021. Lecture Notes in Computer Science, vol 12904. Springer, Cham. https://doi.org/10.1007/978-3-030-87202-1_58

DOI: https://doi.org/10.1007/978-3-030-87202-1_58
Published: 21 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87201-4
Online ISBN: 978-3-030-87202-1
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The Medical Image Computing and Computer Assisted Intervention Society