An End-to-End Spatial-Temporal Transformer Model for Surgical Action Triplet Recognition

  • Conference paper
12th Asian-Pacific Conference on Medical and Biological Engineering (APCMBE 2023)

Part of the book series: IFMBE Proceedings ((IFMBE,volume 104))


Abstract

Surgical activity recognition plays an important role in computer-assisted surgery. Recently, the surgical action triplet has become the representative formulation of fine-grained surgical activity: a combination of three components in the form ⟨instrument, verb, target⟩. In this work, we propose an end-to-end spatial-temporal transformer model trained with multi-task auxiliary supervision, establishing a strong baseline for surgical action triplet recognition. Rigorous experiments are conducted on the publicly available CholecT45 dataset for ablation studies and comparisons with state-of-the-art methods. Experimental results show that our method outperforms the previous state of the art by 6.8%, achieving 36.5% mAP for triplet recognition. Our method also won 2nd place in the action triplet recognition track of the CholecTriplet 2022 challenge, further demonstrating its capability.
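The pipeline the abstract describes (per-frame spatial attention, temporal attention over a clip, and a triplet head trained alongside auxiliary instrument/verb/target heads) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the authors' released code: the layer counts, feature dimension, and clip length are made up, and the class counts (6 instruments, 10 verbs, 15 targets, 100 triplets) follow the CholecT45 label space.

```python
# Hedged sketch of a spatial-temporal transformer with multi-task heads.
# Assumes PyTorch; patch features from a backbone are taken as given input.
import torch
import torch.nn as nn

class SpatialTemporalTripletNet(nn.Module):
    def __init__(self, d_model=64, n_instruments=6, n_verbs=10,
                 n_targets=15, n_triplets=100):
        super().__init__()
        spatial_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        temporal_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.spatial = nn.TransformerEncoder(spatial_layer, num_layers=2)
        self.temporal = nn.TransformerEncoder(temporal_layer, num_layers=2)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        # Triplet head plus three auxiliary heads; the component heads provide
        # the multi-task auxiliary supervision mentioned in the abstract.
        self.head_ivt = nn.Linear(d_model, n_triplets)
        self.head_i = nn.Linear(d_model, n_instruments)
        self.head_v = nn.Linear(d_model, n_verbs)
        self.head_t = nn.Linear(d_model, n_targets)

    def forward(self, patch_feats):
        # patch_feats: (batch, clip_len, n_patches, d_model) per-frame features
        b, t, p, d = patch_feats.shape
        x = patch_feats.reshape(b * t, p, d)
        cls = self.cls_token.expand(b * t, -1, -1)
        # Spatial attention within each frame; keep the class token summary.
        x = self.spatial(torch.cat([cls, x], dim=1))[:, 0]
        # Temporal attention across the clip; predict from the last frame.
        x = self.temporal(x.reshape(b, t, d))[:, -1]
        return self.head_ivt(x), self.head_i(x), self.head_v(x), self.head_t(x)

model = SpatialTemporalTripletNet()
logits_ivt, logits_i, logits_v, logits_t = model(torch.randn(2, 8, 16, 64))
print(tuple(logits_ivt.shape))  # (2, 100)
```

At training time each head would be driven by a multi-label loss (e.g. binary cross-entropy), with the auxiliary losses added to the triplet loss; inference keeps only the triplet logits.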



Acknowledgment

This study was partially supported by the Shanghai Municipal Science and Technology Commission via Project 20511105205 and by the National Natural Science Foundation of China via Project U20A20199.

Author information

Correspondence to Guoyan Zheng.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zou, X., Yu, D., Tao, R., Zheng, G. (2024). An End-to-End Spatial-Temporal Transformer Model for Surgical Action Triplet Recognition. In: Wang, G., Yao, D., Gu, Z., Peng, Y., Tong, S., Liu, C. (eds) 12th Asian-Pacific Conference on Medical and Biological Engineering. APCMBE 2023. IFMBE Proceedings, vol 104. Springer, Cham. https://doi.org/10.1007/978-3-031-51485-2_14

  • DOI: https://doi.org/10.1007/978-3-031-51485-2_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-51484-5

  • Online ISBN: 978-3-031-51485-2

  • eBook Packages: Engineering (R0)
