
Surgical Skill Assessment via Video Semantic Aggregation

  • Conference paper
  • First Online:
Medical Image Computing and Computer Assisted Intervention – MICCAI 2022 (MICCAI 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13437)

Abstract

Automated video-based assessment of surgical skills is a promising means of assisting young surgical trainees, especially in resource-poor areas. Existing works often resort to a joint CNN-LSTM framework that models long-term relationships by applying LSTMs to spatially pooled short-term CNN features. However, this practice inevitably neglects the differences among semantic concepts such as tools, tissues, and background in the spatial dimension, impeding subsequent temporal relationship modeling. In this paper, we propose a novel skill assessment framework, Video Semantic Aggregation (ViSA), which discovers different semantic parts and aggregates them across the spatiotemporal dimensions. The explicit discovery of semantic parts provides an explanatory visualization that helps in understanding the network's decisions. It also enables us to further incorporate auxiliary information, such as kinematic data, to improve representation learning and performance. Experiments on two datasets demonstrate the competitiveness of ViSA compared with state-of-the-art methods. Source code is available at: bit.ly/MICCAI2022ViSA.
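The pooling contrast described in the abstract can be sketched numerically: the CNN-LSTM baseline collapses each frame's whole spatial grid into one vector, mixing tools, tissue, and background, whereas a ViSA-style aggregation pools features within semantic groups separately. Below is a minimal NumPy sketch under stated assumptions — the assignment maps are random placeholders (in the actual method they are learned from the features), and all shapes and names are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
T, C, H, W = 8, 16, 7, 7           # frames, channels, spatial size
feats = rng.standard_normal((T, C, H, W))

# Baseline (CNN-LSTM style): global average pooling collapses the
# spatial grid, mixing all semantic concepts into a single vector.
pooled = feats.mean(axis=(2, 3))               # (T, C)

# ViSA-style aggregation (sketch): soft-assign each spatial location
# to one of K semantic groups (softmax over K), then pool features
# within each group separately, keeping parts distinct over time.
K = 3
logits = rng.standard_normal((T, K, H, W))
assign = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Weighted average of features under each group's assignment map.
num = np.einsum('tkhw,tchw->tkc', assign, feats)   # (T, K, C)
den = assign.sum(axis=(2, 3))[..., None] + 1e-8    # (T, K, 1)
group_feats = num / den                            # (T, K, C)

print(pooled.shape, group_feats.shape)
```

The group-wise features `(T, K, C)` preserve a per-part representation along the temporal axis, which is what subsequent temporal modeling can then operate on instead of a single pooled vector per frame.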



Acknowledgment

This work is supported by JST AIP Acceleration Research Grant Number JPMJCR20U1, JSPS KAKENHI Grant Number JP20H04205, JST ACT-X Grant Number JPMJAX190D, JST Moonshot R&D Grant Number JPMJMS2011, the Fundamental Research Funds for the Central Universities under Grant DUT21RC(3)028, and a project commissioned by NEDO.

Author information

Corresponding author: Weimin Wang.


Electronic Supplementary Material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 726 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Li, Z., Gu, L., Wang, W., Nakamura, R., Sato, Y. (2022). Surgical Skill Assessment via Video Semantic Aggregation. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13437. Springer, Cham. https://doi.org/10.1007/978-3-031-16449-1_39

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-16449-1_39


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16448-4

  • Online ISBN: 978-3-031-16449-1

  • eBook Packages: Computer Science; Computer Science (R0)
