Abstract
Automatic recognition of surgical activities in the operating room (OR) is a key technology for creating next-generation intelligent surgical devices and workflow monitoring/support systems. Such systems can potentially enhance efficiency in the OR, resulting in lower costs and improved care delivery to patients. In this paper, we investigate automatic surgical activity recognition in robot-assisted operations. We collect the first large-scale dataset, comprising 400 full-length, multi-perspective videos from a variety of robotic surgery cases captured using Time-of-Flight cameras. We densely annotate the videos with the 10 most recognized and clinically relevant classes of activities. Furthermore, we investigate state-of-the-art computer vision action recognition techniques and adapt them for the OR environment and the dataset. First, we fine-tune the Inflated 3D ConvNet (I3D) for clip-level activity recognition on our dataset and use it to extract features from the videos. These features are then fed to a stack of 3 Temporal Gaussian Mixture (TGM) layers, which extract context from neighboring clips, and finally pass through a Long Short-Term Memory (LSTM) network to learn the order of activities in full-length videos. We extensively assess the model and reach a peak performance of ~88% mean Average Precision.
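The distinctive component of the pipeline described above is the Temporal Gaussian Mixture layer, which convolves clip-level I3D features with learned temporal kernels built as soft mixtures of Gaussians. The following is a minimal NumPy sketch of that idea only; all function names, parameter shapes, and values are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def tgm_kernels(centers, widths, mix, L=15):
    """Build (C_out, L) temporal kernels as convex mixtures of Gaussians.

    centers, widths: (M,) learnable per-Gaussian parameters
    mix:             (C_out, M) unnormalized mixing logits
    L:               temporal kernel length (assumed odd)
    """
    t = np.arange(L) - (L - 1) / 2.0  # relative temporal offsets
    # One Gaussian per mixture component, normalized to sum to 1 over time
    g = np.exp(-0.5 * ((t[None, :] - centers[:, None]) / widths[:, None]) ** 2)
    g /= g.sum(axis=1, keepdims=True)
    # Softmax over components so each output kernel is a convex mixture
    w = np.exp(mix - mix.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ g  # (C_out, L)

def tgm_layer(feats, kernels):
    """Apply each temporal kernel to every feature dimension.

    feats:   (T, D) sequence of clip-level features (e.g. from I3D)
    kernels: (C_out, L) temporal kernels from tgm_kernels
    returns: (C_out, T, D) temporally smoothed features
    """
    T, D = feats.shape
    pad = kernels.shape[1] // 2
    x = np.pad(feats, ((pad, pad), (0, 0)), mode="edge")  # "same" padding
    return np.stack([
        np.stack([np.convolve(x[:, d], k, mode="valid") for d in range(D)], axis=1)
        for k in kernels
    ])
```

In the full model, several such layers would be stacked with learnable `centers`, `widths`, and `mix`, and their output fed to an LSTM for dense per-clip activity prediction.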
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Sharghi, A., Haugerud, H., Oh, D., Mohareri, O. (2020). Automatic Operating Room Surgical Activity Recognition for Robot-Assisted Surgery. In: Martel, A.L., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. MICCAI 2020. Lecture Notes in Computer Science(), vol 12263. Springer, Cham. https://doi.org/10.1007/978-3-030-59716-0_37
Print ISBN: 978-3-030-59715-3
Online ISBN: 978-3-030-59716-0