
Automatic Operating Room Surgical Activity Recognition for Robot-Assisted Surgery

  • Conference paper
  • First Online:
Medical Image Computing and Computer Assisted Intervention – MICCAI 2020 (MICCAI 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12263)

Abstract

Automatic recognition of surgical activities in the operating room (OR) is a key technology for creating next-generation intelligent surgical devices and workflow monitoring/support systems. Such systems can potentially enhance efficiency in the OR, resulting in lower costs and improved care delivery to patients. In this paper, we investigate automatic surgical activity recognition in robot-assisted operations. We collect the first large-scale dataset, comprising 400 full-length, multi-perspective videos from a variety of robotic surgery cases captured with Time-of-Flight cameras. We densely annotate the videos with the 10 most recognized and clinically relevant classes of activities. Furthermore, we investigate state-of-the-art computer vision action recognition techniques and adapt them to the OR environment and the dataset. First, we fine-tune the Inflated 3D ConvNet (I3D) for clip-level activity recognition on our dataset and use it to extract features from the videos. These features are then fed to a stack of 3 Temporal Gaussian Mixture layers, which extract context from neighboring clips, and finally pass through a Long Short-Term Memory (LSTM) network that learns the order of activities in full-length videos. We extensively assess the model and reach a peak performance of ~88% mean Average Precision.
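To make the pipeline concrete, below is a minimal PyTorch sketch of the temporal model described above, operating on precomputed I3D clip features. The feature dimension (1024), hidden size, and the use of plain temporal 1-D convolutions as a stand-in for the three Temporal Gaussian Mixture (TGM) layers of Piergiovanni and Ryoo are illustrative assumptions, not details taken from the paper; the multi-label sigmoid objective is likewise only suggested by the mAP evaluation.

```python
# Minimal sketch of the full-video recognition head described in the abstract.
# Assumptions (not from the paper): 1024-d I3D clip features, 10 activity
# classes, and plain temporal Conv1d layers standing in for the 3 stacked
# Temporal Gaussian Mixture (TGM) layers.
import torch
import torch.nn as nn

class TemporalActivityHead(nn.Module):
    def __init__(self, feat_dim=1024, hidden=512, num_classes=10):
        super().__init__()
        # Stand-in for the TGM stack: temporal convolutions that aggregate
        # context from neighboring clips.
        self.temporal_context = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU(),
        )
        # LSTM models the order of activities across the full-length video.
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        # Per-clip logits, one score per activity class.
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, i3d_feats):
        # i3d_feats: (batch, num_clips, feat_dim) precomputed I3D features.
        x = self.temporal_context(i3d_feats.transpose(1, 2)).transpose(1, 2)
        x, _ = self.lstm(x)               # (batch, num_clips, hidden)
        return self.classifier(x)         # (batch, num_clips, num_classes)

# Example: 2 videos, 300 clips each, scored against 10 activity classes with
# a multi-label (sigmoid) objective, consistent with mAP evaluation.
model = TemporalActivityHead()
feats = torch.randn(2, 300, 1024)
labels = torch.randint(0, 2, (2, 300, 10)).float()
loss = nn.BCEWithLogitsLoss()(model(feats), labels)
```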


References

  1. Catchpole, K., et al.: Safety, efficiency and learning curves in robotic surgery: a human factors analysis. Surg. Endosc. 30(9), 3749–3761 (2015). https://doi.org/10.1007/s00464-015-4671-2


  2. Vercauteren, T., Unberath, M., Padoy, N., Navab, N.: CAI4CAI: the rise of contextual artificial intelligence in computer-assisted interventions. Proc. IEEE 108(1), 198–214 (2019)


  3. Allers, J.C., et al.: Evaluation and impact of workflow interruptions during robot-assisted surgery. Urology 92, 33–37 (2016)


  4. Zeybek, B., Öge, T., Kılıç, C.H., Borahay, M.A., Kılıç, G.S.: A financial analysis of operating room charges for robot-assisted gynaecologic surgery: efficiency strategies in the operating room for reducing the costs. J. Turk. Ger. Gynecol. Assoc. 15(1), 25 (2014)


  5. Kay, W., et al.: The Kinetics human action video dataset. arXiv preprint arXiv:1705.06950 (2017)

  6. Carreira, J., Noland, E., Banki-Horvath, A., Hillier, C., Zisserman, A.: A short note about Kinetics-600. arXiv preprint arXiv:1808.01340 (2018)

  7. Carreira, J., Noland, E., Hillier, C., Zisserman, A.: A short note on the Kinetics-700 human action dataset. arXiv preprint arXiv:1907.06987 (2019)

  8. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725–1732 (2014)


  9. Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6299–6308 (2017)


  10. Dutta, A., Zisserman, A.: The VIA annotation software for images, audio and video. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 2276–2279, October 2019


  11. Yengera, G., Mutter, D., Marescaux, J., Padoy, N.: Less is more: surgical phase recognition with less annotations through self-supervised pre-training of CNN-LSTM networks. arXiv preprint arXiv:1805.08569 (2018)

  12. Yeung, S., et al.: Vision-based hand hygiene monitoring in hospitals. In: AMIA (2016)


  13. Ma, A.J., et al.: Measuring patient mobility in the ICU using a novel noninvasive sensor. Crit. Care Med. 45(4), 630 (2017)


  14. Yeung, S., et al.: A computer vision system for deep learning-based detection of patient mobilization activities in the ICU. NPJ Digit. Med. 2(1), 1–5 (2019)


  15. Chou, E., et al.: Privacy-preserving action recognition for smart hospitals using low-resolution depth images. arXiv preprint arXiv:1811.09950 (2018)

  16. Zia, A., Hung, A., Essa, I., Jarc, A.: Surgical activity recognition in robot-assisted radical prostatectomy using deep learning. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11073, pp. 273–280. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00937-3_32


  17. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)


  18. Piergiovanni, A.J., Ryoo, M.S.: Temporal Gaussian mixture layer for videos. arXiv preprint arXiv:1803.06316 (2018)

  19. Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4489–4497 (2015)


  20. Sudhakaran, S., Escalera, S., Lanz, O.: Gate-shift networks for video action recognition. arXiv preprint arXiv:1912.00381 (2019)

  21. Liu, Y., Ma, L., Zhang, Y., Liu, W., Chang, S.F.: Multi-granularity generator for temporal action proposal. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3604–3613 (2019)


  22. Xu, H., Das, A., Saenko, K.: Two-stream region convolutional 3D network for temporal activity detection. IEEE Trans. Pattern Anal. Mach. Intell. 41(10), 2319–2332 (2019)


  23. Yeung, S., Russakovsky, O., Jin, N., Andriluka, M., Mori, G., Fei-Fei, L.: Every moment counts: dense detailed labeling of actions in complex videos. Int. J. Comput. Vis. 126(2–4), 375–389 (2018)


  24. Ryoo, M.S., Piergiovanni, A.J., Tan, M., Angelova, A.: AssembleNet: searching for multi-stream neural connectivity in video architectures. arXiv preprint arXiv:1905.13209 (2019)

  25. Tran, D., Wang, H., Torresani, L., Feiszli, M.: Video classification with channel-separated convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5552–5561 (2019)



Author information


Corresponding author

Correspondence to Aidean Sharghi.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Sharghi, A., Haugerud, H., Oh, D., Mohareri, O. (2020). Automatic Operating Room Surgical Activity Recognition for Robot-Assisted Surgery. In: Martel, A.L., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. MICCAI 2020. Lecture Notes in Computer Science(), vol 12263. Springer, Cham. https://doi.org/10.1007/978-3-030-59716-0_37

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-59716-0_37


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59715-3

  • Online ISBN: 978-3-030-59716-0

  • eBook Packages: Computer Science (R0)
