Abstract
Reconstructing the surgical scene from stereo endoscopic video is an important and promising topic in surgical data science, with the potential to support applications such as surgical visual perception, robotic surgery education, and intra-operative context awareness. However, current methods are mostly restricted to reconstructing static anatomy, assuming no tissue deformation, no tool occlusion or de-occlusion, and no camera movement; these assumptions are often violated in minimally invasive robotic surgery. In this work, we present an efficient reconstruction pipeline for highly dynamic surgical scenes that runs at 28 fps. Specifically, we design a transformer-based stereoscopic depth perception module for efficient depth estimation and a light-weight tool segmentor to handle tool occlusion. We then propose a dynamic reconstruction algorithm that estimates tissue deformation and camera movement and aggregates this information over time. We evaluate the proposed pipeline on two datasets: the public Hamlyn Centre Endoscopic Video Dataset and our in-house DaVinci robotic surgery dataset. The results demonstrate that our method can recover scene regions occluded by the surgical tool and handle camera movement in realistic surgical scenarios, all at real-time speed.
Y. Long and Z. Li contributed equally to this work.
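As a minimal illustration of how such a three-stage pipeline could be composed per frame, the sketch below wires a placeholder transformer-based stereo depth module and a light-weight tool segmentor into a single reconstruction step. All names here (StereoDepthTransformer, ToolSegmentor, reconstruct_frame) are hypothetical stand-ins for the components named in the abstract, not the authors' released code, and the dynamic surfel-based reconstruction backend is elided.

import torch
import torch.nn as nn


class StereoDepthTransformer(nn.Module):
    """Placeholder for a transformer-based stereoscopic depth module."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=4, stride=4)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(dim, 1)

    def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
        # Cross-attend left-view patch tokens to right-view tokens, then
        # regress one coarse depth value per patch.
        b, _, h, w = left.shape
        q = self.embed(left).flatten(2).transpose(1, 2)    # (B, N, dim)
        kv = self.embed(right).flatten(2).transpose(1, 2)  # (B, N, dim)
        fused, _ = self.attn(q, kv, kv)
        depth = self.head(fused).transpose(1, 2)           # (B, 1, N)
        return depth.view(b, 1, h // 4, w // 4)


class ToolSegmentor(nn.Module):
    """Placeholder light-weight binary tool/tissue segmentation network."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(image))  # per-pixel tool probability


def reconstruct_frame(depth_net, seg_net, left, right):
    """One pipeline step: estimate depth, mask out tool pixels, and return
    the tissue-only depth that a dynamic (e.g., surfel-warping) backend
    would fuse into the deformable model over time."""
    depth = depth_net(left, right)
    tool_prob = seg_net(left)
    tool_mask = nn.functional.interpolate(tool_prob, size=depth.shape[-2:]) > 0.5
    return depth.masked_fill(tool_mask, 0.0)  # drop tool-occluded pixels


if __name__ == "__main__":
    left = torch.rand(1, 3, 128, 160)
    right = torch.rand(1, 3, 128, 160)
    out = reconstruct_frame(StereoDepthTransformer(), ToolSegmentor(), left, right)
    print(out.shape)  # torch.Size([1, 1, 32, 40])

Masking tool pixels before fusion means that regions obstructed by the instrument are filled in from earlier frames as the backend aggregates observations, which is how occlusion and de-occlusion are handled without corrupting the tissue model.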
Acknowledgement
This project was supported by CUHK Shun Hing Institute of Advanced Engineering (project MMT-p5-20), CUHK T Stone Robotics Institute, Hong Kong RGC TRS Project No. T42-409/18-R, and Multi-Scale Medical Robotics Center InnoHK under grant 8312051.
Electronic Supplementary Material
Below is the link to the electronic supplementary material.
Supplementary material 1 (MP4, 5016 KB)
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Long, Y., et al. (2021). E-DSSR: Efficient Dynamic Surgical Scene Reconstruction with Transformer-Based Stereoscopic Depth Perception. In: de Bruijne, M., et al. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2021. Lecture Notes in Computer Science, vol. 12904. Springer, Cham. https://doi.org/10.1007/978-3-030-87202-1_40
DOI: https://doi.org/10.1007/978-3-030-87202-1_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87201-4
Online ISBN: 978-3-030-87202-1