Abstract
Purpose
The purpose of this study was to improve surgical scene perception by addressing the challenge of reconstructing highly dynamic surgical scenes. We proposed a novel depth estimation network and a reconstruction framework based on neural radiance fields, which together provide more accurate scene information for surgical task automation and AR navigation.
Methods
We added a spatial pyramid pooling module and a Swin-Transformer module to enhance the robustness of stereo depth estimation, and further improved depth accuracy by enforcing a unique-matching constraint derived from optimal transport. To avoid deformation distortion in highly dynamic scenes, we used neural radiance fields to represent the scene implicitly along the time dimension and optimized them with depth and color supervision in a learning-based manner.
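The unique-matching constraint from optimal transport can be illustrated with a minimal entropic-OT (Sinkhorn) sketch. This is an assumption for illustration only: the paper's exact cost construction and solver may differ. Because each row and column of the resulting transport plan is forced toward a uniform marginal, mass cannot concentrate on many-to-one pixel matches, which is what suppresses mismatches along a stereo scanline.

```python
import numpy as np

def sinkhorn(cost, reg=0.1, n_iters=50):
    """Entropic optimal transport via Sinkhorn iterations.

    cost: (M, N) matching-cost matrix, e.g. between left/right
    scanline pixels. Returns a soft assignment (transport plan)
    whose row and column sums approach uniform marginals,
    discouraging many-to-one matches.
    """
    M, N = cost.shape
    K = np.exp(-cost / reg)          # Gibbs kernel from the cost
    a = np.full(M, 1.0 / M)          # uniform source marginal
    b = np.full(N, 1.0 / N)          # uniform target marginal
    u = np.ones(M) / M
    v = np.ones(N) / N
    for _ in range(n_iters):         # alternating marginal scaling
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]
```

A hard matching can then be read off with a row-wise argmax over the plan; in a network this soft plan is kept differentiable so the constraint can be trained end to end.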
Results
Our experiments on the KITTI and SCARED datasets show that the proposed depth estimation network performs close to the state-of-the-art (SOTA) methods on natural images and surpasses them on medical images by 1.12% in 3-px error and 0.45 px in end-point error (EPE). The proposed dynamic reconstruction framework successfully reconstructed the dynamic cardiac surface from a totally endoscopic coronary artery bypass video, achieving SOTA performance with 27.983 dB PSNR, 0.812 SSIM, and 0.189 LPIPS.
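The two depth metrics above have standard definitions, sketched below for reference: EPE is the mean absolute disparity error in pixels, and the 3-px error is the percentage of pixels whose error exceeds 3 px. Note this is the simple thresholded variant; some benchmarks (e.g. KITTI's D1 score) additionally require the error to exceed a relative fraction of the ground-truth disparity.

```python
import numpy as np

def disparity_metrics(pred, gt, thresh=3.0):
    """EPE and n-px error rate for dense disparity maps.

    pred, gt: (H, W) disparity arrays in pixels.
    Returns (epe, bad_pct): mean absolute error, and the
    percentage of pixels with error > `thresh` pixels.
    """
    err = np.abs(pred - gt)
    epe = float(err.mean())
    bad_pct = float(100.0 * (err > thresh).mean())
    return epe, bad_pct
```

In practice both metrics are computed only over pixels with valid ground truth (e.g. where a LiDAR or structured-light depth sample exists), which a real evaluation script would mask out first.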
Conclusion
Our proposed depth estimation network and reconstruction framework make a significant contribution to the field of surgical scene perception. The network achieves better results than SOTA methods on medical datasets, reducing mismatches and producing more accurate depth maps with clearer edges, and the proposed reconstruction framework is verified on a series of dynamic cardiac surgical images. Future efforts will focus on improving the training speed and addressing the limited field of view.
Data availability
The public datasets used during the current study are available from the MICCAI 2019 EndoVis Challenge (https://endovissub2019-scared.grand-challenge.org/) and the Hamlyn Centre Endoscopic Video Dataset (http://hamlyn.doc.ic.ac.uk/vision/).
Code availability
Code will be publicly available with the publication of this work.
Acknowledgements
The authors thank Ziqi Liu for polishing the article and Yuehao Wang for the guidance on NeRF theory.
Funding
This study was funded by the National Natural Science Foundation of China (Grant Nos. 52175028 and 51721003).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file1 (MP4 16995 kb)
Supplementary file2 (MP4 4139 kb)
Supplementary file3 (MP4 7667 kb)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, X., Wang, F., Ma, Z. et al. Dynamic surface reconstruction in robot-assisted minimally invasive surgery based on neural radiance fields. Int J CARS 19, 519–530 (2024). https://doi.org/10.1007/s11548-023-03016-8