
Dynamic surface reconstruction in robot-assisted minimally invasive surgery based on neural radiance fields

Original Article · International Journal of Computer Assisted Radiology and Surgery

Abstract

Purpose

The purpose of this study was to improve surgical scene perception by addressing the challenge of reconstructing highly dynamic surgical scenes. We proposed a novel depth estimation network and a reconstruction framework based on neural radiance fields (NeRF), providing more accurate scene information for surgical task automation and AR navigation.

Methods

We added a spatial pyramid pooling module and a Swin-Transformer module to enhance the robustness of stereo depth estimation, and we further improved depth accuracy with a unique-matching constraint derived from optimal transport. To avoid deformation distortion in highly dynamic scenes, we represented the scene implicitly with a neural radiance field extended along the time dimension and optimized it with depth and color supervision in a learning-based manner.
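
The abstract does not spell out how the optimal-transport constraint enters the network, but the underlying idea, pushing stereo correspondences toward one-to-one matches, can be illustrated with entropy-regularized Sinkhorn normalization of a matching cost volume. The following PyTorch sketch is a minimal illustration under that assumption, not the authors' implementation; sinkhorn_normalize, eps, and n_iters are hypothetical names.

```python
import torch

def sinkhorn_normalize(cost: torch.Tensor, n_iters: int = 10,
                       eps: float = 0.1) -> torch.Tensor:
    """Convert matching costs into a near doubly stochastic assignment.

    cost: (B, W_l, W_r) matching costs between left/right scanline pixels.
    Alternating row/column normalization discourages many-to-one matches.
    """
    log_p = -cost / eps  # entropy-regularized affinity: low cost -> high score
    for _ in range(n_iters):
        log_p = log_p - torch.logsumexp(log_p, dim=2, keepdim=True)  # rows
        log_p = log_p - torch.logsumexp(log_p, dim=1, keepdim=True)  # columns
    return log_p.exp()

# Toy usage on random costs for a batch of 4 scanlines of width 32.
cost = torch.rand(4, 32, 32)
match = sinkhorn_normalize(cost)
pos = torch.arange(32, dtype=torch.float32)
expected = (match * pos[None, None, :]).sum(dim=2)  # soft matched position
disparity = pos[None, :] - expected                 # expected disparity per pixel
```

Repeatedly normalizing over right-image and then left-image pixels drives the affinity matrix toward a doubly stochastic one, which penalizes many-to-one correspondences, i.e., the "unique matching" behavior referred to above.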

Results

Our experiments on the KITTI and SCARED datasets show that the proposed depth estimation network performs close to the state-of-the-art (SOTA) method on natural images and surpasses the SOTA method on medical images, with 1.12% in 3 px error and 0.45 px in end-point error (EPE). The proposed dynamic reconstruction framework successfully reconstructed the dynamic cardiac surface in a totally endoscopic coronary artery bypass video, achieving SOTA performance with a PSNR of 27.983 dB, an SSIM of 0.812, and an LPIPS of 0.189.
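
For reference: EPE is the mean absolute disparity error in pixels over valid ground-truth pixels, and the 3 px error is the percentage of valid pixels whose predicted disparity deviates from ground truth by more than 3 px (KITTI's official D1 metric additionally applies a relative threshold). A minimal sketch of both, with hypothetical names:

```python
import torch

def disparity_metrics(pred: torch.Tensor, gt: torch.Tensor, valid: torch.Tensor):
    """EPE and 3 px error rate for a predicted disparity map.

    pred, gt: (H, W) disparity maps in pixels.
    valid: (H, W) boolean mask marking pixels with ground-truth disparity.
    """
    err = (pred - gt).abs()[valid]
    epe = err.mean().item()                           # mean abs. error, px
    bad3 = 100.0 * (err > 3.0).float().mean().item()  # % of pixels off by > 3 px
    return epe, bad3

# Toy usage with random maps; real evaluation uses dataset ground truth.
pred, gt = torch.rand(480, 640) * 100, torch.rand(480, 640) * 100
epe, bad3 = disparity_metrics(pred, gt, valid=gt > 0)
```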

Conclusion

Our proposed depth estimation network and reconstruction framework make a significant contribution to the field of surgical scene perception. The network achieves better results than SOTA methods on the medical dataset, reducing mismatches and producing more accurate depth maps with clearer edges. The proposed reconstruction framework is verified on a series of dynamic cardiac surgical images. Future efforts will focus on improving the training speed and addressing the limited field of view.
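
To make the NeRF component of such a framework concrete: a dynamic radiance field maps a spatial location plus a time index to color and density, and volume rendering along each camera ray yields both a color and an expected depth, so the rendered color can be supervised by the observed pixel and the rendered depth by the stereo network's estimate. The sketch below is a toy illustration of that idea, assuming a plain MLP without positional encoding; it is not the authors' architecture, and all names and hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn

class DynamicRadianceField(nn.Module):
    """Toy time-conditioned radiance field: (x, y, z, t) -> (rgb, density)."""

    def __init__(self, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 3 color channels + 1 density
        )

    def forward(self, xyzt: torch.Tensor):
        out = self.mlp(xyzt)
        return torch.sigmoid(out[..., :3]), torch.relu(out[..., 3])

def render_ray(field, origin, direction, t, n_samples=64, near=0.1, far=2.0):
    """Quadrature volume rendering of one ray: returns color and expected depth."""
    z = torch.linspace(near, far, n_samples)
    pts = origin + z[:, None] * direction                # (n_samples, 3)
    xyzt = torch.cat([pts, t.expand(n_samples, 1)], -1)  # append time index
    rgb, sigma = field(xyzt)
    alpha = 1.0 - torch.exp(-sigma * (far - near) / n_samples)
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], 0)
    weights = trans * alpha                              # per-sample contribution
    color = (weights[:, None] * rgb).sum(0)              # rendered pixel color
    depth = (weights * z).sum(0)                         # expected ray depth
    return color, depth

# One optimization step against an observed pixel color and a stereo depth.
field = DynamicRadianceField()
opt = torch.optim.Adam(field.parameters(), lr=1e-3)
color, depth = render_ray(field, torch.zeros(3),
                          torch.tensor([0.0, 0.0, 1.0]), t=torch.tensor([0.5]))
pixel_rgb, stereo_depth = torch.tensor([0.6, 0.2, 0.2]), torch.tensor(0.8)
loss = (color - pixel_rgb).pow(2).mean() + 0.1 * (depth - stereo_depth).abs()
loss.backward()
opt.step()
```

The depth term anchors the radiance field to the geometry recovered by the stereo network, while the photometric term refines appearance over time.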


Data availability

The public datasets used in the current study are available from the MICCAI 2019 EndoVis Challenge (https://endovissub2019-scared.grand-challenge.org/) and the Hamlyn Centre Endoscopic Video Dataset (http://hamlyn.doc.ic.ac.uk/vision/).

Code availability

The code will be made publicly available upon publication of this work.


Acknowledgements

The authors thank Ziqi Liu for polishing the article and Yuehao Wang for the guidance on NeRF theory.

Funding

This study was funded by the National Natural Science Foundation of China (Grant Nos. 52175028 and 51721003).

Author information


Corresponding author

Correspondence to He Su.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below are the links to the electronic supplementary material.

Supplementary file 1 (MP4, 16,995 KB)

Supplementary file 2 (MP4, 4,139 KB)

Supplementary file 3 (MP4, 7,667 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sun, X., Wang, F., Ma, Z. et al. Dynamic surface reconstruction in robot-assisted minimally invasive surgery based on neural radiance fields. Int J CARS 19, 519–530 (2024). https://doi.org/10.1007/s11548-023-03016-8

