
Recurrent Implicit Neural Graph for Deformable Tracking in Endoscopic Videos

  • Conference paper
Medical Image Computing and Computer Assisted Intervention – MICCAI 2022 (MICCAI 2022)

Abstract

Tracking points in robotic-assisted surgery will help enable models for augmented reality and image guidance applications. For these applications, both speed and accuracy are critical. Current dense convolutional neural networks can be costly, especially when we only wish to track user-defined regions. Faster methods use keypoints and their movement to estimate flow in an image. In this paper we introduce a recurrent implicit neural graph (RING), which estimates flow efficiently. RING interpolates the flow at any selected query points with an implicit neural representation (also known as a coordinate-based representation) that takes the surrounding points and the history of the tracked (query) points as input. RING can track an arbitrary number of image points. We demonstrate that RING estimates point motion better than methods that do not use a state. We evaluate RING both photometrically and against ground truth depth data. Finally, we demonstrate RING's real-time effectiveness in timing experiments.
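The core idea in the abstract — interpolating flow at arbitrary query points from surrounding tracked points, with a recurrent state carrying the points' history — can be sketched very loosely as follows. This is an illustrative stand-in, not the authors' model: the Gaussian distance weighting substitutes for RING's learned coordinate-based network, and the exponential moving average substitutes for its recurrent (GRU-style) state; all function and class names here are hypothetical.

```python
import numpy as np

def interpolate_flow(query, keypoints, flows, sigma=20.0):
    """Estimate flow at a query point as a Gaussian-weighted average of
    the flows at surrounding tracked keypoints. (Stand-in for the learned
    implicit/coordinate-based interpolation described in the paper.)"""
    d2 = np.sum((keypoints - query) ** 2, axis=1)   # squared distances
    w = np.exp(-d2 / (2.0 * sigma ** 2))            # Gaussian weights
    w /= w.sum()                                    # normalize to 1
    return w @ flows                                # (2,) flow at query

class RecurrentTracker:
    """Toy tracker that keeps a per-query state (an exponential moving
    average of past flow) in the spirit of RING's recurrent history."""
    def __init__(self, queries, momentum=0.5):
        self.queries = np.asarray(queries, dtype=float)  # (N, 2) positions
        self.state = np.zeros_like(self.queries)         # per-point history
        self.momentum = momentum

    def step(self, keypoints, flows):
        """Advance each query point one frame using interpolated flow
        blended with its history state."""
        for i, q in enumerate(self.queries):
            f = interpolate_flow(q, keypoints, flows)
            # blend the new flow estimate with the accumulated state
            self.state[i] = self.momentum * self.state[i] + (1 - self.momentum) * f
            self.queries[i] = q + self.state[i]
        return self.queries
```

Because the interpolation is evaluated per query coordinate, this structure naturally tracks an arbitrary number of user-selected points rather than a dense grid, which is the efficiency argument the abstract makes.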

This work was supported by Intuitive Surgical.




Author information


Corresponding author

Correspondence to Adam Schmidt.


Electronic supplementary material

Supplementary material 1 (pdf 8430 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Schmidt, A., Mohareri, O., DiMaio, S., Salcudean, S.E. (2022). Recurrent Implicit Neural Graph for Deformable Tracking in Endoscopic Videos. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13434. Springer, Cham. https://doi.org/10.1007/978-3-031-16440-8_46


  • DOI: https://doi.org/10.1007/978-3-031-16440-8_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16439-2

  • Online ISBN: 978-3-031-16440-8

  • eBook Packages: Computer Science (R0)
