Abstract
Tracking points in robot-assisted surgery helps enable models for augmented reality and image-guidance applications, where both speed and accuracy are critical. Current dense convolutional neural networks can be costly, especially when we only wish to track user-defined regions. Faster methods use keypoints and their movement to estimate flow in an image. In this paper we introduce a recurrent implicit neural graph (RING), which estimates flow efficiently. RING interpolates the flow at any selected query points with an implicit neural representation (also known as a coordinate-based representation) that takes as input the surrounding points and the history of the tracked (query) points. RING can track an arbitrary number of image points. We demonstrate that RING estimates point motion better than methods that do not use a state. We evaluate RING both photometrically and against ground-truth depth data. Finally, we demonstrate RING's real-time effectiveness in timing experiments.
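The abstract's central idea, interpolating flow at arbitrary query points from the flow of nearby tracked points via a coordinate-based network, can be illustrated with a minimal sketch. This is not the paper's RING architecture (which is recurrent and graph-structured); it is a toy NumPy example of the general pattern, with all names (`fourier_features`, `FlowMLP`, `interpolate_flow`) hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_features(xy, n_freq=4):
    # Encode 2D coordinates with sin/cos at several frequencies, a common
    # positional encoding for implicit (coordinate-based) networks.
    freqs = 2.0 ** np.arange(n_freq)           # (n_freq,)
    angles = xy[..., None] * freqs * np.pi     # (..., 2, n_freq)
    return np.concatenate(
        [np.sin(angles), np.cos(angles)], axis=-1
    ).reshape(*xy.shape[:-1], -1)              # (..., 2 * 2 * n_freq)

class FlowMLP:
    """Toy coordinate-based net: (encoded query, neighbour summary) -> 2D flow."""
    def __init__(self, in_dim, hidden=32):
        self.w1 = rng.normal(0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0, 0.1, (hidden, 2))
        self.b2 = np.zeros(2)

    def __call__(self, x):
        h = np.tanh(x @ self.w1 + self.b1)
        return h @ self.w2 + self.b2

def interpolate_flow(query_xy, neighbour_xy, neighbour_flow, mlp):
    """Predict flow at query points from nearby tracked points.

    Each query gets a distance-weighted summary of its neighbours'
    flows, which the coordinate-based MLP then refines with a residual.
    """
    d = np.linalg.norm(query_xy[:, None] - neighbour_xy[None], axis=-1)  # (Q, N)
    w = np.exp(-d)                               # soft inverse-distance weights
    w /= w.sum(axis=1, keepdims=True)
    summary = w @ neighbour_flow                 # (Q, 2) blended neighbour flow
    x = np.concatenate([fourier_features(query_xy), summary], axis=-1)
    return summary + mlp(x)                      # MLP predicts a residual correction

# Toy data: 5 query points, 8 tracked neighbours with known flow.
query = rng.uniform(0.0, 1.0, (5, 2))
neigh = rng.uniform(0.0, 1.0, (8, 2))
flow = rng.normal(0.0, 1.0, (8, 2))
mlp = FlowMLP(in_dim=2 * 2 * 4 + 2)              # Fourier dims + 2D flow summary
out = interpolate_flow(query, neigh, flow, mlp)  # (5, 2) flow at query points
```

Because the network conditions on continuous coordinates rather than a fixed grid, any number of query points can be evaluated, which mirrors the paper's claim that RING tracks an arbitrary number of image points. The recurrent state over tracked-point history is omitted here for brevity.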
This work was supported by Intuitive Surgical.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Schmidt, A., Mohareri, O., DiMaio, S., Salcudean, S.E. (2022). Recurrent Implicit Neural Graph for Deformable Tracking in Endoscopic Videos. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13434. Springer, Cham. https://doi.org/10.1007/978-3-031-16440-8_46
Print ISBN: 978-3-031-16439-2
Online ISBN: 978-3-031-16440-8