Abstract
High-accuracy per-pixel depth is vital for computational photography, so smartphones now have multimodal camera systems with time-of-flight (ToF) depth sensors and multiple color cameras. However, producing accurate high-resolution depth is still challenging due to the low resolution and limited active illumination power of ToF sensors. Fusing RGB stereo and ToF information is a promising direction to overcome these issues, but a key problem remains: to provide high-quality 2D RGB images, the main color sensor’s lens is optically stabilized, resulting in an unknown pose for the floating lens that breaks the geometric relationships between the multimodal image sensors. Leveraging ToF depth estimates and a wide-angle RGB camera, we design an automatic calibration technique based on dense 2D/3D matching that can estimate the extrinsic, intrinsic, and distortion parameters of a stabilized main RGB sensor from a single snapshot. This lets us fuse stereo and ToF cues via a correlation volume. For fusion, we train a deep network on a real-world dataset with depth supervision estimated by a neural reconstruction method. For evaluation, we acquire a test dataset using a commercial high-power depth camera and show that our approach achieves higher accuracy than existing baselines.
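To make the single-snapshot calibration idea concrete, below is a minimal sketch (not the authors' implementation) of the geometric core: ToF depth registered to the wide-angle camera lifts dense wide-to-main pixel correspondences into 3D/2D matches, from which robust PnP recovers the floating lens pose, and a guess-initialized single-view calibration refines intrinsics and distortion. All names (`pixels_wide`, `pixels_main`, `K_main_guess`), the 2 px RANSAC threshold, and the single-view refinement step are illustrative assumptions.

```python
# Minimal single-snapshot calibration sketch using OpenCV's PnP machinery.
# NOT the paper's pipeline: inputs and thresholds below are illustrative.
import cv2
import numpy as np

def backproject(pixels_wide, depth, K_wide):
    """Lift wide-camera pixels (u, v) with valid ToF depth to 3D points
    in the wide camera's coordinate frame."""
    z = depth[pixels_wide[:, 1], pixels_wide[:, 0]]  # depth map indexed [v, u]
    valid = z > 0                                    # keep pixels with ToF returns
    uv1 = np.column_stack([pixels_wide[valid], np.ones(valid.sum())])
    pts3d = (np.linalg.inv(K_wide) @ uv1.T).T * z[valid, None]
    return pts3d.astype(np.float32), valid

def calibrate_main_camera(pixels_wide, pixels_main, depth,
                          K_wide, K_main_guess, image_size):
    """Estimate the floating main lens pose from one snapshot, then refine
    intrinsics and distortion on the PnP inliers (needs non-coplanar points)."""
    pts3d, valid = backproject(pixels_wide, depth, K_wide)
    pts2d = pixels_main[valid].astype(np.float32)
    # Robust extrinsics from 3D (ToF-backed) -> 2D (main camera) matches.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d, pts2d, K_main_guess, None, reprojectionError=2.0)
    if not ok:
        raise RuntimeError("PnP failed: too few valid ToF-backed matches")
    idx = inliers[:, 0]
    # Single-view refinement of K and distortion, seeded by the initial guess.
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        [pts3d[idx]], [pts2d[idx]], image_size,
        K_main_guess.astype(np.float64), None,
        flags=cv2.CALIB_USE_INTRINSIC_GUESS)
    return K, dist, rvecs[0], tvecs[0]
```

In the paper, dense 2D/3D matching supplies these correspondences across modalities; this sketch only conveys how ToF depth turns a 2D matching problem into a pose and calibration estimate.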
Acknowledgement
Min H. Kim acknowledges funding from Samsung Electronics, in addition to the partial support of the MSIT/IITP of Korea (RS-2022-00155620, 2022-0-00058, and 2017-0-00072), the NIRCH of Korea (2021A02P02-001), Microsoft Research Asia, and the Samsung Research Funding Center (SRFC-IT2001-04) for developing 3D imaging algorithms. James Tompkin thanks US NSF CAREER-2144956.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Meuleman, A., Kim, H., Tompkin, J., Kim, M.H. (2022). FloatingFusion: Depth from ToF and Image-Stabilized Stereo Cameras. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13661. Springer, Cham. https://doi.org/10.1007/978-3-031-19769-7_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19768-0
Online ISBN: 978-3-031-19769-7
eBook Packages: Computer Science, Computer Science (R0)