Abstract
Time-resolved image sensors that capture light at pico-to-nanosecond timescales were once limited to niche applications but are now rapidly becoming mainstream in consumer devices. We propose low-cost and low-power imaging modalities that capture scene information from minimal time-resolved image sensors with as few as one pixel. The key idea is to flood illuminate large scene patches (or the entire scene) with a pulsed light source and measure the time-resolved reflected light by integrating over the entire illuminated area. The measured one-dimensional temporal waveform, called a transient, encodes both distances and albedos at all visible scene points and as such is an aggregate proxy for the scene’s 3D geometry. We explore the viability and limitations of transient waveforms on their own for recovering scene information, as well as in combination with traditional RGB cameras. We show that plane estimation can be performed from a single transient and that, using only a few more, it is possible to recover a depth map of the whole scene. We also present two proof-of-concept hardware prototypes that demonstrate the feasibility of our approach for compact, mobile, and budget-limited applications.
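To make the measurement idea concrete, below is a minimal sketch of an idealized forward model: given a known depth map and albedo map, it aggregates the time-of-flight returns from all scene points into a single one-dimensional transient histogram. The bin width, bin count, inverse-square falloff, and names such as `simulate_transient_histogram` are illustrative assumptions, not the paper's calibrated image formation model.

```python
import numpy as np

C = 3e8              # speed of light (m/s)
BIN_WIDTH = 100e-12  # 100 ps time bins (illustrative choice)
NUM_BINS = 1024

def simulate_transient_histogram(depth, albedo):
    """Idealized forward model: flood-illuminate the scene and integrate the
    time-resolved return over all visible points into a single 1D waveform.

    depth  : (H, W) array of distances from the sensor in meters
    albedo : (H, W) array of per-point reflectances in [0, 1]
    """
    # Round-trip time of flight for each scene point, converted to a bin index.
    tof = 2.0 * depth / C
    bins = np.clip((tof / BIN_WIDTH).astype(int), 0, NUM_BINS - 1)

    # Returned intensity per point: albedo attenuated by inverse-square falloff
    # (a simplifying radiometric assumption).
    intensity = albedo / np.maximum(depth, 1e-6) ** 2

    # Aggregate all points into one temporal waveform: the transient histogram.
    hist = np.bincount(bins.ravel(), weights=intensity.ravel(), minlength=NUM_BINS)
    return hist

# Example: a tilted planar scene produces a characteristically spread-out pulse.
H, W = 64, 64
depth = 2.0 + 1.5 * np.tile(np.linspace(0, 1, W), (H, 1))  # plane from 2.0 m to 3.5 m
albedo = np.full((H, W), 0.8)
print(simulate_transient_histogram(depth, albedo).argmax())
```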
Notes
1. This is different from a transient scene response [26], which is acquired at each (x, y) location (either by raster scanning or with a sensor pixel array), whereas a transient histogram integrates over all patches; a minimal illustration follows these notes.
2. The quantum efficiency of the SPAD pixel is absorbed into \(\varphi _i\).
3.
4. Details of the experimental hardware are discussed later in Sect. 5.3.
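As a companion to Note 1, the snippet below uses a hypothetical array `T` holding a per-pixel transient scene response of shape (H, W, num_bins) to show the relationship: the transient histogram is simply that response integrated over all spatial locations.

```python
import numpy as np

# Hypothetical per-pixel transient scene response, e.g. from raster scanning
# or a SPAD pixel array: shape (H, W, num_bins).
T = np.random.rand(64, 64, 1024)

# The transient histogram integrates over all illuminated patches,
# collapsing the spatial dimensions into a single 1D temporal waveform.
transient_histogram = T.sum(axis=(0, 1))
```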
References
Aich, S., Vianney, J.M.U., Islam, M.A., Kaur, M., Liu, B.: Bidirectional attention network for monocular depth estimation. arXiv:2009.00743 [cs] (2020)
Alhashim, I., Wonka, P.: High quality monocular depth estimation via transfer learning. arXiv:1812.11941 [cs] (2019)
Bergman, A.W., Lindell, D.B., Wetzstein, G.: Deep adaptive LiDAR: end-to-end optimization of sampling and depth completion at low sampling rates. In: 2020 IEEE International Conference on Computational Photography (ICCP), pp. 1–11. IEEE, Saint Louis, MO, USA (2020). https://doi.org/10.1109/ICCP48838.2020.9105252, https://ieeexplore.ieee.org/document/9105252/
Callenberg, C., Shi, Z., Heide, F., Hullin, M.B.: Low-cost SPAD sensing for non-line-of-sight tracking, material classification and depth imaging. ACM Trans. Graph. 40(4), 1–12 (2021). https://doi.org/10.1145/3450626.3459824, https://dl.acm.org/doi/10.1145/3450626.3459824
Chang, J., Wetzstein, G.: Deep Optics for Monocular Depth Estimation and 3D Object Detection. arXiv:1904.08601 [cs, eess] (2019)
Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. arXiv:1411.4734 [cs] (2015)
Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, pp. 2366–2374. NIPS 2014, MIT Press, Cambridge, MA, USA (2014)
Fang, Z., Chen, X., Chen, Y., Van Gool, L.: Towards good practice for CNN-based monocular depth estimation. In: 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1080–1089. IEEE, Snowmass Village, CO, USA (2020). https://doi.org/10.1109/WACV45572.2020.9093334, https://ieeexplore.ieee.org/document/9093334/
Fu, H., Gong, M., Wang, C., Batmanghelich, K., Tao, D.: Deep ordinal regression network for monocular depth estimation. CoRR abs/1806.02446 (2018). http://arxiv.org/abs/1806.02446
Gupta, A., Ingle, A., Gupta, M.: Asynchronous single-photon 3D imaging. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7908–7917. IEEE, Seoul, Korea (South) (2019). https://doi.org/10.1109/ICCV.2019.00800, https://ieeexplore.ieee.org/document/9009520/
Hao, Z., Li, Y., You, S., Lu, F.: Detail preserving depth estimation from a single image using attention guided networks. arXiv:1809.00646 [cs] (2018)
Hoiem, D., Efros, A.A., Hebert, M.: Automatic photo pop-up. In: ACM SIGGRAPH 2005 Papers, pp. 577–584. SIGGRAPH 2005, Association for Computing Machinery, New York, NY, USA (2005). https://doi.org/10.1145/1186822.1073232
Hoiem, D., Efros, A.A., Hebert, M.: Recovering surface layout from an image. Int. J. Comput. Vis. 75(1), 151–172 (2007). https://doi.org/10.1007/s11263-006-0031-y, http://link.springer.com/10.1007/s11263-006-0031-y
Huynh, L., Nguyen-Ha, P., Matas, J., Rahtu, E., Heikkila, J.: Guiding monocular depth estimation using depth-attention volume. arXiv:2004.02760 [cs] (2020)
Kim, B., Ponce, J., Ham, B.: Deformable kernel networks for joint image filtering. Int. J. Comput. Vis. 129(2), 579–600 (2021). https://doi.org/10.1007/s11263-020-01386-z, http://arxiv.org/abs/1910.08373
Lee, D.N.: A theory of visual control of braking based on information about time-to-collision. Perception 5(4), 437–459 (1976). https://doi.org/10.1068/p050437
Lee, J.H., Han, M.K., Ko, D.W., Suh, I.H.: From big to small: multi-scale local planar guidance for monocular depth estimation. arXiv:1907.10326 [cs] (2020)
Lee, J., Gupta, M.: Blocks-world cameras. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11407–11417. IEEE, Nashville, TN, USA (2021). https://doi.org/10.1109/CVPR46437.2021.01125, https://ieeexplore.ieee.org/document/9578739/
Lindell, D.B., O’Toole, M., Wetzstein, G.: Single-photon 3D imaging with deep sensor fusion. ACM Trans. Graph. 37(4), 113:1–113:12 (2018). https://doi.org/10.1145/3197517.3201316
Liu, C., Kim, K., Gu, J., Furukawa, Y., Kautz, J.: PlaneRCNN: 3D plane detection and reconstruction from a single image. arXiv:1812.04072 [cs] (2019)
Liu, C., Yang, J., Ceylan, D., Yumer, E., Furukawa, Y.: PlaneNet: piece-wise planar reconstruction from a single RGB image. arXiv:1804.06278 [cs] (2018)
Liu, F., Shen, C., Lin, G., Reid, I.: Learning depth from single monocular images using deep convolutional neural fields. IEEE Trans. Pattern Anal. Mach. Intell. 38(10), 2024–2039 (2016). https://doi.org/10.1109/TPAMI.2015.2505283
Metzler, C.A., Lindell, D.B., Wetzstein, G.: Keyhole imaging: non-line-of-sight imaging and tracking of moving objects along a single optical path. IEEE Trans. Comput. Imaging 7, 1–12 (2021). https://doi.org/10.1109/TCI.2020.3046472
Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7576, pp. 746–760. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33715-4_54
Nishimura, M., Lindell, D.B., Metzler, C., Wetzstein, G.: Disambiguating monocular depth estimation with a single transient. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12366, pp. 139–155. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58589-1_9
O’Toole, M., Heide, F., Lindell, D.B., Zang, K., Diamond, S., Wetzstein, G.: Reconstructing transient images from single-photon sensors. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2289–2297 (2017). https://doi.org/10.1109/CVPR.2017.246
Owen, A.B.: A robust hybrid of lasso and ridge regression. In: Verducci, J.S., Shen, X., Lafferty, J. (eds.) Contemporary Mathematics, vol. 443, pp. 59–71. American Mathematical Society, Providence, Rhode Island (2007). https://doi.org/10.1090/conm/443/08555, http://www.ams.org/conm/443/
Pediredla, A.K., Sankaranarayanan, A.C., Buttafava, M., Tosi, A., Veeraraghavan, A.: Signal processing based pile-up compensation for gated single-photon avalanche diodes. arXiv:1806.07437 [physics] (2018)
Pediredla, A.K., Buttafava, M., Tosi, A., Cossairt, O., Veeraraghavan, A.: Reconstructing rooms using photon echoes: a plane based model and reconstruction algorithm for looking around the corner. In: 2017 IEEE International Conference on Computational Photography (ICCP), pp. 1–12 (2017). https://doi.org/10.1109/ICCPHOT.2017.7951478
Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. arXiv:2103.13413 [cs] (2021)
Rapp, J., Ma, Y., Dawson, R.M.A., Goyal, V.K.: High-flux single-photon lidar. Optica 8(1), 30–39 (2021). https://doi.org/10.1364/OPTICA.403190, https://opg.optica.org/optica/abstract.cfm?uri=optica-8-1-30
Saxena, A., Sun, M., Ng, A.Y.: Make3D: learning 3D scene structure from a single still image. IEEE Trans. Pattern Anal. Mach. Intell. 31(5), 824–840 (2009)
Saxena, A., Chung, S.H., Ng, A.Y.: Learning depth from single monocular images. In: Proceedings of the 18th International Conference on Neural Information Processing Systems, pp. 1161–1168. NIPS 2005, MIT Press, Cambridge, MA, USA (2005)
Tsai, C.Y., Kutulakos, K.N., Narasimhan, S.G., Sankaranarayanan, A.C.: The geometry of first-returning photons for non-line-of-sight imaging. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2336–2344 (2017). https://doi.org/10.1109/CVPR.2017.251
Wang, Y., Chao, W.L., Garg, D., Hariharan, B., Campbell, M., Weinberger, K.Q.: Pseudo-LiDAR from visual depth estimation: bridging the gap in 3D object detection for autonomous driving. arXiv:1812.07179 [cs] (2020)
Wu, Y., Boominathan, V., Chen, H., Sankaranarayanan, A., Veeraraghavan, A.: PhaseCam3D - learning phase masks for passive single view depth estimation. In: 2019 IEEE International Conference on Computational Photography (ICCP), pp. 1–12 (2019). https://doi.org/10.1109/ICCPHOT.2019.8747330
Xia, Z., Sullivan, P., Chakrabarti, A.: Generating and exploiting probabilistic monocular depth estimates. arXiv:1906.05739 [cs] (2019)
Xin, S., Nousias, S., Kutulakos, K.N., Sankaranarayanan, A.C., Narasimhan, S.G., Gkioulekas, I.: A theory of fermat paths for non-line-of-sight shape reconstruction. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6793–6802. IEEE, Long Beach, CA, USA (2019). https://doi.org/10.1109/CVPR.2019.00696, https://ieeexplore.ieee.org/document/8954312/
Zhang, F., Qi, X., Yang, R., Prisacariu, V., Wah, B., Torr, P.: Domain-invariant stereo matching networks. arXiv:1911.13287 [cs] (2019)
Zhang, K., Xie, J., Snavely, N., Chen, Q.: Depth sensing beyond LiDAR range. arXiv:2004.03048 [cs] (2020)
Zhou, T., Brown, M., Snavely, N., Lowe, D.G.: Unsupervised learning of depth and ego-motion from video. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6612–6619. IEEE, Honolulu, HI (2017). https://doi.org/10.1109/CVPR.2017.700, http://ieeexplore.ieee.org/document/8100183/
Zwald, L., Lambert-Lacroix, S.: The BerHu penalty and the grouped effect. arXiv:1207.6868 [math, stat] (2012)
Acknowledgments
This research was supported in part by the NSF CAREER award 1943149, NSF award CNS-2107060 and Intel-NSF award CNS-2003129. We thank Talha Sultan for help with data acquisition.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jungerman, S., Ingle, A., Li, Y., Gupta, M. (2022). 3D Scene Inference from Transient Histograms. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13667. Springer, Cham. https://doi.org/10.1007/978-3-031-20071-7_24
DOI: https://doi.org/10.1007/978-3-031-20071-7_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20070-0
Online ISBN: 978-3-031-20071-7
eBook Packages: Computer Science; Computer Science (R0)