
3D Scene Inference from Transient Histograms

Conference paper, Computer Vision – ECCV 2022 (ECCV 2022). Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13667).

Abstract

Time-resolved image sensors that capture light at pico-to-nanosecond timescales were once limited to niche applications but are now rapidly becoming mainstream in consumer devices. We propose low-cost and low-power imaging modalities that capture scene information from minimal time-resolved image sensors with as few as one pixel. The key idea is to flood-illuminate large scene patches (or the entire scene) with a pulsed light source and measure the time-resolved reflected light by integrating over the entire illuminated area. The measured one-dimensional temporal waveform, called a transient, encodes both the distances and albedos of all visible scene points and is thus an aggregate proxy for the scene’s 3D geometry. We explore the viability and limitations of transient waveforms for recovering scene information, both on their own and in combination with traditional RGB cameras. We show that plane estimation can be performed from a single transient, and that with only a few more it is possible to recover a depth map of the whole scene. We also present two proof-of-concept hardware prototypes that demonstrate the feasibility of our approach for compact, mobile, and budget-limited applications.
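To make the aggregation concrete, the sketch below simulates an idealized transient histogram. All function names, parameters, and the simplified physics (a Gaussian laser pulse, inverse-square falloff, a linear photon-counting response with no pile-up) are illustrative assumptions on our part, not details taken from the paper:

```python
import numpy as np

def transient_histogram(depths_m, albedos, n_bins=128, bin_size_ps=250,
                        pulse_fwhm_ps=500, photons_per_patch=1e4,
                        poisson_noise=True):
    """Idealized forward model: every illuminated patch contributes a
    pulse-shaped lobe at its round-trip time of flight, and all lobes
    sum into a single 1D histogram (no pile-up, linear response)."""
    c = 3e8  # speed of light (m/s)
    bin_centers = (np.arange(n_bins) + 0.5) * bin_size_ps * 1e-12  # seconds
    tof = 2.0 * np.asarray(depths_m, float) / c  # round-trip times (s)
    # Return strength scales with albedo and inverse-square falloff.
    weights = (photons_per_patch * np.asarray(albedos, float)
               / np.asarray(depths_m, float) ** 2)
    sigma = pulse_fwhm_ps * 1e-12 / 2.355  # Gaussian std-dev from FWHM
    hist = np.zeros(n_bins)
    for t, w in zip(tof, weights):
        hist += w * np.exp(-0.5 * ((bin_centers - t) / sigma) ** 2)
    # Photon detection is a Poisson counting process.
    return np.random.poisson(hist) if poisson_noise else hist

# Two patches at 1 m and 2 m: lobe positions encode distance, lobe
# areas encode albedo and radial falloff.
h = transient_histogram(depths_m=[1.0, 2.0], albedos=[0.8, 0.5])
```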


Notes

  1. This is different from a transient scene response [26], which is acquired at each (x, y) location (either by raster scanning or with a sensor pixel array), whereas a transient histogram integrates over all patches.

  2. The quantum efficiency of the SPAD pixel is absorbed into \(\varphi _i\).

  3. In the case of high ambient illumination, existing pile-up mitigation techniques [10, 28, 31] can be employed.

  4. Details of the experimental hardware are discussed in Sect. 5.3.
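The abstract’s plane-estimation claim can be illustrated with a generic analysis-by-synthesis sketch. This is not the authors’ algorithm; plane_depths(), the search ranges, and the known-albedo assumption are all hypothetical. The idea is to simulate the histogram a candidate plane would produce (reusing the illustrative transient_histogram() above) and search over plane distance and tilt for the best match to the measurement:

```python
import numpy as np

def plane_depths(dist_m, tilt_deg, fov_deg=20, n=32):
    """Depth along each camera ray to a plane at distance dist_m on the
    optical axis, tilted by tilt_deg about the vertical axis."""
    ang = np.radians(np.linspace(-fov_deg / 2, fov_deg / 2, n))
    tilt = np.radians(tilt_deg)
    # Ray-plane intersection: d * cos(tilt) / cos(ang - tilt).
    return dist_m * np.cos(tilt) / np.cos(ang - tilt)

def fit_plane(measured, albedo=0.7):
    """Brute-force search over (distance, tilt) for the plane whose
    noiseless simulated histogram best matches the measurement (L2 loss).
    Reuses transient_histogram() from the sketch above."""
    best, best_loss = None, np.inf
    for d in np.linspace(0.5, 3.0, 26):
        for t in np.linspace(-60.0, 60.0, 25):
            depths = plane_depths(d, t)
            sim = transient_histogram(depths, albedo * np.ones_like(depths),
                                      poisson_noise=False)
            loss = np.sum((sim - measured) ** 2)
            if loss < best_loss:
                best, best_loss = (d, t), loss
    return best

# Simulate a noisy measurement of a ground-truth plane, then recover it.
truth = plane_depths(1.5, 30.0)
measured = transient_histogram(truth, 0.7 * np.ones_like(truth))
print(fit_plane(measured))  # close to (1.5, +/-30.0); tilt sign is ambiguous
```

Note that mirrored tilts yield identical histograms in this model, since the aggregate measurement discards which ray produced each return; resolving such ambiguities is one motivation for combining transients with a conventional RGB camera.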

References

  1. Aich, S., Vianney, J.M.U., Islam, M.A., Kaur, M., Liu, B.: Bidirectional attention network for monocular depth estimation. arXiv:2009.00743 [cs] (2020)

  2. Alhashim, I., Wonka, P.: High quality monocular depth estimation via transfer learning. arXiv:1812.11941 [cs] (2019)

  3. Bergman, A.W., Lindell, D.B., Wetzstein, G.: Deep adaptive LiDAR: end-to-end optimization of sampling and depth completion at low sampling rates. In: 2020 IEEE International Conference on Computational Photography (ICCP), pp. 1–11. IEEE, Saint Louis, MO, USA (2020). https://doi.org/10.1109/ICCP48838.2020.9105252

  4. Callenberg, C., Shi, Z., Heide, F., Hullin, M.B.: Low-cost SPAD sensing for non-line-of-sight tracking, material classification and depth imaging. ACM Trans. Graph. 40(4), 1–12 (2021). https://doi.org/10.1145/3450626.3459824

  5. Chang, J., Wetzstein, G.: Deep optics for monocular depth estimation and 3D object detection. arXiv:1904.08601 [cs, eess] (2019)

  6. Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. arXiv:1411.4734 [cs] (2015)

  7. Eigen, D., Puhrsch, C., Fergus, R.: Depth map prediction from a single image using a multi-scale deep network. In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, pp. 2366–2374. NIPS 2014, MIT Press, Cambridge, MA, USA (2014)


  8. Fang, Z., Chen, X., Chen, Y., Van Gool, L.: Towards good practice for CNN-based monocular depth estimation. In: 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1080–1089. IEEE, Snowmass Village, CO, USA (2020). https://doi.org/10.1109/WACV45572.2020.9093334

  9. Fu, H., Gong, M., Wang, C., Batmanghelich, K., Tao, D.: Deep ordinal regression network for monocular depth estimation. arXiv:1806.02446 [cs] (2018)

  10. Gupta, A., Ingle, A., Gupta, M.: Asynchronous single-photon 3D imaging. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7908–7917. IEEE, Seoul, Korea (South) (2019). https://doi.org/10.1109/ICCV.2019.00800

  11. Hao, Z., Li, Y., You, S., Lu, F.: Detail preserving depth estimation from a single image using attention guided networks. arXiv:1809.00646 [cs] (2018)

  12. Hoiem, D., Efros, A.A., Hebert, M.: Automatic photo pop-up. In: ACM SIGGRAPH 2005 Papers, pp. 577–584. SIGGRAPH 2005, Association for Computing Machinery, New York, NY, USA (2005). https://doi.org/10.1145/1186822.1073232

  13. Hoiem, D., Efros, A.A., Hebert, M.: Recovering surface layout from an image. Int. J. Comput. Vis. 75(1), 151–172 (2007). https://doi.org/10.1007/s11263-006-0031-y

  14. Huynh, L., Nguyen-Ha, P., Matas, J., Rahtu, E., Heikkila, J.: Guiding monocular depth estimation using depth-attention volume. arXiv:2004.02760 [cs] (2020)

  15. Kim, B., Ponce, J., Ham, B.: Deformable kernel networks for joint image filtering. Int. J. Comput. Vis. 129(2), 579–600 (2021). https://doi.org/10.1007/s11263-020-01386-z

  16. Lee, D.N.: A theory of visual control of braking based on information about time-to-collision. Perception 5(4), 437–459 (1976). https://doi.org/10.1068/p050437


  17. Lee, J.H., Han, M.K., Ko, D.W., Suh, I.H.: From big to small: multi-scale local planar guidance for monocular depth estimation. arXiv:1907.10326 [cs] (2020)

  18. Lee, J., Gupta, M.: Blocks-world cameras. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11407–11417. IEEE, Nashville, TN, USA (2021). https://doi.org/10.1109/CVPR46437.2021.01125

  19. Lindell, D.B., O’Toole, M., Wetzstein, G.: Single-photon 3D imaging with deep sensor fusion. ACM Trans. Graph. 37(4), 113:1–113:12 (2018). https://doi.org/10.1145/3197517.3201316

  20. Liu, C., Kim, K., Gu, J., Furukawa, Y., Kautz, J.: PlaneRCNN: 3D plane detection and reconstruction from a single image. arXiv:1812.04072 [cs] (2019)

  21. Liu, C., Yang, J., Ceylan, D., Yumer, E., Furukawa, Y.: PlaneNet: piece-wise planar reconstruction from a single RGB image. arXiv:1804.06278 [cs] (2018)

  22. Liu, F., Shen, C., Lin, G., Reid, I.: Learning depth from single monocular images using deep convolutional neural fields. IEEE Trans. Pattern Anal. Mach. Intell. 38(10), 2024–2039 (2016). https://doi.org/10.1109/TPAMI.2015.2505283

  23. Metzler, C.A., Lindell, D.B., Wetzstein, G.: Keyhole imaging: non-line-of-sight imaging and tracking of moving objects along a single optical path. IEEE Trans. Comput. Imaging 7, 1–12 (2021). https://doi.org/10.1109/TCI.2020.3046472

  24. Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7576, pp. 746–760. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33715-4_54


  25. Nishimura, M., Lindell, D.B., Metzler, C., Wetzstein, G.: Disambiguating monocular depth estimation with a single transient. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12366, pp. 139–155. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58589-1_9


  26. O’Toole, M., Heide, F., Lindell, D.B., Zang, K., Diamond, S., Wetzstein, G.: Reconstructing transient images from single-photon sensors. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2289–2297 (2017). https://doi.org/10.1109/CVPR.2017.246

  27. Owen, A.B.: A robust hybrid of lasso and ridge regression. In: Verducci, J.S., Shen, X., Lafferty, J. (eds.) Contemporary Mathematics, vol. 443, pp. 59–71. American Mathematical Society, Providence, Rhode Island (2007). https://doi.org/10.1090/conm/443/08555

  28. Pediredla, A.K., Sankaranarayanan, A.C., Buttafava, M., Tosi, A., Veeraraghavan, A.: Signal processing based pile-up compensation for gated single-photon avalanche diodes. arXiv:1806.07437 [physics] (2018)

  29. Pediredla, A.K., Buttafava, M., Tosi, A., Cossairt, O., Veeraraghavan, A.: Reconstructing rooms using photon echoes: a plane based model and reconstruction algorithm for looking around the corner. In: 2017 IEEE International Conference on Computational Photography (ICCP), pp. 1–12 (2017). https://doi.org/10.1109/ICCPHOT.2017.7951478

  30. Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. arXiv:2103.13413 [cs] (2021)

  31. Rapp, J., Ma, Y., Dawson, R.M.A., Goyal, V.K.: High-flux single-photon lidar. Optica 8(1), 30–39 (2021). https://doi.org/10.1364/OPTICA.403190

  32. Saxena, A., Sun, M., Ng, A.Y.: Make3D: learning 3D scene structure from a single still image. IEEE Trans. Pattern Anal. Mach. Intell. 31(5), 824–840 (2009)


  33. Saxena, A., Chung, S.H., Ng, A.Y.: Learning depth from single monocular images. In: Proceedings of the 18th International Conference on Neural Information Processing Systems, pp. 1161–1168. NIPS 2005, MIT Press, Cambridge, MA, USA (2005)


  34. Tsai, C.Y., Kutulakos, K.N., Narasimhan, S.G., Sankaranarayanan, A.C.: The geometry of first-returning photons for non-line-of-sight imaging. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2336–2344 (2017). https://doi.org/10.1109/CVPR.2017.251

  35. Wang, Y., Chao, W.L., Garg, D., Hariharan, B., Campbell, M., Weinberger, K.Q.: Pseudo-LiDAR from visual depth estimation: bridging the gap in 3D object detection for autonomous driving. arXiv:1812.07179 [cs] (2020)

  36. Wu, Y., Boominathan, V., Chen, H., Sankaranarayanan, A., Veeraraghavan, A.: PhaseCam3D - learning phase masks for passive single view depth estimation. In: 2019 IEEE International Conference on Computational Photography (ICCP), pp. 1–12 (2019). https://doi.org/10.1109/ICCPHOT.2019.8747330

  37. Xia, Z., Sullivan, P., Chakrabarti, A.: Generating and exploiting probabilistic monocular depth estimates. arXiv:1906.05739 [cs] (2019)

  38. Xin, S., Nousias, S., Kutulakos, K.N., Sankaranarayanan, A.C., Narasimhan, S.G., Gkioulekas, I.: A theory of Fermat paths for non-line-of-sight shape reconstruction. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6793–6802. IEEE, Long Beach, CA, USA (2019). https://doi.org/10.1109/CVPR.2019.00696

  39. Zhang, F., Qi, X., Yang, R., Prisacariu, V., Wah, B., Torr, P.: Domain-invariant stereo matching networks. arXiv:1911.13287 [cs] (2019)

  40. Zhang, K., Xie, J., Snavely, N., Chen, Q.: Depth sensing beyond LiDAR range. arXiv:2004.03048 [cs] (2020)

  41. Zhou, T., Brown, M., Snavely, N., Lowe, D.G.: Unsupervised learning of depth and ego-motion from video. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6612–6619. IEEE, Honolulu, HI (2017). https://doi.org/10.1109/CVPR.2017.700

  42. Zwald, L., Lambert-Lacroix, S.: The BerHu penalty and the grouped effect. arXiv:1207.6868 [math, stat] (2012)


Acknowledgments

This research was supported in part by the NSF CAREER award 1943149, NSF award CNS-2107060, and Intel-NSF award CNS-2003129. We thank Talha Sultan for help with data acquisition.

Author information

Correspondence to Sacha Jungerman.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1250 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Jungerman, S., Ingle, A., Li, Y., Gupta, M. (2022). 3D Scene Inference from Transient Histograms. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13667. Springer, Cham. https://doi.org/10.1007/978-3-031-20071-7_24

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-20071-7_24


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20070-0

  • Online ISBN: 978-3-031-20071-7

  • eBook Packages: Computer Science, Computer Science (R0)
