
Deep Multi Depth Panoramas for View Synthesis

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12358)

Abstract

We propose a learning-based approach for novel view synthesis for multi-camera 360° panorama capture rigs. Previous work constructs RGBD panoramas from such data, allowing for view synthesis with small amounts of translation, but cannot handle the disocclusions and view-dependent effects that are caused by large translations. To address this issue, we present a novel scene representation—Multi Depth Panorama (MDP)—that consists of multiple RGBDα panoramas that represent both scene geometry and appearance. We demonstrate a deep neural network-based method to reconstruct MDPs from multi-camera 360° images. MDPs are more compact than previous 3D scene representations and enable high-quality, efficient new view rendering. We demonstrate this via experiments on both synthetic and real data and comparisons with previous state-of-the-art methods spanning both learning-based approaches and classical RGBD-based methods.
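To make the representation concrete, the sketch below illustrates one plausible way to hold an MDP as a stack of RGBDα panorama layers and render the reference view by depth-ordered alpha compositing. This is a minimal, hypothetical sketch: the class name `MultiDepthPanorama`, the 5-channel layout, and the compositing routine are our own assumptions for illustration, not the authors' implementation.

```python
import numpy as np


class MultiDepthPanorama:
    """Hypothetical MDP container: a stack of RGBD-alpha panorama layers.

    Each layer is an (H, W, 5) float array holding R, G, B, depth, alpha.
    Layers are stored front (nearest) to back (farthest). The layout and
    names here are illustrative assumptions, not the paper's code.
    """

    def __init__(self, layers):
        self.layers = list(layers)

    def composite(self):
        """Render the reference view with back-to-front 'over' compositing."""
        h, w, _ = self.layers[0].shape
        rgb = np.zeros((h, w, 3), dtype=np.float32)
        for layer in reversed(self.layers):          # farthest layer first
            color, alpha = layer[..., :3], layer[..., 4:5]
            rgb = alpha * color + (1.0 - alpha) * rgb
        return rgb


if __name__ == "__main__":
    # Toy example: an opaque far layer behind a semi-transparent near layer.
    far = np.concatenate([np.full((4, 8, 3), 0.2), np.full((4, 8, 1), 10.0),
                          np.ones((4, 8, 1))], axis=-1).astype(np.float32)
    near = np.concatenate([np.full((4, 8, 3), 0.9), np.full((4, 8, 1), 2.0),
                           np.full((4, 8, 1), 0.5)], axis=-1).astype(np.float32)
    mdp = MultiDepthPanorama([near, far])
    print(mdp.composite().shape)  # (4, 8, 3)
```

Rendering a novel view would additionally reproject each layer using its depth channel before compositing; the sketch omits that step and only shows how the per-layer alpha lets multiple depths contribute to a single pixel.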

Keywords

360° panoramas · View synthesis · Image-based rendering · Virtual reality

Notes

Acknowledgements

We would like to thank In-Kyu Park for helpful discussion and comments. This work was supported in part by ONR grants N000141712687, N000141912293, N000142012529, NSF grant 1617234, Adobe, the Ronald L. Graham Chair and the UC San Diego Center for Visual Computing.

Supplementary material

Supplementary material 1 (mp4 23331 KB)

Supplementary material 2 (pdf 37039 KB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. UC San Diego, San Diego, USA
  2. UC Berkeley, Berkeley, USA
  3. Adobe Research, San Jose, USA
