
Geometry-Guided View Synthesis with Local Nonuniform Plane-Sweep Volume

  • Ao Li
  • Li Fang
  • Long Ye
  • Wei Zhong
  • Qin Zhang
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1181)

Abstract

In this paper we develop a geometry-guided image generation method for scene-independent novel view synthesis from a stereo image pair. We employ the well-established plane-sweep strategy to approximate the 3D scene structure, but instead of adopting a fixed, general plane configuration, we use depth information to derive a local nonuniform plane spacing. More specifically, we first explicitly estimate a depth map in the reference view and use it to guide the plane spacing in the plane-sweep volume, yielding a geometry-guided approximation of the scene geometry. We then learn to predict a multiplane image (MPI) representation, which can be used to efficiently synthesize a range of novel views of the scene, including views that extrapolate significantly beyond the input baseline. Results on a large dataset of YouTube video frames indicate that our approach synthesizes higher-quality images while keeping the number of depth planes unchanged.
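The abstract only sketches how the estimated reference-view depth map drives the nonuniform plane spacing. Below is a minimal NumPy sketch, not the authors' implementation, assuming the plane depths are placed by Lloyd-style (k-means) clustering of the inverse-depth values of the depth map so that planes concentrate where scene content actually lies; the function nonuniform_plane_depths and all of its parameters are illustrative.

```python
# Illustrative sketch only (not the authors' code): one plausible way to turn a
# reference-view depth map into nonuniform plane depths for a plane-sweep
# volume, by Lloyd-style (k-means) clustering of inverse depth.
import numpy as np

def nonuniform_plane_depths(depth_map, num_planes=32, iters=50, max_samples=100_000):
    """Return plane depths (near to far) adapted to the depth-map statistics."""
    disp = 1.0 / np.clip(depth_map.reshape(-1), 1e-6, None)   # work in inverse depth
    if disp.size > max_samples:                                # subsample for speed
        disp = np.random.default_rng(0).choice(disp, max_samples, replace=False)
    # Initialize cluster centers uniformly in disparity, as in a standard PSV.
    centers = np.linspace(disp.min(), disp.max(), num_planes)
    for _ in range(iters):
        # Assign each disparity sample to its nearest plane, then move each
        # plane to the mean of its assigned samples (Lloyd update).
        assign = np.argmin(np.abs(disp[None, :] - centers[:, None]), axis=0)
        for k in range(num_planes):
            members = disp[assign == k]
            if members.size:
                centers[k] = members.mean()
    centers = np.sort(centers)[::-1]   # descending disparity, i.e. near to far
    return 1.0 / centers               # plane depths, near to far

# Example usage (estimated_depth is a per-pixel depth map of the reference view):
# depths = nonuniform_plane_depths(estimated_depth, num_planes=32)
```

Under this assumption, the plane budget is spent where the depth histogram has mass (for example around foreground objects) rather than uniformly in disparity, which is consistent with synthesizing higher-quality views without increasing the number of planes.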

Keywords

Image-based rendering · View synthesis · Deep neural networks · Plane-sweep volume

Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Key Laboratory of Media Audio and Video, Communication University of China, Ministry of Education, Beijing, China
