Geometry-Guided View Synthesis with Local Nonuniform Plane-Sweep Volume

Li, Ao; Fang, Li; Ye, Long; Zhong, Wei; Zhang, Qin

doi:10.1007/978-981-15-3341-9_32

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1181)

Included in the following conference series:

International Forum on Digital TV and Wireless Multimedia Communications

Abstract

In this paper we develop a geometry-guided image generation technique for scene-independent novel view synthesis from a stereo image pair. We employ the successful plane-sweep strategy to approximate the 3D scene structure. Instead of adopting a generic plane configuration, however, we use depth information to space the planes locally and nonuniformly. More specifically, we first explicitly estimate a depth map in the reference view and use it to guide the plane spacing in the plane-sweep volume, so that the approximation of scene geometry is itself geometry-guided. Next, we learn to predict a multiplane image (MPI) representation, which can then be used to efficiently synthesize a range of novel views of the scene, including views that extrapolate significantly beyond the input baseline. Our results on a large dataset of YouTube video frames indicate that our approach synthesizes higher-quality images while keeping the number of depth planes unchanged.
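The two ideas in the abstract, depth-guided plane spacing and rendering from a multiplane image, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state the spacing rule, so the sketch places planes at evenly spaced quantiles of the estimated depths, and the function names (`uniform_disparity_planes`, `depth_guided_planes`, `composite_back_to_front`) are hypothetical. It is not the authors' implementation.

```python
def uniform_disparity_planes(near, far, num_planes):
    """Baseline: planes equally spaced in inverse depth, ordered far to near."""
    if num_planes < 2 or not (0 < near < far):
        raise ValueError("need num_planes >= 2 and 0 < near < far")
    inv_near, inv_far = 1.0 / near, 1.0 / far
    step = (inv_near - inv_far) / (num_planes - 1)
    return [1.0 / (inv_far + i * step) for i in range(num_planes)]


def depth_guided_planes(depth_values, num_planes):
    """Assumed rule: put planes at evenly spaced quantiles of the depth map.

    Depth ranges that contain many pixels receive more planes than sparsely
    populated ranges. Returned depths are ordered near to far.
    """
    ordered = sorted(d for d in depth_values if d > 0)
    if not ordered or num_planes < 1:
        raise ValueError("need positive depths and num_planes >= 1")
    n = len(ordered)
    planes = []
    for i in range(num_planes):
        # Integer form of the quantile (i + 0.5) / num_planes, clamped to range.
        idx = min(((2 * i + 1) * n) // (2 * num_planes), n - 1)
        planes.append(ordered[idx])
    return planes


def composite_back_to_front(layers):
    """Standard 'over' compositing of one pixel across MPI layers.

    `layers` is a list of (color, alpha) pairs ordered far to near.
    """
    out = 0.0
    for color, alpha in layers:
        out = color * alpha + out * (1.0 - alpha)
    return out


# Toy depth map: most pixels lie between 1 and 2 units, two are far away.
toy_depths = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 8.0, 9.0]
guided = depth_guided_planes(toy_depths, 5)
uniform = uniform_disparity_planes(1.0, 9.0, 5)
pixel = composite_back_to_front([(0.2, 1.0), (1.0, 0.5)])
```

With the toy depth map, most of the guided planes fall in the densely populated near range, whereas the uniform baseline spreads its planes over the whole depth interval regardless of where the scene content lies. The plane count is the same in both cases.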

Author information

Authors and Affiliations

Key Laboratory of Media Audio and Video, Communication University of China, Ministry of Education, Beijing, 100024, China
Li Fang, Long Ye, Wei Zhong & Qin Zhang

Corresponding author

Correspondence to Li Fang.

Editor information

Editors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Guangtao Zhai
Shanghai Jiao Tong University, Shanghai, China
Jun Zhou
Shanghai Jiao Tong University, Shanghai, China
Hua Yang
Shanghai University, Shanghai, China
Ping An
Shanghai Jiao Tong University, Shanghai, China
Xiaokang Yang

About this paper

Cite this paper

Li, A., Fang, L., Ye, L., Zhong, W., Zhang, Q. (2020). Geometry-Guided View Synthesis with Local Nonuniform Plane-Sweep Volume. In: Zhai, G., Zhou, J., Yang, H., An, P., Yang, X. (eds) Digital TV and Wireless Multimedia Communication. IFTC 2019. Communications in Computer and Information Science, vol 1181. Springer, Singapore. https://doi.org/10.1007/978-981-15-3341-9_32

DOI: https://doi.org/10.1007/978-981-15-3341-9_32
Published: 16 February 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-3340-2
Online ISBN: 978-981-15-3341-9
eBook Packages: Computer Science, Computer Science (R0)
