Multi-view to Novel View: Synthesizing Novel Views With Self-learned Confidence

  • Shao-Hua Sun
  • Minyoung Huh
  • Yuan-Hong Liao
  • Ning Zhang
  • Joseph J. Lim
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11207)

Abstract

In this paper, we address the task of multi-view novel view synthesis, where we are interested in synthesizing a target image with an arbitrary camera pose from given source images. We propose an end-to-end trainable framework that learns to exploit multiple viewpoints to synthesize a novel view without any 3D supervision. Specifically, our model consists of a flow prediction module and a pixel generation module to directly leverage the information present in the source views as well as hallucinate missing pixels from statistical priors. To merge the predictions produced by the two modules given multi-view source images, we introduce a self-learned confidence aggregation mechanism. We evaluate our model on images rendered from 3D object models as well as real and synthesized scenes. We demonstrate that our model is able to achieve state-of-the-art results as well as progressively improve its predictions when more source images are available.
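
To make the aggregation idea concrete, here is a minimal PyTorch-style sketch of how per-view predictions from the two modules might be merged using self-learned confidence maps. This is an illustration of the mechanism described in the abstract, not the authors' implementation: the tensor shapes, the function name, and the use of a per-pixel softmax over confidence maps are all assumptions.

```python
import torch
import torch.nn.functional as F

def aggregate_predictions(flow_preds, pixel_preds, flow_confs, pixel_confs):
    """Confidence-weighted merge of per-source-view predictions (sketch).

    Assumed shapes (hypothetical, for illustration only):
      flow_preds, pixel_preds: (N, 3, H, W)  one RGB prediction per source view
      flow_confs, pixel_confs: (N, 1, H, W)  unnormalized confidence maps
    Returns the aggregated target image of shape (3, H, W).
    """
    # Stack all 2N candidate predictions and their confidence maps.
    preds = torch.cat([flow_preds, pixel_preds], dim=0)   # (2N, 3, H, W)
    confs = torch.cat([flow_confs, pixel_confs], dim=0)   # (2N, 1, H, W)

    # Normalize confidences across candidates independently at each pixel,
    # so the per-pixel weights sum to one.
    weights = F.softmax(confs, dim=0)                     # (2N, 1, H, W)

    # Weighted average: pixels each module is confident about dominate.
    return (weights * preds).sum(dim=0)                   # (3, H, W)
```

Under this reading, the model can copy pixels via the flow branch where source views contain the needed content, and fall back on the generation branch for occluded or unseen regions, with the confidence maps learned end-to-end to arbitrate between them.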

Keywords

Novel view synthesis · Multi-view novel view synthesis

Notes

Acknowledgments

This project was supported by SKT. The research of Shao-Hua Sun and Minyoung Huh was partially supported by Snap Inc. The authors are grateful to Youngwoon Lee and Hyeonwoo Noh for helpful discussions about this work.

Supplementary material

Supplementary material 1: 474178_1_En_10_MOESM1_ESM.pdf (PDF, 11.7 MB)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Shao-Hua Sun (1)
  • Minyoung Huh (2)
  • Yuan-Hong Liao (3)
  • Ning Zhang (4)
  • Joseph J. Lim (1)

  1. University of Southern California, Los Angeles, USA
  2. Carnegie Mellon University, Pittsburgh, USA
  3. National Tsing Hua University, Hsinchu, Taiwan
  4. Snap Inc., Venice, USA