View Synthesis by Appearance Flow

  • Tinghui Zhou
  • Shubham Tulsiani
  • Weilun Sun
  • Jitendra Malik
  • Alexei A. Efros
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9908)

Abstract

We address the problem of novel view synthesis: given an input image, synthesizing new images of the same object or scene observed from arbitrary viewpoints. We approach this as a learning task but, critically, instead of learning to synthesize pixels from scratch, we learn to copy them from the input image. Our approach exploits the observation that the visual appearance of different views of the same instance is highly correlated, and that such correlation can be explicitly learned by training a convolutional neural network (CNN) to predict appearance flows – 2-D coordinate vectors specifying which pixels in the input view could be used to reconstruct the target view. Furthermore, the proposed framework easily generalizes to multiple input views by learning how to optimally combine single-view predictions. We show that for both objects and scenes, our approach is able to synthesize novel views of higher perceptual quality than previous CNN-based techniques.
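The core operation implied by the abstract — copying pixels from the input view according to a predicted appearance-flow field — can be sketched as a differentiable bilinear sampling step. The sketch below is a minimal NumPy illustration, not the paper's implementation: the function name `warp_by_appearance_flow` and the flow convention (each target pixel stores the absolute source coordinate to sample) are assumptions for the example; in the paper the flow field would be the output of a trained CNN.

```python
import numpy as np

def warp_by_appearance_flow(src, flow):
    """Reconstruct a target view by sampling pixels from the source view.

    src  : (H, W, C) source image.
    flow : (H, W, 2) appearance flow; flow[y, x] = (x_src, y_src), the
           source-image coordinate whose colour is copied to target
           pixel (x, y). Bilinear interpolation handles fractional
           coordinates, which keeps the operation differentiable.
    """
    H, W, _ = src.shape
    xs = np.clip(flow[..., 0], 0, W - 1)
    ys = np.clip(flow[..., 1], 0, H - 1)
    # Integer corners surrounding each sampling location.
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, W - 1)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, H - 1)
    # Fractional interpolation weights, broadcast over channels.
    wx = (xs - x0)[..., None]
    wy = (ys - y0)[..., None]
    top = (1 - wx) * src[y0, x0] + wx * src[y0, x1]
    bot = (1 - wx) * src[y1, x0] + wx * src[y1, x1]
    return (1 - wy) * top + wy * bot
```

With an identity flow (each pixel samples its own coordinate) the output reproduces the source exactly; shifting the flow's x-component by one samples each pixel's right-hand neighbour, i.e. the synthesized view is a pure rearrangement of input pixels, which is what lets the method preserve texture detail rather than hallucinating it.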


Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Tinghui Zhou
  • Shubham Tulsiani
  • Weilun Sun
  • Jitendra Malik
  • Alexei A. Efros
  1. University of California, Berkeley, USA