Distortion-Aware Convolutional Filters for Dense Prediction in Panoramic Images

  • Keisuke Tateno
  • Nassir Navab
  • Federico Tombari
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11220)

Abstract

There is a high demand for 3D data for 360° panoramic images and videos, driven by the growing availability of specialized hardware both for capturing them (e.g., omni-directional cameras) and for visualizing them in 3D (e.g., head-mounted displays). At the same time, 3D sensors able to capture panoramic 3D data are expensive and/or hardly available. To fill this gap, we propose a learning approach for panoramic depth map estimation from a single image. Thanks to a specifically developed distortion-aware deformable convolution filter, our method can be trained on conventional perspective images and then used to regress depth for panoramic images, thus bypassing the effort needed to create annotated panoramic training datasets. We also demonstrate our approach on emerging tasks such as panoramic monocular SLAM, panoramic semantic segmentation and panoramic style transfer.
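The key idea of the abstract, resampling a convolution's receptive field according to the equirectangular distortion, can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: it assumes the sampling pattern of a k × k kernel is obtained by placing a regular grid on the tangent plane of the viewing sphere and projecting it back onto the equirectangular image via the inverse gnomonic projection, so the pattern widens towards the poles. The function name and parameters are hypothetical.

```python
import numpy as np

def distortion_aware_offsets(v, u, H, W, k=3):
    """Sampling locations (rows, cols) of a k x k kernel centred at
    equirectangular pixel (v, u) of an H x W panorama.  A regular grid on
    the tangent plane of the sphere is projected back onto the image via
    the inverse gnomonic projection.  Illustrative sketch only."""
    lat0 = np.pi / 2 - (v + 0.5) * np.pi / H        # latitude of the centre pixel
    lon0 = (u + 0.5) * 2 * np.pi / W - np.pi        # longitude of the centre pixel
    step = np.pi / H                                # angular height of one pixel
    r = k // 2
    ys, xs = np.meshgrid(np.arange(-r, r + 1), np.arange(-r, r + 1), indexing="ij")
    x = xs * step                                   # tangent-plane coordinates
    y = -ys * step                                  # image rows grow downwards
    rho = np.hypot(x, y)
    c = np.arctan(rho)                              # angular distance from the centre
    safe_rho = np.where(rho > 0, rho, 1.0)          # avoid 0/0 at the kernel centre
    lat = np.arcsin(np.cos(c) * np.sin(lat0)
                    + y * np.sin(c) * np.cos(lat0) / safe_rho)
    lon = lon0 + np.arctan2(x * np.sin(c),
                            rho * np.cos(lat0) * np.cos(c)
                            - y * np.sin(lat0) * np.sin(c))
    rows = (np.pi / 2 - lat) * H / np.pi - 0.5      # back to pixel coordinates
    cols = (lon + np.pi) * W / (2 * np.pi) - 0.5
    return rows, cols
```

Near the equator the returned pattern approximates a regular 3 × 3 grid, while near the poles the horizontal sampling spread grows, compensating for the stretching of the equirectangular projection; feeding such per-pixel offsets to a deformable convolution (in the spirit of Dai et al.'s deformable convolutional networks) lets a network trained on perspective images process panoramas.
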

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Keisuke Tateno (1, 2)
  • Nassir Navab (1, 3)
  • Federico Tombari (1)
  1. CAMP, TU Munich, Garching, Germany
  2. Canon Inc., Tokyo, Japan
  3. Johns Hopkins University, Baltimore, USA