Surface Normal Estimation of Tilted Images via Spatial Rectifier

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12349)

Abstract

In this paper, we present a spatial rectifier to estimate surface normals of tilted images. Tilted images are of particular interest as more visual data are captured by arbitrarily oriented sensors such as body- or robot-mounted cameras. Existing approaches exhibit limited performance when predicting surface normals because they were trained on gravity-aligned images. Our two main hypotheses are: (1) visual scene layout is indicative of the gravity direction; and (2) not all surfaces are equally represented by a learned estimator due to the structured distribution of the training data; thus, for each tilted image there exists a transformation to which the learned estimator is more responsive than to others. We design a spatial rectifier that learns to transform the surface normal distribution of a tilted image into a rectified distribution that matches the gravity-aligned training data. Along with the spatial rectifier, we propose a novel truncated angular loss that offers a stronger gradient at smaller angular errors and robustness to outliers. The resulting estimator outperforms state-of-the-art methods, including data augmentation baselines, not only on ScanNet and NYUv2 but also on a new dataset called Tilt-RGBD that features considerable roll and pitch camera motion.
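The two components above lend themselves to a short illustration. The following is a minimal Python sketch, not the paper's implementation: the rectifier in the paper is a learned network, and its exact loss formulation may differ from the one shown here. The helper names (rectify_homography, truncated_angular_loss) and the truncation constant are assumptions for this example. The sketch shows (a) how a rotation R between the tilted and gravity-aligned camera frames induces an image warp via the homography H = K R K^-1, and (b) a loss on unit normals whose gradient does not vanish at small angular errors and whose truncation bounds the influence of outliers.

    # Illustrative sketch only: the rectifier in the paper is a learned
    # network, and the paper's exact truncated angular loss may differ.
    # The names rectify_homography / truncated_angular_loss and the
    # constant trunc=1.0 are assumptions for this example.
    import cv2
    import numpy as np
    import torch
    import torch.nn.functional as F

    def rectify_homography(img, K, R):
        # K: 3x3 camera intrinsics; R: rotation from the tilted camera
        # frame to a gravity-aligned frame. A pure rotation about the
        # camera center induces the image homography H = K R K^-1.
        H = K @ R @ np.linalg.inv(K)
        h, w = img.shape[:2]
        return cv2.warpPerspective(img, H, (w, h))

    def truncated_angular_loss(pred, gt, trunc=1.0, eps=1e-6):
        # pred, gt: (..., 3) surface normals. For unit vectors,
        # |n - n_hat| grows linearly with the angular error near zero,
        # so the gradient does not vanish for small errors (unlike a
        # squared loss); clamping at `trunc` bounds the contribution
        # of outlier pixels.
        pred = F.normalize(pred, dim=-1, eps=eps)
        gt = F.normalize(gt, dim=-1, eps=eps)
        per_pixel = torch.linalg.norm(pred - gt, dim=-1)
        return torch.clamp(per_pixel, max=trunc).mean()

At test time, one would warp the tilted image with the predicted rotation, run the gravity-aligned normal estimator on the rectified image, and rotate the predicted normals back into the original camera frame with the transpose of R.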

Keywords

Surface normal estimation · Spatial rectifier · Tilted images

Notes

Acknowledgements

This work is supported by NSF IIS-1328722 and NSF CAREER IIS-1846031.

Supplementary material

Supplementary material 1 (MP4, 82,430 KB)

References

  1. Bansal, A., Russell, B., Gupta, A.: Marr revisited: 2D–3D alignment via surface normal prediction. In: CVPR (2016)
  2. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. TPAMI (2017)
  3. Chen, W., Xiang, D., Deng, J.: Surface normals in the wild. In: ICCV (2017)
  4. Coughlan, J.M., Yuille, A.L.: The Manhattan world assumption: regularities in scene statistics which enable Bayesian inference. In: NeurIPS (2001)
  5. Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M.: ScanNet: richly-annotated 3D reconstructions of indoor scenes. In: CVPR (2017)
  6. Do, T., Neira, L., Yang, Y., Roumeliotis, S.I.: Attitude tracking from a camera and an accelerometer on gyro-less devices. In: ISRR (2019)
  7. Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: ICCV (2015)
  8. Fei, X., Wong, A., Soatto, S.: Geo-supervised visual depth prediction. RA-L (2019)
  9. Fischer, P., Dosovitskiy, A., Brox, T.: Image orientation estimation with convolutional networks. In: GCPR (2015)
  10. Fouhey, D.F., Gupta, A., Hebert, M.: Data-driven 3D primitives for single image understanding. In: ICCV (2013)
  11. Fu, H., Gong, M., Wang, C., Batmanghelich, K., Tao, D.: Deep ordinal regression network for monocular depth estimation. In: CVPR (2018)
  12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
  13. Hickson, S., Raveendran, K., Fathi, A., Murphy, K., Essa, I.: Floors are flat: leveraging semantics for real-time surface normal prediction. arXiv:1906.06792 (2019)
  14. Huang, J., Zhou, Y., Funkhouser, T., Guibas, L.J.: FrameNet: learning local canonical frames of 3D surfaces from a single RGB image. In: ICCV (2019)
  15. Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K.: Spatial transformer networks. In: NeurIPS (2015)
  16. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)
  17. Kirillov, A., Girshick, R., He, K., Dollár, P.: Panoptic feature pyramid networks. In: CVPR (2019)
  18. Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. (1951)
  19. Ladický, L., Zeisl, B., Pollefeys, M.: Discriminatively trained dense surface normal estimation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 468–484. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_31
  20. Lee, J.K., Yoon, K.J.: Real-time joint estimation of camera orientation and vanishing points. In: CVPR (2015)
  21. Li, B., Shen, C., Dai, Y., Van Den Hengel, A., He, M.: Depth and surface normal estimation from monocular images using regression on deep features and hierarchical CRFs. In: CVPR (2015)
  22. Liao, S., Gavves, E., Snoek, C.G.: Spherical regression: learning viewpoints, surface normals and 3D rotations on n-spheres. In: CVPR (2019)
  23. Liu, C., Kim, K., Gu, J., Furukawa, Y., Kautz, J.: PlaneRCNN: 3D plane detection and reconstruction from a single image. In: CVPR (2019)
  24. Mirzaei, F.M., Roumeliotis, S.I.: Optimal estimation of vanishing points in a Manhattan world. In: ICCV (2011)
  25. Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7576, pp. 746–760. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33715-4_54
  26. Olmschenk, G., Tang, H., Zhu, Z.: Pitch and roll camera orientation from a single 2D image using convolutional neural networks. In: CRV (2017)
  27. Qi, X., Liao, R., Liu, Z., Urtasun, R., Jia, J.: GeoNet: geometric neural network for joint depth and surface normal estimation. In: CVPR (2018)
  28. Qiu, J., et al.: DeepLiDAR: deep surface normal guided depth prediction for outdoor scene from sparse LiDAR data and single color image. In: CVPR (2019)
  29. Saito, Y., Hachiuma, R., Yamaguchi, M., Saito, H.: In-plane rotation-aware monocular depth estimation using SLAM. In: Ohyama, W., Jung, S.K. (eds.) IW-FCV 2020. CCIS, vol. 1212, pp. 305–317. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-4818-5_23
  30. Tang, J., Folkesson, J., Jensfelt, P.: Sparse2Dense: from direct sparse odometry to dense 3-D reconstruction. RA-L (2019)
  31. Wang, P., Shen, X., Russell, B., Cohen, S., Price, B., Yuille, A.L.: SURGE: surface regularized geometry estimation from a single image. In: NeurIPS (2016)
  32. Wang, R., Geraghty, D., Matzen, K., Szeliski, R., Frahm, J.M.: VPLNet: deep single view normal estimation with vanishing points and lines. In: CVPR (2020)
  33. Wang, X., Fouhey, D., Gupta, A.: Designing deep networks for surface normal estimation. In: CVPR (2015)
  34. Weerasekera, C.S., Latif, Y., Garg, R., Reid, I.: Dense monocular reconstruction using surface normals. In: ICRA (2017)
  35. Xian, W., Li, Z., Fisher, M., Eisenmann, J., Shechtman, E., Snavely, N.: UprightNet: geometry-aware camera orientation estimation from single images. In: ICCV (2019)
  36. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: CVPR (2017)
  37. Zeng, J., et al.: Deep surface normal estimation with hierarchical RGB-D fusion. In: CVPR (2019)
  38. Zhan, H., Weerasekera, C.S., Garg, R., Reid, I.: Self-supervised learning for single view depth and surface normal estimation. arXiv:1903.00112 (2019)
  39. Zhang, Y., Funkhouser, T.: Deep depth completion of a single RGB-D image. In: CVPR (2018)
  40. Zhang, Y., et al.: Physically-based rendering for indoor scene understanding using convolutional neural networks. In: CVPR (2017)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University of Minnesota, Minneapolis, USA
