Single View Metrology in the Wild

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12356)


Most 3D reconstruction methods can recover scene properties only up to a global scale ambiguity. We present a novel approach to single view metrology that can recover the absolute scale of a scene, represented by the 3D heights of objects or the camera height above the ground, as well as the camera parameters of orientation and field of view, using just a monocular image acquired under unconstrained conditions. Our method relies on data-driven priors learned by a deep network specifically designed to imbibe weakly supervised constraints from the interplay of the unknown camera with 3D entities such as object heights, through estimation of bounding box projections. We leverage categorical priors for objects that commonly occur in natural images, such as humans or cars, as references for scale estimation. We demonstrate state-of-the-art qualitative and quantitative results on several datasets, as well as applications including virtual object insertion. Furthermore, the perceptual quality of our outputs is validated by a user study.
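The geometric idea behind using a known-height category as a scale reference can be illustrated with the classic horizon-ratio relation for objects standing on a flat ground plane, as used in earlier work such as Hoiem et al.'s "Putting objects in perspective". The sketch below is not the paper's learned model. It assumes a pinhole camera with zero roll, the principal point at the image center, small pitch, image rows increasing downward, and an illustrative average human height of 1.70 m. All function names and numbers are hypothetical.

```python
import math
import statistics

ASSUMED_PERSON_HEIGHT_M = 1.70  # illustrative prior, not a value from the paper


def horizon_row(pitch_rad: float, vfov_rad: float, image_height: int) -> float:
    """Image row of the horizon for a zero-roll pinhole camera.

    Assumes the principal point is at the image center and that a positive
    pitch tilts the camera upward, which moves the horizon down the image.
    """
    focal_px = (image_height / 2.0) / math.tan(vfov_rad / 2.0)
    return image_height / 2.0 + focal_px * math.tan(pitch_rad)


def _check_box(top_row: float, bottom_row: float, horizon: float) -> None:
    # The object must have positive image height and touch the ground
    # below the horizon for the ratio below to be meaningful.
    if bottom_row <= top_row:
        raise ValueError("bottom_row must be below top_row")
    if bottom_row <= horizon:
        raise ValueError("object must touch the ground below the horizon")


def object_height(cam_height_m: float, top_row: float, bottom_row: float,
                  horizon: float) -> float:
    """Metric object height from camera height and a ground-contact box.

    Uses H_obj / H_cam = (bottom - top) / (bottom - horizon), which is exact
    for zero pitch and approximate for small pitch.
    """
    _check_box(top_row, bottom_row, horizon)
    return cam_height_m * (bottom_row - top_row) / (bottom_row - horizon)


def estimate_camera_height(person_boxes, horizon: float,
                           prior_m: float = ASSUMED_PERSON_HEIGHT_M) -> float:
    """Camera height from (top_row, bottom_row) boxes of reference objects.

    Inverts the ratio for each box and takes the median as a simple,
    outlier-tolerant aggregate (a hypothetical stand-in for learned priors).
    """
    estimates = []
    for top_row, bottom_row in person_boxes:
        _check_box(top_row, bottom_row, horizon)
        estimates.append(prior_m * (bottom_row - horizon) / (bottom_row - top_row))
    return statistics.median(estimates)


if __name__ == "__main__":
    horizon = 300.0  # hypothetical horizon row, e.g. from horizon_row(...)
    # Hypothetical person box spanning rows 200..500.
    cam_h = estimate_camera_height([(200, 500)], horizon)
    print(f"camera height ~ {cam_h:.3f} m")  # 1.133
    # Hypothetical car box spanning rows 260..460.
    print(f"car height ~ {object_height(cam_h, 260, 460, horizon):.3f} m")  # 1.417
```

An object whose top lies exactly on the horizon has the same height as the camera, which is why several people of roughly known height constrain both the horizon and the absolute scale. Taking the median is only one simple way to combine noisy per-instance estimates.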


Keywords: Single view metrology · Absolute scale estimation · Camera calibration · Virtual object insertion

Supplementary material

Supplementary material 1 (pdf 70663 KB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. University of California San Diego, La Jolla, USA
  2. Adobe Research, San Jose, USA
