Scene Coordinate Regression with Angle-Based Reprojection Loss for Camera Relocalization

  • Xiaotian Li
  • Juha Ylioinas
  • Jakob Verbeek
  • Juho Kannala
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11131)


Image-based camera relocalization is an important problem in computer vision and robotics. Recent works use convolutional neural networks (CNNs) to regress, for each pixel of a query image, the corresponding 3D world coordinate in the scene. The camera pose is then estimated from the predicted coordinates with a RANSAC-based optimization scheme. The CNN is usually trained with ground-truth scene coordinates, but it has also been shown that the network can discover the 3D scene geometry automatically by minimizing a single-view reprojection loss. However, because of deficiencies in the reprojection loss, the network must be carefully initialized. In this paper, we present a new angle-based reprojection loss that resolves these issues of the original reprojection loss. With this loss, the network can be trained without careful initialization, and the system achieves more accurate results. The new loss also enables us to exploit available multi-view constraints, which further improve performance.
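To make the contrast between a pixel reprojection loss and an angle-based one concrete, here is a minimal NumPy sketch. It is not the authors' exact formulation. It compares a standard pixel reprojection error with a generic angular error between viewing rays, assuming a pinhole intrinsic matrix `K` and a world-to-camera pose `(R, t)`. The function names `to_camera`, `reprojection_error` and `angular_error` are hypothetical. The example shows one well-known weakness of the pixel error. A predicted point behind the camera, at `-c`, projects to the same pixel as `c`, so the pixel error cannot distinguish the two. The angle between the rays, by contrast, is π.

```python
import numpy as np


def to_camera(y_world, R, t):
    """Transform a predicted world point into the camera frame (assumed world-to-camera pose R, t)."""
    return R @ y_world + t


def reprojection_error(y_world, pixel, K, R, t):
    """Standard pixel reprojection error ||p - pi(K c)||, where pi divides by depth."""
    c = to_camera(y_world, R, t)
    proj = K @ c
    uv = proj[:2] / proj[2]  # perspective division; sign of depth is lost here
    return float(np.linalg.norm(pixel - uv))


def angular_error(y_world, pixel, K, R, t):
    """Generic angular error (radians) between the pixel's viewing ray and the ray to the point.

    Illustrative only: an angle between rays, not necessarily the loss defined in the paper.
    """
    c = to_camera(y_world, R, t)
    ray = np.linalg.solve(K, np.array([pixel[0], pixel[1], 1.0]))  # K^{-1} [u, v, 1]^T
    cos = ray @ c / (np.linalg.norm(ray) * np.linalg.norm(c))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards against rounding


if __name__ == "__main__":
    # Hypothetical intrinsics and identity pose, chosen only for illustration.
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    R, t = np.eye(3), np.zeros(3)
    pixel = np.array([400.0, 300.0])

    ray = np.linalg.solve(K, np.array([pixel[0], pixel[1], 1.0]))
    front = 2.0 * ray    # on the viewing ray, in front of the camera
    behind = -2.0 * ray  # mirrored through the camera centre, behind the camera

    # Both points reproject to (almost exactly) the same pixel...
    print(reprojection_error(front, pixel, K, R, t), reprojection_error(behind, pixel, K, R, t))
    # ...but the angular error is ~0 for the front point and ~pi for the one behind.
    print(angular_error(front, pixel, K, R, t), angular_error(behind, pixel, K, R, t))
```

Because the angle is measured between directions rather than after perspective division, it remains well defined and informative for points near or behind the camera. Under these assumptions, that property is the kind of robustness that could reduce the need for careful initialization.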


Keywords: Camera relocalization, Scene coordinate regression, Deep neural networks



The authors acknowledge funding from the Academy of Finland (grant numbers 277685, 309902). This work has also been partially supported by the grant “Deep in France” (ANR16-CE23-0006) and LabEx PERSYVAL (ANR-11-LABX0025-01).


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Xiaotian Li (1), corresponding author
  • Juha Ylioinas (1)
  • Jakob Verbeek (2)
  • Juho Kannala (1)

  1. Aalto University, Espoo, Finland
  2. Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, Grenoble, France
