Relative Pose from Deep Learned Depth and a Single Affine Correspondence

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12357)

Abstract

We propose a new approach that combines deep-learned non-metric monocular depth with affine correspondences (ACs) to estimate the relative pose of two calibrated cameras from a single correspondence. From the depth information and the affine features, two new constraints on the camera pose are derived. The proposed solver can be used within 1-point RANSAC schemes, so the processing time of the robust estimation is linear in the number of correspondences and, therefore, orders of magnitude lower than with traditional approaches. The proposed 1AC+D solver (source code: https://github.com/eivan/one-ac-pose) is tested both on synthetic data and on 110,395 publicly available real image pairs, where an off-the-shelf monocular depth network provides up-to-scale depth per pixel. 1AC+D achieves accuracy similar to traditional approaches while being significantly faster. When solving large-scale problems, e.g., pose-graph initialization for Structure-from-Motion (SfM) pipelines, the overhead of obtaining ACs and monocular depth is negligible compared to the speed-up gained in pairwise geometric verification, i.e., relative pose estimation. This is demonstrated on scenes from the 1DSfM dataset using a state-of-the-art global SfM algorithm.
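The speed claim in the abstract rests on a property of 1-point RANSAC: when a minimal sample is a single correspondence, every datum can be tried as a hypothesis, so the number of hypotheses is linear in the number of correspondences, whereas the required sample count of a 5-point solver grows rapidly with the outlier ratio. The sketch below illustrates this loop; it does not reproduce the paper's 1AC+D solver. The `solve` and `residual` callables are hypothetical stand-ins, demonstrated here on a toy one-parameter model (a slope determined by a single point) in place of the relative pose determined by one affine correspondence with depth.

```python
def ransac_1pt(correspondences, solve, residual, threshold):
    """Generic 1-point RANSAC: one hypothesis per correspondence,
    so only O(n) calls to the minimal solver are needed.  `solve`
    maps a single correspondence to a model; `residual` scores a
    correspondence against a model."""
    best_model, best_inliers = None, []
    for c in correspondences:               # every datum is a minimal sample
        model = solve(c)                    # stand-in for the 1AC+D solver
        inliers = [d for d in correspondences if residual(model, d) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# Toy illustration: robustly fit a slope m in y = m * x, where one
# (x, y) pair determines m, with two gross outliers injected.
data = [(x, 2.0 * x) for x in range(1, 20)] + [(5, 40.0), (7, -3.0)]
m, inliers = ransac_1pt(
    data,
    solve=lambda c: c[1] / c[0],
    residual=lambda m, d: abs(d[1] - m * d[0]),
    threshold=1e-6,
)
print(m, len(inliers))   # -> 2.0 19
```

In the paper's setting, the model returned by the solver is a relative pose, and verification is the usual epipolar-error inlier count; the loop structure, and hence the linear hypothesis count, is unchanged.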

Keywords

Pose estimation · Minimal solver · Depth prediction · Affine correspondences · Global structure from motion · Pose graph initialization

Notes

Acknowledgement

Supported by the project 'Exploring the Mathematical Foundations of Artificial Intelligence' (2018-1.2.1-NKP-00008), the project 'Intensification of the activities of HU-MATHS-IN – Hungarian Service Network of Mathematics for Industry and Innovation' under grant number EFOP-3.6.2-16-2017-00015, the Ministry of Education OP VVV project CZ.02.1.01/0.0/0.0/16_019/0000765 'Research Center for Informatics', and the Czech Science Foundation grant GA18-05360S.

Supplementary material

Supplementary material 1: 504453_1_En_37_MOESM1_ESM.pdf (PDF, 57 KB)

References

  1. Albl, C., Kukelova, Z., Fitzgibbon, A., Heller, J., Smid, M., Pajdla, T.: On the two-view geometry of unsynchronized cameras. In: Computer Vision and Pattern Recognition, July 2017
  2. Barath, D., Matas, J.: Graph-cut RANSAC. In: Computer Vision and Pattern Recognition, pp. 6733–6741 (2018)
  3. Baráth, D., Tóth, T., Hajder, L.: A minimal solution for two-view focal-length estimation using two affine correspondences. In: Computer Vision and Pattern Recognition (2017)
  4. Barath, D., Eichhardt, I., Hajder, L.: Optimal multi-view surface normal estimation using affine correspondences. IEEE Trans. Image Process. 28(7), 3301–3311 (2019)
  5. Barath, D., Hajder, L.: A theory of point-wise homography estimation. Pattern Recogn. Lett. 94, 7–14 (2017)
  6. Barath, D., Hajder, L.: Efficient recovery of essential matrix from two affine correspondences. IEEE Trans. Image Process. 27(11), 5328–5337 (2018)
  7. Barath, D., Matas, J., Noskova, J.: MAGSAC: marginalizing sample consensus. In: Computer Vision and Pattern Recognition, pp. 10197–10205 (2019)
  8. Batra, D., Nabbe, B., Hebert, M.: An alternative formulation for five point relative pose problem. In: Workshop on Motion and Video Computing, p. 21. IEEE (2007)
  9. Baumberg, A.: Reliable feature matching across widely separated views. In: Computer Vision and Pattern Recognition, vol. 1, pp. 774–781. IEEE (2000)
  10. Bentolila, J., Francos, J.M.: Conic epipolar constraints from affine correspondences. Comput. Vis. Image Underst. 122, 105–114 (2014)
  11. Chatterjee, A., Madhav Govindu, V.: Efficient and robust large-scale rotation averaging. In: Proceedings of International Conference on Computer Vision, pp. 521–528 (2013)
  12. Chum, O., Matas, J.: Matching with PROSAC – progressive sample consensus. In: Computer Vision and Pattern Recognition. IEEE (2005)
  13. Eichhardt, I., Barath, D.: Optimal multi-view correction of local affine frames. In: British Machine Vision Conference, September 2019
  14. Eichhardt, I., Chetverikov, D.: Affine correspondences between central cameras for rapid relative pose estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11210, pp. 488–503. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01231-1_30
  15. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24, 381–395 (1981)
  16. Frahm, J.M., Pollefeys, M.: RANSAC for (quasi-)degenerate data (QDEGSAC). In: Computer Vision and Pattern Recognition, pp. 453–460. IEEE (2006)
  17. Fraundorfer, F., Tanskanen, P., Pollefeys, M.: A minimal case solution to the calibrated relative pose problem for the case of two known orientation angles. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 269–282. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_20
  18. Guan, B., Zhao, J., Li, Z., Sun, F., Fraundorfer, F.: Minimal solutions for relative pose with a single affine correspondence. In: Computer Vision and Pattern Recognition (2020)
  19. Hajder, L., Baráth, D.: Relative planar motion for vehicle-mounted cameras from a single affine correspondence. In: Proceedings of International Conference on Robotics and Automation (2020)
  20. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge (2003)
  21. Hartley, R., Li, H.: An efficient hidden variable approach to minimal-case camera motion estimation. IEEE Trans. Pattern Anal. Mach. Intell. 34(12), 2303–2314 (2012)
  22. Hesch, J.A., Roumeliotis, S.I.: A direct least-squares (DLS) method for PnP. In: Proceedings of International Conference on Computer Vision, pp. 383–390. IEEE (2011)
  23. Hodaň, T., et al.: BOP: benchmark for 6D object pose estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 19–35. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01249-6_2
  24. Köser, K.: Geometric estimation with local affine frames and free-form surfaces. Shaker (2009)
  25. Kukelova, Z., Bujnak, M., Pajdla, T.: Polynomial eigenvalue solutions to minimal problems in computer vision. IEEE Trans. Pattern Anal. Mach. Intell. 34(7), 1381–1393 (2011)
  26. Larsson, V., Kukelova, Z., Zheng, Y.: Camera pose estimation with unknown principal point. In: Computer Vision and Pattern Recognition, June 2018
  27. Larsson, V., Sattler, T., Kukelova, Z., Pollefeys, M.: Revisiting radial distortion absolute pose. In: Proceedings of International Conference on Computer Vision, October 2019
  28. Li, B., Heng, L., Lee, G.H., Pollefeys, M.: A 4-point algorithm for relative pose estimation of a calibrated camera with a known relative rotation angle. In: International Conference on Intelligent Robots and Systems, pp. 1595–1601. IEEE (2013)
  29. Li, H., Hartley, R.: Five-point motion estimation made easy. In: Proceedings of International Conference on Pattern Recognition, vol. 1, pp. 630–633. IEEE (2006)
  30. Li, Z., Snavely, N.: MegaDepth: learning single-view depth prediction from internet photos. In: Computer Vision and Pattern Recognition, June 2018
  31. Li, Z., Wang, G., Ji, X.: CDPN: coordinates-based disentangled pose network for real-time RGB-based 6-DoF object pose estimation. In: Proceedings of International Conference on Computer Vision, pp. 7678–7687 (2019)
  32. Lowe, D.G.: Object recognition from local scale-invariant features. In: Proceedings of International Conference on Computer Vision. IEEE (1999)
  33. Mikolajczyk, K., Schmid, C.: Comparison of affine-invariant local detectors and descriptors. In: European Signal Processing Conference, pp. 1729–1732. IEEE (2004)
  34. Mur-Artal, R., Montiel, J.M.M., Tardos, J.D.: ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Trans. Robot. 31(5), 1147–1163 (2015)
  35. Mur-Artal, R., Tardós, J.D.: ORB-SLAM2: an open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robot. 33(5), 1255–1262 (2017)
  36. Nakano, G.: A versatile approach for solving PnP, PnPf, and PnPfr problems. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 338–352. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_21
  37. Nistér, D.: An efficient solution to the five-point relative pose problem. IEEE Trans. Pattern Anal. Mach. Intell. 26, 756–770 (2004)
  38. Perdoch, M., Matas, J., Chum, O.: Epipolar geometry from two correspondences. In: Proceedings of International Conference on Computer Vision (2006)
  39. Pritts, J., Kukelova, Z., Larsson, V., Lochman, Y., Chum, O.: Minimal solvers for rectifying from radially-distorted conjugate translations. arXiv preprint arXiv:1911.01507 (2019)
  40. Raguram, R., Chum, O., Pollefeys, M., Matas, J., Frahm, J.M.: USAC: a universal framework for random sample consensus. IEEE Trans. Pattern Anal. Mach. Intell. 35, 2022–2038 (2013)
  41. Raposo, C., Barreto, J.P.: Theory and practice of structure-from-motion using affine correspondences. In: Computer Vision and Pattern Recognition, pp. 5470–5478 (2016)
  42. Saurer, O., Pollefeys, M., Lee, G.H.: A minimal solution to the rolling shutter pose estimation problem. In: International Conference on Intelligent Robots and Systems, pp. 1328–1334. IEEE (2015)
  43. Scaramuzza, D.: 1-point-RANSAC structure from motion for vehicle-mounted cameras by exploiting non-holonomic constraints. Int. J. Comput. Vis. 95(1), 74–85 (2011)
  44. Schonberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: Computer Vision and Pattern Recognition, pp. 4104–4113 (2016)
  45. Snavely, N., Seitz, S.M., Szeliski, R.: Photo tourism: exploring photo collections in 3D. In: ACM Transactions on Graphics, vol. 25, pp. 835–846. ACM (2006)
  46. Snavely, N., Seitz, S.M., Szeliski, R.: Modeling the world from internet photo collections. Int. J. Comput. Vis. 80(2), 189–210 (2008)
  47. Solomon, J.: Numerical Algorithms: Methods for Computer Vision, Machine Learning, and Graphics. AK Peters/CRC Press, London (2015)
  48. Stewenius, H., Engels, C., Nistér, D.: Recent developments on direct relative orientation. ISPRS J. Photogrammetry Remote Sens. 60(4), 284–294 (2006)
  49. Stewénius, H., Nistér, D., Kahl, F., Schaffalitzky, F.: A minimal solution for relative pose with unknown focal length. In: Computer Vision and Pattern Recognition, vol. 2, pp. 789–794. IEEE (2005)
  50. Sweeney, C.: Theia multiview geometry library. http://theia-sfm.org
  51. Sweeney, C., Sattler, T., Hollerer, T., Turk, M., Pollefeys, M.: Optimizing the viewing graph for structure-from-motion. In: Proceedings of International Conference on Computer Vision, pp. 801–809 (2015)
  52. Torr, P.H.S., Zisserman, A.: MLESAC: a new robust estimator with application to estimating image geometry. Comput. Vis. Image Underst. 78, 138–156 (2000)
  53. Umeyama, S.: Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell. 13(4), 376–380 (1991)
  54. Vedaldi, A., Fulkerson, B.: VLFeat: an open and portable library of computer vision algorithms (2008). http://www.vlfeat.org/
  55. Ventura, J., Arth, C., Reitmayr, G., Schmalstieg, D.: A minimal solution to the generalized pose-and-scale problem. In: Computer Vision and Pattern Recognition, pp. 422–429 (2014)
  56. Wilson, K., Snavely, N.: Robust global translations with 1DSfM. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8691, pp. 61–75. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10578-9_5
  57. Wu, C.: Towards linear-time incremental structure from motion. In: International Conference on 3D Vision, pp. 127–134. IEEE (2013)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Machine Perception Research Laboratory, SZTAKI, Budapest, Hungary
  2. VRG, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czechia
  3. Faculty of Informatics, University of Debrecen, Debrecen, Hungary