
Solving the Blind Perspective-n-Point Problem End-to-End with Robust Differentiable Geometric Optimization

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12347)

Abstract

Blind Perspective-n-Point (PnP) is the problem of estimating the position and orientation of a camera relative to a scene, given 2D image points and 3D scene points, without prior knowledge of the 2D–3D correspondences. Solving for pose and correspondences simultaneously is extremely challenging since the search space is very large. Fortunately it is a coupled problem: the pose can be found easily given the correspondences and vice versa. Existing approaches assume that noisy correspondences are provided, that a good pose prior is available, or that the problem size is small. We instead propose the first fully end-to-end trainable network for solving the blind PnP problem efficiently and globally, that is, without the need for pose priors. We make use of recent results in differentiating optimization problems to incorporate geometric model fitting into an end-to-end learning framework, including Sinkhorn, RANSAC and PnP algorithms. Our proposed approach significantly outperforms other methods on synthetic and real data.
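The abstract names Sinkhorn as one of the components that are made differentiable. As a rough illustration of that building block only, the sketch below runs generic Sinkhorn normalization on a toy cost matrix to obtain a soft 2D–3D matching. It is not the authors' implementation: the function name `sinkhorn`, the regularization value `reg`, the iteration count and the toy costs are all assumptions made for this example.

```python
import math


def sinkhorn(cost, row_marginals, col_marginals, reg=0.1, num_iters=200):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    cost[i][j] is a (hypothetical) matching cost between 2D point i and
    3D point j. Returns a joint probability matrix whose row and column
    sums approximate the requested marginals.
    """
    m, n = len(cost), len(cost[0])
    # Gibbs kernel: low cost -> high affinity.
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(n)] for i in range(m)]
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(num_iters):
        # Rescale rows towards the row marginals.
        for i in range(m):
            u[i] = row_marginals[i] / sum(kernel[i][j] * v[j] for j in range(n))
        # Rescale columns towards the column marginals.
        for j in range(n):
            v[j] = col_marginals[j] / sum(kernel[i][j] * u[i] for i in range(m))
    return [[u[i] * kernel[i][j] * v[j] for j in range(n)] for i in range(m)]


# Toy example (assumed data): point i is cheapest to match with point i.
cost = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
uniform = [1.0 / 3.0] * 3
plan = sinkhorn(cost, uniform, uniform)

# Most probable 3D point for each 2D point.
best_match = [max(range(3), key=lambda j: plan[i][j]) for i in range(3)]
```

In a blind PnP pipeline, a matrix of this kind could supply candidate correspondences to a robust pose solver such as RANSAC with a PnP minimal solver; that pairing is described only at the level of the abstract here.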

Keywords

Camera pose estimation, PnP, Implicit differentiation

Notes

Acknowledgements

This work was conducted by the Australian Research Council Centre of Excellence for Robotic Vision (CE140100016), funded by the Australian Government.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Australian National University, Australian Centre for Robotic Vision, Canberra, Australia
