Solving the Blind Perspective-n-Point Problem End-to-End with Robust Differentiable Geometric Optimization

Campbell, Dylan; Liu, Liu; Gould, Stephen

doi:10.1007/978-3-030-58536-5_15

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12347)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Blind Perspective-n-Point (PnP) is the problem of estimating the position and orientation of a camera relative to a scene, given 2D image points and 3D scene points, without prior knowledge of the 2D–3D correspondences. Solving for pose and correspondences simultaneously is extremely challenging since the search space is very large. Fortunately it is a coupled problem: the pose can be found easily given the correspondences and vice versa. Existing approaches assume that noisy correspondences are provided, that a good pose prior is available, or that the problem size is small. We instead propose the first fully end-to-end trainable network for solving the blind PnP problem efficiently and globally, that is, without the need for pose priors. We make use of recent results in differentiating optimization problems to incorporate geometric model fitting into an end-to-end learning framework, including Sinkhorn, RANSAC and PnP algorithms. Our proposed approach significantly outperforms other methods on synthetic and real data.

D. Campbell and L. Liu—Equal contribution.

Notes

1. The probability of choosing a minimal set of 4 true 2D–3D correspondences from the size-\(mn\) set of all correspondences without replacement is \(\prod_{i=0}^{3} \frac{m-i}{(m-i)(n-i)} \approx 10^{-12}\) for \(m=n=1000\) and no outliers. The number of RANSAC iterations required to achieve 90% confidence is thus \(\log(1 - 0.9) / \log(1 - 10^{-12}) \approx 2.3\times 10^{12}\).
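
The arithmetic in this note can be checked numerically. The following is a minimal sketch, not code from the paper: the function names `minimal_set_probability` and `ransac_iterations` are hypothetical, and it assumes, as the note does, that there are no outliers and that each draw removes the chosen 2D point and 3D point from further consideration. `math.log1p` is used because \(1 - 10^{-12}\) sits close to the resolution of double-precision floats.

```python
import math


def minimal_set_probability(m: int, n: int, k: int = 4) -> float:
    """Probability that k randomly drawn correspondences are all true matches.

    Follows the product in the note: at draw i there are (m - i) true
    matches left among (m - i) * (n - i) remaining candidate pairs.
    """
    p = 1.0
    for i in range(k):
        p *= (m - i) / ((m - i) * (n - i))
    return p


def ransac_iterations(p_success: float, confidence: float) -> float:
    """Iterations needed so that at least one all-inlier sample is drawn
    with the given confidence: log(1 - confidence) / log(1 - p_success)."""
    # log1p(-p) evaluates log(1 - p) accurately when p is tiny.
    return math.log(1.0 - confidence) / math.log1p(-p_success)


p = minimal_set_probability(m=1000, n=1000)
iters = ransac_iterations(p, confidence=0.9)
print(f"p = {p:.3e}")          # on the order of 1e-12
print(f"iters = {iters:.3e}")  # on the order of 2.3e12
```

With \(m=n=1000\) the product reduces to \(1/(1000 \cdot 999 \cdot 998 \cdot 997)\), which is why the iteration count grows to the order of \(10^{12}\).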

Acknowledgements

This work was conducted by the Australian Research Council Centre of Excellence for Robotic Vision (CE140100016), funded by the Australian Government.

Author information

Authors and Affiliations

Australian National University, Australian Centre for Robotic Vision, Canberra, Australia
Dylan Campbell, Liu Liu & Stephen Gould

Corresponding author

Correspondence to Dylan Campbell.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

About this paper

Cite this paper

Campbell, D., Liu, L., Gould, S. (2020). Solving the Blind Perspective-n-Point Problem End-to-End with Robust Differentiable Geometric Optimization. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12347. Springer, Cham. https://doi.org/10.1007/978-3-030-58536-5_15

DOI: https://doi.org/10.1007/978-3-030-58536-5_15
Published: 03 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58535-8
Online ISBN: 978-3-030-58536-5
eBook Packages: Computer Science, Computer Science (R0)
