Abstract
Blind Perspective-n-Point (PnP) is the problem of estimating the position and orientation of a camera relative to a scene, given 2D image points and 3D scene points, without prior knowledge of the 2D–3D correspondences. Solving for pose and correspondences simultaneously is extremely challenging because the search space is very large. Fortunately, the problem is coupled: the pose can be found easily given the correspondences, and vice versa. Existing approaches assume that noisy correspondences are provided, that a good pose prior is available, or that the problem size is small. We instead propose the first fully end-to-end trainable network for solving the blind PnP problem efficiently and globally, that is, without the need for pose priors. We make use of recent results in differentiating optimization problems to incorporate geometric model-fitting algorithms, including Sinkhorn, RANSAC and PnP, into an end-to-end learning framework. Our proposed approach significantly outperforms other methods on synthetic and real data.
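Sinkhorn normalization is one of the differentiable building blocks named above. As a rough illustration only, and not the authors' network, the sketch below shows a generic log-domain Sinkhorn iteration (Sinkhorn's 1967 row/column scaling, in the entropy-regularized form popularized by Cuturi in 2013) that turns a square 2D–3D matching cost matrix into a soft, approximately doubly stochastic correspondence matrix. The helper names `sinkhorn` and `_logsumexp` and the parameters `lam` (inverse temperature) and `n_iters` are hypothetical; unequal point counts and outliers, which a real blind PnP setting must handle, are deliberately left out.

```python
import numpy as np


def _logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis, keeping dims for broadcasting.
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))


def sinkhorn(cost, lam=10.0, n_iters=100):
    """Hypothetical helper: soft correspondence matrix from a SQUARE cost matrix.

    Alternately normalizes the rows and columns of exp(-lam * cost), working in
    the log domain so that large lam * cost values do not underflow to zero.
    """
    log_p = -lam * np.asarray(cost, dtype=float)
    for _ in range(n_iters):
        log_p = log_p - _logsumexp(log_p, axis=1)  # make rows sum to 1
        log_p = log_p - _logsumexp(log_p, axis=0)  # make columns sum to 1
    return np.exp(log_p)


# Usage: 3 image points vs 3 scene points, with zero cost on the true matches.
perm = [2, 0, 1]
cost = np.ones((3, 3))
cost[np.arange(3), perm] = 0.0
P = sinkhorn(cost)
print(P.argmax(axis=1).tolist())  # [2, 0, 1]
```

Every step (exp, log, sums) is differentiable, which is what allows a layer like this to sit inside a network trained by backpropagation; larger `lam` gives sharper, closer-to-permutation matrices at the cost of slower convergence.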
D. Campbell and L. Liu contributed equally.
Notes
1. The probability of choosing a minimal set of 4 true 2D–3D correspondences, without replacement, from the set of all mn candidate correspondences is \(\prod _{i=0}^{3} \frac{m-i}{(m-i)(n-i)} = \prod _{i=0}^{3} \frac{1}{n-i} \approx 10^{-12}\) for \(m=n=1000\) and no outliers. The number of RANSAC iterations required to achieve 90% confidence is thus \(\log (1 - 0.9) / \log (1 - 10^{-12}) \approx 2.3\times 10^{12}\).
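The arithmetic in Note 1 can be reproduced with a short Python sketch. It assumes exactly the model the note states (uniform sampling, every point has one true match, no outliers); the function names `prob_true_minimal_set` and `ransac_iterations` are hypothetical.

```python
import math


def prob_true_minimal_set(m, n, k=4):
    """Probability that k correspondences drawn without replacement from the
    m*n candidate 2D-3D pairs are all true, assuming no outliers."""
    p = 1.0
    for i in range(k):
        # After i true picks, (m - i) true pairs remain among
        # (m - i) * (n - i) candidates; the (m - i) factors cancel.
        p *= (m - i) / ((m - i) * (n - i))
    return p


def ransac_iterations(p_sample, confidence=0.9):
    """Iterations needed to draw at least one all-true sample with the given
    confidence: log(1 - confidence) / log(1 - p_sample)."""
    # log1p(-p) stays accurate for tiny p, where log(1 - p) would lose digits.
    return math.log(1.0 - confidence) / math.log1p(-p_sample)


p = prob_true_minimal_set(1000, 1000)
print(f"{p:.3g}")                         # 1.01e-12
print(f"{ransac_iterations(p):.3g}")      # 2.29e+12
print(f"{ransac_iterations(1e-12):.3g}")  # 2.3e+12
```

Using the exact probability rather than the rounded \(10^{-12}\) gives about \(2.29\times 10^{12}\) iterations, consistent with the note's \(2.3\times 10^{12}\).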
Acknowledgements
This work was conducted by the Australian Research Council Centre of Excellence for Robotic Vision (CE140100016), funded by the Australian Government.
Electronic supplementary material
Supplementary material 2 (mp4 6358 KB)
Supplementary material 3 (mp4 4770 KB)
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Campbell, D., Liu, L., Gould, S. (2020). Solving the Blind Perspective-n-Point Problem End-to-End with Robust Differentiable Geometric Optimization. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12347. Springer, Cham. https://doi.org/10.1007/978-3-030-58536-5_15
DOI: https://doi.org/10.1007/978-3-030-58536-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58535-8
Online ISBN: 978-3-030-58536-5