Solving the Blind Perspective-n-Point Problem End-to-End with Robust Differentiable Geometric Optimization

  • Conference paper
Computer Vision – ECCV 2020 (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12347)

Abstract

Blind Perspective-n-Point (PnP) is the problem of estimating the position and orientation of a camera relative to a scene, given 2D image points and 3D scene points, without prior knowledge of the 2D–3D correspondences. Solving for pose and correspondences simultaneously is extremely challenging since the search space is very large. Fortunately, it is a coupled problem: the pose can be found easily given the correspondences and vice versa. Existing approaches assume that noisy correspondences are provided, that a good pose prior is available, or that the problem size is small. We instead propose the first fully end-to-end trainable network for solving the blind PnP problem efficiently and globally, that is, without the need for pose priors. We make use of recent results in differentiating optimization problems to incorporate geometric model fitting into an end-to-end learning framework, including Sinkhorn, RANSAC and PnP algorithms. Our proposed approach significantly outperforms other methods on synthetic and real data.

D. Campbell and L. Liu contributed equally.
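
To make the Sinkhorn step named in the abstract more concrete, the following is a minimal, generic sketch of Sinkhorn normalisation in plain Python. It is not the paper's implementation: the function name `sinkhorn`, the score matrix, and the iteration count are assumptions chosen for illustration. It turns a matrix of matching scores into an approximately doubly stochastic matrix, which can be read as soft correspondence probabilities between two equally sized point sets.

```python
import math


def sinkhorn(scores, num_iters=200):
    """Alternately normalise the rows and columns of exp(scores).

    `scores` is a square list of lists; larger values mean a more
    plausible match. The result has rows and columns that each sum
    to (approximately) one.
    """
    # Exponentiate so that every entry is strictly positive.
    P = [[math.exp(s) for s in row] for row in scores]
    for _ in range(num_iters):
        # Row normalisation: each row sums to one.
        row_sums = [sum(row) for row in P]
        P = [[v / r for v in row] for row, r in zip(P, row_sums)]
        # Column normalisation: each column sums to one.
        col_sums = [sum(col) for col in zip(*P)]
        P = [[v / c for v, c in zip(row, col_sums)] for row in P]
    return P


# Hypothetical 3x3 score matrix (illustrative values only).
scores = [
    [2.0, 0.1, 0.3],
    [0.2, 1.5, 0.1],
    [0.0, 0.4, 1.8],
]
P = sinkhorn(scores)
row_sums = [sum(row) for row in P]
col_sums = [sum(col) for col in zip(*P)]
```

Because every step is a differentiable arithmetic operation, a routine of this kind can in principle sit inside a network trained by backpropagation. The sketch omits practical details such as log-domain computation and handling of unmatched points.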

Notes

  1. The probability of choosing a minimal set of 4 true 2D–3D correspondences from the set of all \(mn\) correspondences without replacement is \(\prod _{i=0}^{3} \frac{m-i}{(m-i)(n-i)} \approx 10^{-12}\) for \(m=n=1000\) and no outliers. The number of RANSAC iterations required to achieve 90% confidence is thus \(\log (1 - 0.9) / \log (1 - 10^{-12}) \approx 2.3\times 10^{12}\).
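
The arithmetic in this note can be checked with a short script. This is a sketch that follows the two formulas above; the function names `prob_all_true` and `ransac_iterations` are illustrative and do not come from the paper. `math.log1p` is used because \(1 - p\) is very close to 1 when \(p\) is this small.

```python
import math


def prob_all_true(m, n, k=4):
    """Probability that k correspondences drawn without replacement are
    all true, using the product formula from the note."""
    p = 1.0
    for i in range(k):
        p *= (m - i) / ((m - i) * (n - i))
    return p


def ransac_iterations(p_success, confidence):
    """Iterations needed so that at least one all-true sample is drawn
    with the given confidence: log(1 - confidence) / log(1 - p_success)."""
    # log1p(-p) evaluates log(1 - p) accurately for tiny p.
    return math.log(1.0 - confidence) / math.log1p(-p_success)


p = prob_all_true(1000, 1000)
iters = ransac_iterations(p, 0.9)
print(f"p = {p:.3e}, iterations = {iters:.3e}")
```

For \(m=n=1000\), the script gives a probability on the order of \(10^{-12}\) and an iteration count on the order of \(10^{12}\), matching the note. At that scale, plain RANSAC over all candidate correspondences is impractical.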

Acknowledgements

This work was conducted by the Australian Research Council Centre of Excellence for Robotic Vision (CE140100016), funded by the Australian Government.

Author information

Corresponding author

Correspondence to Dylan Campbell.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 2 (mp4 6358 KB)

Supplementary material 3 (mp4 4770 KB)

Supplementary material 1 (pdf 22328 KB)

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Campbell, D., Liu, L., Gould, S. (2020). Solving the Blind Perspective-n-Point Problem End-to-End with Robust Differentiable Geometric Optimization. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12347. Springer, Cham. https://doi.org/10.1007/978-3-030-58536-5_15

  • DOI: https://doi.org/10.1007/978-3-030-58536-5_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58535-8

  • Online ISBN: 978-3-030-58536-5

  • eBook Packages: Computer Science, Computer Science (R0)
