
View synthesis for pose computation


Abstract

Geometrical registration of a query image with respect to a 3D model, or pose estimation, is the cornerstone of many computer vision applications. It is often based on the matching of local photometric descriptors invariant to limited viewpoint changes. However, when the query image has been acquired from a camera position not covered by the model images, pose estimation is often inaccurate and sometimes even fails, precisely because of the limited invariance of the descriptors. In this paper, we propose to add descriptors to the model, obtained from views synthesized for virtual cameras that complete the coverage of the scene by the real cameras. We propose an efficient strategy to localize the virtual cameras in the scene and to generate valuable descriptors from the synthetic views. We also discuss a guided sampling strategy for registration in this context. Experiments show that the accuracy of pose estimation is dramatically improved when large viewpoint changes make the matching of classic descriptors a challenging task.





Author information


Correspondence to Frédéric Sur.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Registering cameras from two SfM reconstructions

We have two SfM reconstructions: one built from the full image sequence and one built from a subsequence. Let \({\mathscr {R}}_1\) and \({\mathscr {R}}_2\) be the world coordinate frames attached to these two reconstructions.

Given the matrix \(P_1=[R^{1} \mid T^1]\) of a camera expressed in \({\mathscr {R}}_1\), our goal is to compute its matrix \(P_2=[R^{2} \mid T^2]\) expressed in \({\mathscr {R}}_2\).

The similarity transformation \((s, R, T)\) mapping \({\mathscr {R}}_2\) coordinates to \({\mathscr {R}}_1\) coordinates, \(X_{wc}^1 = s R X_{wc}^2 + T\), is easily recovered by Procrustes analysis of the corresponding camera centers (a code sketch follows the derivation below). Since both reconstructions involve the same physical camera, its viewing coordinates are identical in the two coordinate frames. As viewing coordinates are related to world coordinates by \(X_{vc}^1 = R^{1} X_{wc}^1 + T^1\), we can deduce:

$$\begin{aligned} X_{vc}^2 = X_{vc}^1 &= R^{1}\left( s R X_{wc}^2 + T\right) + T^1 \\ &= s\left( R^{1} R\, X_{wc}^2 + \tfrac{1}{s}\left( R^{1} T + T^1\right)\right). \end{aligned}$$

Therefore, the camera matrix expressed in \({\mathscr {R}}_2\) is

$$\begin{aligned} P_2 = \left[\, R^{1} R \;\Big|\; \tfrac{1}{s}\left( R^{1} T + T^1\right) \right]. \end{aligned}$$
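As a concrete illustration, here is a minimal numpy sketch of both steps: the similarity fit from corresponding camera centers (using Umeyama's closed form, one standard way to implement the Procrustes analysis mentioned above) and the re-expression of a pose with the formula just derived. The function names and the numpy dependency are ours, for illustration only; this is not the paper's implementation.

```python
import numpy as np

def similarity_from_centers(C2, C1):
    """Least-squares similarity (s, R, T) with C1[i] ~= s * R @ C2[i] + T,
    estimated from corresponding camera centers (Umeyama's closed form).
    C1, C2: (n, 3) arrays of centers expressed in frames R_1 and R_2."""
    mu1, mu2 = C1.mean(axis=0), C2.mean(axis=0)
    Y, X = C1 - mu1, C2 - mu2                       # centered point sets
    Sigma = Y.T @ X / len(C1)                       # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(Sigma)
    # Force a proper rotation (det R = +1), guarding against reflections.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U) * np.linalg.det(Vt))])
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / (X ** 2).sum(axis=1).mean()
    T = mu1 - s * R @ mu2                           # so that X^1 = s R X^2 + T
    return s, R, T

def reexpress_pose(R1, T1, s, R, T):
    """Map the pose [R^1 | T^1] (frame R_1) to frame R_2,
    following P_2 = [R^1 R | (1/s)(R^1 T + T^1)] derived above."""
    return R1 @ R, (R1 @ T + T1) / s
```

With exact correspondences, `reexpress_pose` reproduces the pose the second reconstruction would assign to the camera; with noisy SfM centers the Procrustes fit is only least-squares, so a robust variant (e.g., fitting over sampled center triples) may be preferable.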


About this article


Cite this article

Rolin, P., Berger, MO. & Sur, F. View synthesis for pose computation. Machine Vision and Applications 30, 1209–1227 (2019). https://doi.org/10.1007/s00138-019-01045-5

