Abstract
Geometrical registration of a query image with respect to a 3D model, or pose estimation, is the cornerstone of many computer vision applications. It is often based on the matching of local photometric descriptors that are invariant to limited viewpoint changes. However, when the query image has been acquired from a camera position not covered by the model images, pose estimation is often inaccurate and sometimes even fails, precisely because of the limited invariance of the descriptors. In this paper, we propose to add descriptors to the model, obtained from synthesized views associated with virtual cameras that complete the coverage of the scene by the real cameras. We propose an efficient strategy to localize the virtual cameras in the scene and to generate valuable descriptors from the synthetic views. We also discuss a guided sampling strategy for registration in this context. Experiments show that the accuracy of pose estimation is dramatically improved when large viewpoint changes make the matching of classic descriptors a challenging task.
Appendix A: Registering cameras from two SfM reconstructions
Here we have two SfM reconstructions: one built from the full image sequence and one built from a subsequence. Let \({{{\mathscr {R}}}}_1\) and \({{{\mathscr {R}}}}_2\) be the world coordinate frames attached to these SfM reconstructions.
Given a camera matrix \(P_1=[R_1\,T_1]\) expressed in \({{{\mathscr {R}}}}_1\), our goal is to compute the corresponding camera matrix \(P_2=[R_2\,T_2]\) expressed in \({{{\mathscr {R}}}}_2\).
The similarity (rigid + scale) transformation (s, R, T) between \({{{\mathscr {R}}}}_2\) and \({{{\mathscr {R}}}}_1\), such that \(X_{wc}^1 = sRX_{wc}^2 + T\), can easily be recovered by Procrustes analysis from the set of corresponding camera centers. Since we consider the same camera, the viewing coordinates are identical in the two coordinate frames. As the link between the viewing and the world coordinates is given by \(X_{vc}^1= R^{1}X_{wc}^1 + T^1\), we can deduce:

$$X_{vc}^1 = R^{1}\left(sRX_{wc}^2 + T\right) + T^1 = sR^{1}R\,X_{wc}^2 + R^{1}T + T^1.$$

Since scaling the viewing coordinates by the positive factor \(1/s\) leaves the projection unchanged, the expression of the camera matrix \(P_1\) in \({\mathscr {R}}_2\) is

$$P_2 = \left[R^{1}R \;\; \frac{R^{1}T + T^1}{s}\right].$$
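As an illustration, the two steps above can be sketched in Python/NumPy. This is a minimal sketch, not the paper's implementation: the function names are hypothetical, and we assume the Procrustes fit is computed with Umeyama's closed-form SVD solution on the corresponding camera centers.

```python
import numpy as np

def similarity_from_centers(src, dst):
    """Closed-form Procrustes/Umeyama fit of (s, R, T) such that
    dst_i ~ s * R @ src_i + T, from corresponding 3D points
    (here: the camera centers shared by the two reconstructions)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    n = len(src)
    cov = xd.T @ xs / n                      # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                       # guard against reflections
    R = U @ S @ Vt
    var_src = (xs ** 2).sum() / n
    s = np.trace(np.diag(D) @ S) / var_src
    T = mu_d - s * R @ mu_s
    return s, R, T

def transfer_camera(R1, T1, s, R, T):
    """Express a camera [R1 | T1] given in frame 1 in frame 2,
    following the appendix: R2 = R1 R, T2 = (R1 T + T1) / s."""
    return R1 @ R, (R1 @ T + T1) / s
```

With noise-free correspondences the fit is exact, and a scene point projects to the same pixel through \([R_1\,T_1]\) (applied to its frame-1 coordinates) and through the transferred camera (applied to its frame-2 coordinates), the two camera-space points differing only by the global scale \(s\).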
Cite this article
Rolin, P., Berger, MO. & Sur, F. View synthesis for pose computation. Machine Vision and Applications 30, 1209–1227 (2019). https://doi.org/10.1007/s00138-019-01045-5