Abstract
Registering image data to Structure from Motion (SfM) point clouds is widely used to find precise camera location and orientation with respect to a world model. In the case of videos, one constraint has previously been unexploited: temporal smoothness. Without temporal smoothness, the magnitude of the pose error in each frame of a video will often dominate the magnitude of frame-to-frame pose change. This hinders the application of methods requiring stable pose estimates (e.g. tracking, augmented reality). We incorporate temporal constraints into the image-based registration setting and solve the problem by pose regularization with model fitting and smoothing methods. This leads to accurate, gap-free and smooth poses for all frames. We evaluate different methods on challenging synthetic and real street-view SfM data for varying scenarios of motion speed, outlier contamination, pose estimation failures and 2D-3D correspondence noise. For all test cases, a 2- to 60-fold reduction in root mean squared (RMS) positional error is observed, depending on pose estimation difficulty. Different methods perform best in different scenarios. We give guidance on which methods should be preferred depending on circumstances and requirements.
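To make the idea of temporal pose regularization concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes per-frame camera positions are already available, with failed frames marked as NaN rows, and it uses a plain moving average as a stand-in for the model fitting and smoothing methods the abstract refers to. The function names `smooth_positions`, `rms` and `demo`, the window size and the synthetic trajectory are all hypothetical. Orientation and outlier rejection are deliberately left out.

```python
import numpy as np


def smooth_positions(positions, window=9):
    """Fill missing frames and smooth a per-frame camera trajectory.

    positions: (T, 3) array of camera centres; rows containing NaN mark
    frames where pose estimation failed (an assumed convention).
    Returns a gap-free (T, 3) array.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    positions = np.asarray(positions, dtype=float)
    num_frames = positions.shape[0]
    t = np.arange(num_frames)
    valid = ~np.isnan(positions).any(axis=1)
    if valid.sum() < 2:
        raise ValueError("need at least two valid frames")

    half = window // 2
    kernel = np.ones(window) / window
    out = np.empty_like(positions)
    for d in range(positions.shape[1]):
        # Linear interpolation closes the gaps left by failed frames.
        filled = np.interp(t, t[valid], positions[valid, d])
        # Edge padding keeps the smoothed signal at the original length.
        padded = np.pad(filled, half, mode="edge")
        out[:, d] = np.convolve(padded, kernel, mode="valid")
    return out


def rms(a, b):
    """Root mean squared positional error between two (T, 3) arrays."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


def demo(seed=0):
    """Synthetic check: noisy, gappy trajectory before and after smoothing."""
    rng = np.random.default_rng(seed)
    num_frames = 200
    t = np.linspace(0.0, 1.0, num_frames)
    truth = np.stack(
        [10.0 * t, 2.0 * np.sin(2.0 * np.pi * t), np.zeros(num_frames)], axis=1
    )
    noisy = truth + rng.normal(scale=0.3, size=truth.shape)
    noisy[rng.random(num_frames) < 0.1] = np.nan  # simulated pose failures
    valid = ~np.isnan(noisy).any(axis=1)
    smoothed = smooth_positions(noisy)
    return rms(noisy[valid], truth[valid]), rms(smoothed, truth)


if __name__ == "__main__":
    before, after = demo()
    print(f"RMS before: {before:.3f}, after: {after:.3f}")
```

The sketch only illustrates why a temporal constraint helps when per-frame error is large relative to frame-to-frame motion. A moving average also blurs genuine fast motion, which is one reason the choice of method would depend on the scenario.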
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Kroeger, T., Van Gool, L. (2014). Video Registration to SfM Models. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8693. Springer, Cham. https://doi.org/10.1007/978-3-319-10602-1_1
DOI: https://doi.org/10.1007/978-3-319-10602-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10601-4
Online ISBN: 978-3-319-10602-1
eBook Packages: Computer Science (R0)