Feature Trajectory Retrieval with Application to Accurate Structure and Motion Recovery

  • Kai Cordes
  • Oliver Müller
  • Bodo Rosenhahn
  • Jörn Ostermann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6938)


Common structure-from-motion techniques do not explicitly handle foreground occlusions and disocclusions, so a single 3D point can give rise to several discontinued feature trajectories. These fragments induce a set of (less accurate) 3D points instead of one, which makes it highly desirable to enforce long, continuous trajectories that automatically bridge occlusions after a re-identification step. The solution proposed in this paper is to connect features in the current image to trajectories that were discontinued earlier during tracking. This is done with a correspondence analysis designed for wide baselines and an outlier elimination strategy based on the epipolar geometry. The resulting reference to the 3D object points can be used as a new constraint in the bundle adjustment. Feature localization uses the SIFT detector extended by a Gaussian approximation of the gradient image signal, which provides the robustness of SIFT coupled with increased localization accuracy.
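The re-identification step described above can be sketched as follows: each new feature is matched against the descriptors of dormant (discontinued) trajectories, and a candidate link is accepted only if it is consistent with the epipolar geometry. This is a minimal illustrative sketch, not the authors' implementation; the Sampson distance as the epipolar error measure, the Lowe-style ratio test, and the threshold values are assumptions chosen for the example.

```python
import numpy as np

def sampson_distance(F, x1, x2):
    """Sampson approximation of the epipolar error for homogeneous
    points x1 (N x 3, earlier frame) and x2 (N x 3, current frame)."""
    Fx1 = x1 @ F.T                         # epipolar lines in image 2
    Ftx2 = x2 @ F                          # epipolar lines in image 1
    num = np.einsum('ij,ij->i', x2, Fx1) ** 2
    den = Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2
    return num / den

def link_to_dormant_tracks(desc_new, desc_dormant, pts_new, pts_dormant, F,
                           ratio=0.8, epi_thresh=1.0):
    """For each new feature, pick the most similar dormant-trajectory
    descriptor (ratio test for wide-baseline ambiguity) and accept the
    link only if the pair satisfies the epipolar constraint encoded in F.
    The `ratio` and `epi_thresh` parameters are hypothetical defaults."""
    links = []
    for i, d in enumerate(desc_new):
        dists = np.linalg.norm(desc_dormant - d, axis=1)
        order = np.argsort(dists)
        # reject ambiguous matches: best must be clearly better than 2nd-best
        if len(order) > 1 and dists[order[0]] > ratio * dists[order[1]]:
            continue
        j = int(order[0])
        x1 = np.append(pts_dormant[j], 1.0)[None, :]
        x2 = np.append(pts_new[i], 1.0)[None, :]
        if sampson_distance(F, x1, x2)[0] < epi_thresh:
            links.append((i, j))           # feature i continues trajectory j
    return links
```

Each accepted link reconnects a current feature to the 3D object point already triangulated from the earlier part of the trajectory, which is what allows the bridged trajectory to enter the bundle adjustment as a single constraint.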

Our results show that the reconstruction is drastically improved and drift is reduced, especially in sequences with occlusions caused by foreground objects. In scenarios with large occlusions, the new approach yields reliable and accurate results where a standard reference method fails.







Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Kai Cordes¹
  • Oliver Müller¹
  • Bodo Rosenhahn¹
  • Jörn Ostermann¹
  1. Institut für Informationsverarbeitung (TNT), Leibniz Universität Hannover, Germany
