Efficient Non-consecutive Feature Tracking for Structure-from-Motion

  • Guofeng Zhang
  • Zilong Dong
  • Jiaya Jia
  • Tien-Tsin Wong
  • Hujun Bao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6315)


Structure-from-motion (SfM) is an important computer vision problem whose results depend largely on the quality of feature tracking. In image sequences, tracks are often broken by objects moving in and out of view, occasional occlusion, or image noise; if these disjointed tracks are not handled well, the resulting SfM can be significantly affected. In this paper, we address the non-consecutive feature point tracking problem and propose an effective method to match interrupted tracks. Our framework consists of steps that solve the feature ‘dropout’ problem arising when indistinctive structures, noise, or even large image distortion exist, and that rapidly recognize and join common features located in different subsequences. Experimental results on several challenging and large-scale video sets show that our method notably improves SfM.
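
As a rough illustration of what joining interrupted tracks involves, the sketch below merges track fragments once pairs of fragments have been recognized as the same scene feature. It is not the method proposed in the paper: the integer fragment IDs, the `matches` list, and the `join_tracks` helper are all hypothetical, and the recognition step that produces the matches is assumed to have run elsewhere. The grouping uses a standard union-find structure, so fragments linked only transitively still end up in one track.

```python
from collections import defaultdict


def join_tracks(num_fragments, matches):
    """Group track fragments linked, directly or transitively, by matches.

    num_fragments: number of fragments, identified as 0..num_fragments-1
                   (hypothetical representation).
    matches: iterable of (i, j) pairs judged to be the same feature
             (assumed to come from a separate matching step).
    Returns a sorted list of fragment-ID groups, one per joined track.
    """
    parent = list(range(num_fragments))

    def find(x):
        # Walk up to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in matches:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    groups = defaultdict(list)
    for frag in range(num_fragments):
        groups[find(frag)].append(frag)
    return sorted(groups.values())


# Fragments 0, 3 and 4 are pieces of one track; 1 and 2 stay separate.
print(join_tracks(5, [(0, 3), (3, 4)]))  # [[0, 3, 4], [1], [2]]
```

Union-find is used here only because match pairs arrive in arbitrary order and may chain across several subsequences; any connected-components routine would serve the same purpose.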



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Guofeng Zhang (1)
  • Zilong Dong (1)
  • Jiaya Jia (2)
  • Tien-Tsin Wong (2)
  • Hujun Bao (1)
  1. State Key Lab of CAD&CG, Zhejiang University
  2. The Chinese University of Hong Kong
