
2D or Not 2D: Bridging the Gap Between Tracking and Structure from Motion

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9006)

Abstract

In this paper, we address the problem of tracking an unknown object in 3D space. Online 2D tracking often fails under strong out-of-plane rotation, which causes changes in appearance beyond those that online update strategies can represent. By explicitly modelling and learning the 3D structure of the object, such effects can be mitigated. To this end, a novel approach is presented, combining techniques from the fields of visual tracking, structure from motion (SfM) and simultaneous localisation and mapping (SLAM). The algorithm is referred to as TMAGIC (Tracking, Modelling And Gaussian-process Inference Combined). At every frame, point and line features are tracked in the image plane and are used, together with their 3D correspondences, to estimate the camera pose. These features are also used to model the 3D shape of the object as a Gaussian process. Tracking thus determines the trajectories of the object in both the image plane and 3D space, and the approach additionally provides the 3D object shape. The approach is validated on several video sequences used in the tracking literature, comparing favourably to state-of-the-art trackers on simple scenes (error reduced by 22%) and showing clear advantages under strong out-of-plane rotation, where 2D approaches fail (error reduction of 58%).
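
To make the Gaussian-process shape model mentioned above concrete, the following minimal Python/NumPy sketch performs standard GP regression with a squared-exponential kernel. The surface parametrisation (radius as a function of azimuth and elevation around the object centre), the kernel choice and all constants are illustrative assumptions made here for exposition, not the authors' implementation.

    import numpy as np

    def rbf_kernel(A, B, lengthscale=0.3, variance=1.0):
        # Squared-exponential kernel between two sets of (azimuth, elevation) inputs.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

    def gp_predict(X_train, y_train, X_test, noise=1e-3):
        # Standard GP regression: posterior mean and variance of the surface
        # radius at previously unseen viewing directions.
        K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
        K_s = rbf_kernel(X_test, X_train)
        K_ss = rbf_kernel(X_test, X_test)
        alpha = np.linalg.solve(K, y_train)
        mean = K_s @ alpha
        var = np.diag(K_ss - K_s @ np.linalg.solve(K, K_s.T))
        return mean, var

    # Toy usage: radii of reconstructed 3D feature points act as training data;
    # the GP interpolates them into a smooth object surface.
    X_train = np.random.uniform(-np.pi, np.pi, size=(50, 2))   # (azimuth, elevation)
    y_train = 1.0 + 0.1 * np.sin(X_train[:, 0])                # synthetic radii
    mean, var = gp_predict(X_train, y_train, np.zeros((1, 2)))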


Notes

  1. It is possible to sample more points along line features which have high confidence (see the sketch following these notes).

  2. See http://cvssp.org/Personal/KarelLebeda/TMAGIC/ for the sequences and more results.
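
As a small, hedged illustration of the sampling strategy in Note 1 (the scaling rule and its constants are hypothetical, not taken from the paper), the sketch below draws more points along a 2D line segment when the line feature has higher confidence.

    import numpy as np

    def sample_line_points(p0, p1, confidence, base_points=5, max_extra=20):
        # Sample 2D points uniformly along the segment p0->p1; higher-confidence
        # line features receive proportionally more samples (hypothetical rule).
        c = float(np.clip(confidence, 0.0, 1.0))
        n = base_points + int(round(max_extra * c))
        t = np.linspace(0.0, 1.0, n)[:, None]            # interpolation parameter
        return (1.0 - t) * np.asarray(p0, dtype=float) + t * np.asarray(p1, dtype=float)

    # Example: a confident line feature is sampled more densely.
    pts = sample_line_points((10.0, 20.0), (110.0, 60.0), confidence=0.9)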


Acknowledgement

This work was supported by the EPSRC grant “Learning to Recognise Dynamic Visual Content from Broadcast Footage” (EP/I011811/1).

Author information

Corresponding author

Correspondence to Karel Lebeda.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material (zip 21,035 KB)

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Lebeda, K., Hadfield, S., Bowden, R. (2015). 2D or Not 2D: Bridging the Gap Between Tracking and Structure from Motion. In: Cremers, D., Reid, I., Saito, H., Yang, M.H. (eds) Computer Vision -- ACCV 2014. Lecture Notes in Computer Science, vol 9006. Springer, Cham. https://doi.org/10.1007/978-3-319-16817-3_42

  • DOI: https://doi.org/10.1007/978-3-319-16817-3_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16816-6

  • Online ISBN: 978-3-319-16817-3

  • eBook Packages: Computer Science (R0)
