Abstract
In this paper, we address the problem of tracking an unknown object in 3D space. Online 2D tracking often fails under strong out-of-plane rotation, which causes changes in appearance beyond those that online update strategies can represent. By explicitly modelling and learning the 3D structure of the object, such effects are mitigated. To this end, a novel approach is presented, combining techniques from the fields of visual tracking, structure from motion (SfM) and simultaneous localisation and mapping (SLAM). This algorithm is referred to as TMAGIC (Tracking, Modelling And Gaussian-process Inference Combined). At every frame, point and line features are tracked in the image plane and are used, together with their 3D correspondences, to estimate the camera pose. These features are also used to model the 3D shape of the object as a Gaussian process. Tracking determines the trajectories of the object in both the image plane and 3D space, while the approach additionally recovers the 3D object shape. The approach is validated on several video sequences used in the tracking literature, comparing favourably to state-of-the-art trackers on simple scenes (error reduced by 22%), with clear advantages in the case of strong out-of-plane rotation, where 2D approaches fail (error reduction of 58%).
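The abstract describes modelling the object's 3D shape as a Gaussian process fitted to tracked feature points. The exact parameterisation is not given here, so the following is only a minimal sketch under an assumed setup: the surface is represented as a scalar radius predicted from two angular coordinates, using a hand-rolled squared-exponential GP regressor (the function names, the toy near-spherical data, and the kernel hyperparameters are all illustrative, not the paper's).

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.5, variance=1.0):
    """Squared-exponential kernel between row-vector inputs A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale**2)

def gp_predict(X_train, y_train, X_test, noise=1e-2):
    """Posterior mean of a zero-mean GP regressor at the query inputs."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_test, X_train)
    return K_s @ np.linalg.solve(K, y_train)

# Toy data: noisy surface radii of a near-spherical object,
# parameterised by two angular coordinates in [-1, 1].
rng = np.random.default_rng(0)
angles = rng.uniform(-1.0, 1.0, size=(50, 2))
radii = 1.0 + 0.01 * rng.standard_normal(50)

# Query the learned shape model at the centre of the parameter domain.
r_hat = gp_predict(angles, radii, np.array([[0.0, 0.0]]))
```

With dense observations the posterior mean interpolates the noisy radii, so `r_hat` is close to 1.0; this is the sense in which a GP can fill in a smooth surface between sparse tracked features.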
Notes
1. It is possible to sample more points along line features which have high confidence.
2. See http://cvssp.org/Personal/KarelLebeda/TMAGIC/ for the sequences and more results.
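Note 1 mentions sampling more points along high-confidence line features. The paper does not specify the sampling rule, so the snippet below is a hypothetical scheme: points are spaced uniformly along the segment, with the sample count growing linearly with a confidence score in [0, 1] (the function name and the `base`/`extra` parameters are invented for illustration).

```python
import numpy as np

def sample_line_points(p0, p1, confidence, base=3, extra=7):
    """Uniformly sample points on the 2D segment p0->p1; higher-confidence
    lines receive more samples (illustrative rule, not the paper's exact one)."""
    n = base + int(round(extra * float(confidence)))  # confidence in [0, 1]
    t = np.linspace(0.0, 1.0, n)
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    return (1.0 - t)[:, None] * p0[None, :] + t[:, None] * p1[None, :]

# A fully confident horizontal segment yields the maximum of 10 samples.
pts = sample_line_points([0.0, 0.0], [10.0, 0.0], confidence=1.0)
```

A low-confidence line (e.g. `confidence=0.0`) would contribute only the `base` three points, limiting its influence on pose estimation and shape modelling.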
Acknowledgement
This work was supported by the EPSRC grant “Learning to Recognise Dynamic Visual Content from Broadcast Footage” (EP/I011811/1).
© 2015 Springer International Publishing Switzerland
Cite this paper
Lebeda, K., Hadfield, S., Bowden, R. (2015). 2D or Not 2D: Bridging the Gap Between Tracking and Structure from Motion. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision -- ACCV 2014. ACCV 2014. Lecture Notes in Computer Science(), vol 9006. Springer, Cham. https://doi.org/10.1007/978-3-319-16817-3_42
Print ISBN: 978-3-319-16816-6
Online ISBN: 978-3-319-16817-3