Visual Tracking

Living reference work entry, Encyclopedia of Robotics

Synonyms

Visual localization

Definition

Visual tracking is a state estimation problem: from image measurements, the state of one or more objects must be estimated consistently over the discrete time steps of a video. Various measurements can be considered: pixel intensities (raw data), color, or visual features (edges, lines, keypoints, motion field). The state to be estimated can likewise take many forms: 2D coordinates (e.g., the object's center of gravity), geometric primitives (lines, ellipses, etc.), a bounding box, a 3D rigid pose, a homography, or the camera pose together with the scene structure (visual SLAM, vSLAM) (Fig. 1).

Fig. 1: Visual tracking has to consistently estimate the state (e.g., the position X) of an object over time in an image sequence.
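The definition above can be made concrete with a minimal, illustrative sketch (not part of the original entry): a constant-velocity Kalman filter, written here in Python with NumPy, that estimates the 2D position of an object from noisy per-frame centroid measurements. All models and noise values below are assumptions chosen for illustration; richer states (bounding box, 3D pose, homography, scene structure) follow the same predict/update pattern with more elaborate motion and measurement models.

import numpy as np

# Illustrative sketch only: visual tracking viewed as state estimation.
# A constant-velocity Kalman filter fuses noisy per-frame image measurements
# of an object's 2D center (e.g., the centroid of a detected blob) into a
# consistent state estimate x = [u, v, du, dv] over the time steps of a video.
# The models and noise levels are assumed values, not taken from the entry.

dt = 1.0                                   # one time step per frame
F = np.array([[1, 0, dt, 0],               # state transition (constant velocity)
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1, 0, 0, 0],                # only the 2D position is measured
              [0, 1, 0, 0]], dtype=float)
Q = 0.01 * np.eye(4)                       # process noise (model uncertainty)
R = 4.0 * np.eye(2)                        # measurement noise (pixels^2)

x = np.zeros(4)                            # initial state
P = 100.0 * np.eye(4)                      # initial uncertainty

def track_step(z):
    """One predict/update cycle given the measurement z = (u, v) in pixels."""
    global x, P
    # Predict: propagate the state and its covariance to the current frame.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update: correct the prediction with the image measurement.
    y = z - H @ x                          # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ y
    P = (np.eye(4) - K @ H) @ P
    return x[:2]                           # estimated 2D position

# Usage example: feed synthetic noisy centroid measurements of a moving object.
rng = np.random.default_rng(0)
for t in range(10):
    true_pos = np.array([5.0 * t, 3.0 * t])
    z = true_pos + rng.normal(0.0, 2.0, size=2)
    est = track_step(z)
    print(f"frame {t}: measured {z.round(1)}, estimated {est.round(1)}")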



Author information

Correspondence to Eric Marchand.


Copyright information

© 2020 Springer-Verlag GmbH Germany, part of Springer Nature

About this entry

Cite this entry

Marchand, E. (2020). Visual Tracking. In: Ang, M., Khatib, O., Siciliano, B. (eds) Encyclopedia of Robotics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41610-1_102-1

  • DOI: https://doi.org/10.1007/978-3-642-41610-1_102-1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41610-1

  • Online ISBN: 978-3-642-41610-1

  • eBook Packages: Springer Reference Engineering, Reference Module Computer Science and Engineering
