Abstract
We present the shape-model object tracker, which is accurate, robust, and real-time capable on a standard CPU. The tracker includes failure-mode detection, is robust to nonlinear illumination changes, and can cope with occlusions. It uses subpixel-precise image edges to track roughly rigid objects with high accuracy and is virtually drift-free, even for long sequences. Furthermore, it is inherently capable of re-detecting the object when tracking fails. To evaluate the accuracy, robustness, and efficiency of the tracker precisely, we present a challenging new tracking dataset with pixel-precise ground truth, whose labels are created automatically from the photo-realistic synthetic VIPER dataset. The tracker is thoroughly evaluated against the state of the art in a number of qualitative and quantitative experiments. It performs on par with current state-of-the-art deep-learning trackers but is at least 45 times faster, even without a GPU. The efficiency and low memory consumption of the tracker are further validated in experiments on an embedded device.
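The edge-based matching the abstract refers to scores a pose by comparing the gradient directions of model edge points against the image gradients at the corresponding locations; taking the absolute normalized dot product makes the score invariant to occlusion, clutter, and local illumination polarity. The following is a minimal translation-only sketch of that idea in NumPy, with an assumed gradient threshold and `np.gradient` standing in for a proper edge filter; it is an illustration, not the authors' implementation:

```python
import numpy as np

def gradient_field(img):
    # Central-difference image gradients (a stand-in for a proper edge filter).
    gy, gx = np.gradient(img.astype(float))
    return gx, gy

def edge_model(template, thresh=10.0):
    # Keep only strong-gradient pixels as model points; store their positions
    # and unit gradient directions.
    gx, gy = gradient_field(template)
    mag = np.hypot(gx, gy)
    ys, xs = np.nonzero(mag > thresh)
    dirs = np.stack([gx[ys, xs], gy[ys, xs]], axis=1) / mag[ys, xs, None]
    return np.stack([ys, xs], axis=1), dirs

def similarity(img, points, dirs, dy, dx):
    # Mean absolute normalized dot product between model and image gradient
    # directions at the translated model points. The abs() ignores contrast
    # polarity, which is what makes the score robust to nonlinear
    # illumination changes; points with (near-)zero image gradient or
    # outside the image are skipped, which gives robustness to occlusion.
    gx, gy = gradient_field(img)
    ys, xs = points[:, 0] + dy, points[:, 1] + dx
    valid = (ys >= 0) & (ys < img.shape[0]) & (xs >= 0) & (xs < img.shape[1])
    ys, xs = ys[valid], xs[valid]
    g = np.stack([gx[ys, xs], gy[ys, xs]], axis=1)
    mag = np.linalg.norm(g, axis=1)
    ok = mag > 1e-9
    cos = np.abs((dirs[valid][ok] * g[ok]).sum(axis=1) / mag[ok])
    return cos.mean() if cos.size else 0.0
```

In this toy version the score at the correct offset of a synthetic square is 1.0 and drops for wrong offsets; the full tracker additionally handles rotation, scale, subpixel refinement, and uses the score itself as a failure-mode detector (a low best score signals lost tracking and triggers re-detection).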
Böttger, T., Steger, C. Accurate and robust tracking of rigid objects in real time. J Real-Time Image Proc 18, 493–510 (2021). https://doi.org/10.1007/s11554-020-00978-9