TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild

  • Matthias Müller
  • Adel Bibi
  • Silvio Giancola
  • Salman Alsubaihi
  • Bernard Ghanem
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11205)

Abstract

Despite numerous developments in object tracking, further improvement of current tracking algorithms is limited by small and mostly saturated datasets. As a matter of fact, data-hungry trackers based on deep learning currently rely on object detection datasets due to the scarcity of dedicated large-scale tracking datasets. In this work, we present TrackingNet, the first large-scale dataset and benchmark for object tracking in the wild. We provide more than 30K videos with more than 14 million dense bounding box annotations. Our dataset covers a wide selection of object classes in broad and diverse contexts. By releasing such a large-scale dataset, we expect deep trackers to further improve and generalize. In addition, we introduce a new benchmark composed of 500 novel videos, modeled with a distribution similar to our training dataset. By sequestering the annotations of the test set and providing an online evaluation server, we provide a fair benchmark for the future development of object trackers. Deep trackers fine-tuned on a fraction of our dataset improve their performance by up to 1.6% on OTB100 and up to 1.7% on TrackingNet Test. We provide an extensive benchmark on TrackingNet by evaluating more than 20 trackers. Our results suggest that object tracking in the wild is far from being solved.

Keywords

Object tracking, Dataset, Benchmark, Deep learning

Supplementary material

Supplementary material 1 (mp4 73930 KB)

Supplementary material 2 (pdf 14843 KB)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Matthias Müller (1)
  • Adel Bibi (1)
  • Silvio Giancola (1)
  • Salman Alsubaihi (1)
  • Bernard Ghanem (1)
  1. King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia
