Meta-Tracker: Fast and Robust Online Adaptation for Visual Object Trackers

  • Eunbyung Park
  • Alexander C. Berg
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11207)

Abstract

This paper improves state-of-the-art visual object trackers that use online adaptation. Our core contribution is an offline meta-learning-based method for adjusting the initial deep networks used in online adaptation-based tracking. The meta-learning is driven by the goal of obtaining an initial network that can quickly and robustly be adapted to model a particular target in future frames. Ideally, the resulting models focus on features that are useful in future frames and avoid overfitting to background clutter, small parts of the target, or noise. By enforcing a small number of update iterations during meta-learning, the resulting networks also train significantly faster. We demonstrate this approach on top of two high-performance tracking approaches: the tracking-by-detection-based MDNet [1] and the correlation-based CREST [2]. Experimental results on the standard benchmarks OTB2015 [3] and VOT2016 [4] show that our meta-learned versions of both trackers improve speed, accuracy, and robustness.
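
To make the idea concrete, below is a minimal, self-contained PyTorch sketch of this kind of meta-training loop; it is an illustration of the technique the abstract describes, not the authors' implementation. A toy regression problem stands in for the tracker network, the helper names (tracker_loss, sample_episode) are assumptions, and, in the spirit of Meta-SGD [19], the sketch also meta-learns per-parameter step sizes alpha alongside the initialization theta0.

    import torch

    torch.manual_seed(0)

    D = 16            # toy feature dimension (stand-in for network parameters)
    INNER_STEPS = 1   # a small number of update iterations, as in the abstract

    # Meta-parameters: the initial weights theta0 and per-parameter step
    # sizes alpha (the latter follows the Meta-SGD idea [19]; an assumption
    # of this sketch, not a claim about the paper's exact method).
    theta0 = torch.zeros(D, requires_grad=True)
    alpha = torch.full((D,), 0.1, requires_grad=True)
    meta_opt = torch.optim.Adam([theta0, alpha], lr=1e-3)

    def tracker_loss(theta, x, y):
        # Toy stand-in for the tracker's per-frame loss (here: least squares).
        return ((x @ theta - y) ** 2).mean()

    def sample_episode():
        # Toy stand-in for sampling an (initial frame, future frame) pair;
        # real episodes would come from annotated tracking sequences.
        w_true = torch.randn(D)
        x0, x1 = torch.randn(8, D), torch.randn(8, D)
        return (x0, x0 @ w_true), (x1, x1 @ w_true)

    for step in range(1000):
        (x0, y0), (x1, y1) = sample_episode()

        # Inner loop: adapt theta0 on the initial frame for a few steps,
        # keeping the updates differentiable (create_graph=True).
        theta = theta0
        for _ in range(INNER_STEPS):
            (grad,) = torch.autograd.grad(
                tracker_loss(theta, x0, y0), theta, create_graph=True)
            theta = theta - alpha * grad

        # Outer loop: score the adapted model on a *future* frame, so the
        # meta-update favors initializations that generalize beyond frame
        # one rather than overfitting to its clutter or noise.
        meta_loss = tracker_loss(theta, x1, y1)
        meta_opt.zero_grad()
        meta_loss.backward()
        meta_opt.step()

Because the number of inner-loop iterations is fixed and small during meta-training, the same small budget applies when the tracker is initialized on a new sequence, which is where the speed advantage over conventional online fine-tuning comes from.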

Acknowledgements

We thank the reviewers for their valuable feedback and acknowledge support from NSF 1452851, 1526367, 1446631.

Supplementary material

Supplementary material 1 (PDF, 1.9 MB): 474178_1_En_35_MOESM1_ESM.pdf

References

  1. Nam, H., Han, B.: Learning multi-domain convolutional neural networks for visual tracking. In: CVPR (2016)
  2. Song, Y., Ma, C., Gong, L., Zhang, J., Lau, R., Yang, M.H.: CREST: convolutional residual learning for visual tracking. In: ICCV (2017)
  3. Wu, Y., Lim, J., Yang, M.H.: Object tracking benchmark. TPAMI 37, 1834–1848 (2015)
  4. Kristan, M., Leonardis, A., Matas, J., Felsberg, M., et al.: The visual object tracking VOT2016 challenge results. In: ECCV Workshop (2016)
  5. Danelljan, M., Robinson, A., Shahbaz Khan, F., Felsberg, M.: Beyond correlation filters: learning continuous convolution operators for visual tracking. In: ECCV (2016)
  6. Danelljan, M., Bhat, G., Shahbaz Khan, F., Felsberg, M.: ECO: efficient convolution operators for tracking. In: CVPR (2017)
  7. Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: High-speed tracking with kernelized correlation filters. TPAMI (2015). https://doi.org/10.1109/TPAMI.2014.2345390
  8. Ma, C., Huang, J.B., Yang, X., Yang, M.H.: Hierarchical convolutional features for visual tracking. In: ICCV (2015)
  9. Kalal, Z., Mikolajczyk, K., Matas, J.: Tracking-learning-detection. TPAMI 34, 1409–1422 (2012)
  10. Bolme, D.S., Beveridge, J.R., Draper, B.A., Lui, Y.M.: Visual object tracking using adaptive correlation filters. In: CVPR (2010)
  11. Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: Fully-convolutional siamese networks for object tracking. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 850–865. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48881-3_56
  12. Held, D., Thrun, S., Savarese, S.: Learning to track at 100 FPS with deep regression networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 749–765. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_45
  13. Tao, R., Gavves, E., Smeulders, A.W.M.: Siamese instance search for tracking. In: CVPR (2016)
  14. Mueller, M., Smith, N., Ghanem, B.: Context-aware correlation filter tracking. In: CVPR (2017)
  15. Finn, C., Abbeel, P., Levine, S.: Model-agnostic meta-learning for fast adaptation of deep networks. In: ICML (2017)
  16. Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning. In: ICLR (2017)
  17. Andrychowicz, M., et al.: Learning to learn by gradient descent by gradient descent. In: NIPS (2016)
  18. Santoro, A., Bartunov, S., Botvinick, M., Wierstra, D., Lillicrap, T.: Meta-learning with memory-augmented neural networks. In: ICML (2016)
  19. Li, Z., Zhou, F., Chen, F., Li, H.: Meta-SGD: learning to learn quickly for few-shot learning. arXiv:1707.09835 (2017)
  20. Al-Shedivat, M., Bansal, T., Burda, Y., Sutskever, I., Mordatch, I., Abbeel, P.: Continuous adaptation via meta-learning in nonstationary and competitive environments. In: ICLR (2018)
  21. Danelljan, M., Hager, G., Khan, F.S., Felsberg, M.: Learning spatially regularized correlation filters for visual tracking. In: ICCV (2015)
  22. Galoogahi, H.K., Sim, T., Lucey, S.: Correlation filters with limited boundaries. In: CVPR (2015)
  23. Zhang, K., Zhang, L., Liu, Q., Zhang, D., Yang, M.-H.: Fast visual tracking via dense spatio-temporal context learning. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 127–141. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_9
  24. Ma, C., Yang, X., Zhang, C., Yang, M.H.: Long-term correlation tracking. In: CVPR (2015)
  25. Hong, Z., Chen, Z., Wang, C., Mei, X., Prokhorov, D., Tao, D.: MUlti-Store Tracker (MUSTer): a cognitive psychology inspired approach to object tracking. In: CVPR (2015)
  26. Danelljan, M., Hager, G., Khan, F.S., Felsberg, M.: Accurate scale estimation for robust visual tracking. In: BMVC (2014)
  27. Valmadre, J., Bertinetto, L., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: End-to-end representation learning for correlation filter based tracking. In: CVPR (2017)
  28. Li, H., Li, Y., Porikli, F.: DeepTrack: learning discriminative feature representations by convolutional neural networks for visual tracking. In: BMVC (2014)
  29. Babenko, B., Yang, M.H., Belongie, S.: Robust object tracking with online multiple instance learning. TPAMI 33, 1619–1632 (2011)
  30. Hare, S., et al.: Struck: structured output tracking with kernels. TPAMI 38, 2096–2109 (2015)
  31. Grabner, H., Leistner, C., Bischof, H.: Semi-supervised on-line boosting for robust tracking. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5302, pp. 234–247. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88682-2_19
  32. Bai, Q., Wu, Z., Sclaroff, S., Betke, M., Monnier, C.: Randomized ensemble tracking. In: ICCV (2013)
  33. Fischer, P., Dosovitskiy, A., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V.: FlowNet: learning optical flow with convolutional networks. In: ICCV (2015)
  34. Kahou, S.E., Michalski, V., Memisevic, R.: RATM: recurrent attentive tracking model. In: CVPR Workshop (2017)
  35. Gan, Q., Guo, Q., Zhang, Z., Cho, K.: First step toward model-free, anonymous object tracking with recurrent neural networks. arXiv:1511.06425 (2015)
  36. Gordon, D., Farhadi, A., Fox, D.: Re3: real-time recurrent regression networks for object tracking. arXiv:1705.06368 (2017)
  37. Yang, T., Chan, A.B.: Recurrent filter learning for visual tracking. In: ICCV (2017)
  38. Schmidhuber, J.: Evolutionary principles in self-referential learning. Diploma thesis, Institut f. Informatik, Technical University of Munich (1987)
  39. Schmidhuber, J.: Learning to control fast-weight memories: an alternative to dynamic recurrent networks. Neural Comput. 4, 131–139 (1992)
  40. Hochreiter, S., Younger, A.S., Conwell, P.R.: Learning to learn using gradient descent. In: Dorffner, G., Bischof, H., Hornik, K. (eds.) ICANN 2001. LNCS, vol. 2130, pp. 87–94. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44668-0_13
  41. Thrun, S., Pratt, L.: Learning to learn: introduction and overview. In: Thrun, S., Pratt, L. (eds.) Learning to Learn, pp. 3–17. Springer, Heidelberg (1998). https://doi.org/10.1007/978-1-4615-5529-2_1
  42. Chen, Y., et al.: Learning to learn without gradient descent by gradient descent. In: ICML (2017)
  43. Wichrowska, O., et al.: Learned optimizers that scale and generalize. In: ICML (2017)
  44. Li, K., Malik, J.: Learning to optimize. In: ICLR (2017)
  45. Bertinetto, L., Henriques, J.F., Valmadre, J., Torr, P.H.S., Vedaldi, A.: Learning feed-forward one-shot learners. In: NIPS (2016)
  46. Wang, Y.-X., Hebert, M.: Learning to learn: model regression networks for easy small sample learning. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 616–634. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46466-4_37
  47. Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. In: ICLR (2015)
  48. Maclaurin, D., Duvenaud, D., Adams, R.P.: Gradient-based hyperparameter optimization through reversible learning. In: ICML (2015)
  49. Metz, L., Poole, B., Pfau, D., Sohl-Dickstein, J.: Unrolled generative adversarial networks. In: ICLR (2017)
  50.
  51. Liu, W., et al.: SSD: single shot MultiBox detector. In: ECCV (2016)
  52. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)
  53. Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K.: Spatial transformer networks. In: NIPS (2015)
  54. Kristan, M., et al.: The visual object tracking VOT2014 challenge results. In: Agapito, L., Bronstein, M.M., Rother, C. (eds.) ECCV 2014. LNCS, vol. 8926, pp. 191–217. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16181-5_14
  55. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. IJCV 115, 211–252 (2015)
  56. Supancic, J., Ramanan, D.: Tracking as online decision-making: learning a policy from streaming videos with reinforcement learning. In: ICCV (2017)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, USA