Learning to Track at 100 FPS with Deep Regression Networks

  • David Held
  • Sebastian Thrun
  • Silvio Savarese
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9905)

Abstract

Machine learning techniques are often used in computer vision due to their ability to leverage large amounts of training data to improve performance. Unfortunately, most generic object trackers are still trained from scratch online and do not benefit from the large number of videos that are readily available for offline training. We propose a method for offline training of neural networks that can track novel objects at test time at 100 fps. Our tracker is significantly faster than previous neural-network trackers, which are typically very slow to run and not practical for real-time applications. Our tracker uses a simple feed-forward network with no online training required. The tracker learns a generic relationship between object motion and appearance and can be used to track novel objects that do not appear in the training set. We test our network on a standard tracking benchmark to demonstrate our tracker’s state-of-the-art performance. Further, our performance improves as we add more videos to our offline training set. To the best of our knowledge, our tracker is the first neural-network tracker that learns to track generic objects at 100 fps (the tracker is available at http://davheld.github.io/GOTURN/GOTURN.html).
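The tracking loop described above can be sketched as follows. This is a minimal illustration of the feed-forward, no-online-training design, not the authors' implementation: the `regressor` callable stands in for the trained convolutional network, and all names, the padding factor, and the crop geometry are assumptions for illustration only.

```python
import numpy as np

def crop(frame, box, pad=2.0):
    """Crop a search region centred on `box` (x, y, w, h), padded by `pad`."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    half_w, half_h = pad * w / 2.0, pad * h / 2.0
    x0, x1 = int(max(cx - half_w, 0)), int(min(cx + half_w, frame.shape[1]))
    y0, y1 = int(max(cy - half_h, 0)), int(min(cy + half_h, frame.shape[0]))
    return frame[y0:y1, x0:x1]

class FeedForwardTracker:
    """Sketch of a feed-forward regression tracker: one forward pass per
    frame and no weight updates at test time, which is what makes the
    approach fast."""

    def __init__(self, regressor):
        # `regressor` maps (target crop, search crop) -> bounding box.
        # In the paper this is a CNN trained offline; here it is a stub.
        self.regressor = regressor

    def init(self, frame, box):
        self.prev_frame, self.box = frame, box

    def track(self, frame):
        target = crop(self.prev_frame, self.box)  # object appearance
        search = crop(frame, self.box)            # region to search now
        self.box = self.regressor(target, search) # single forward pass
        self.prev_frame = frame
        return self.box

# Hypothetical usage with a stand-in regressor that returns a fixed box.
frame0 = np.zeros((100, 100, 3), dtype=np.uint8)
frame1 = np.zeros((100, 100, 3), dtype=np.uint8)
dummy = lambda target, search: (40, 40, 20, 20)
tracker = FeedForwardTracker(dummy)
tracker.init(frame0, (40, 40, 20, 20))
print(tracker.track(frame1))  # -> (40, 40, 20, 20)
```

Because the network runs once per frame with no fine-tuning, generalisation to unseen objects must come entirely from the offline training videos, which is why the abstract notes that performance improves as more videos are added.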

Keywords

Tracking · Deep learning · Neural networks · Machine learning

Acknowledgments

We acknowledge the support of Toyota grant 1186781-31-UDARO and ONR grant 1165419-10-TDAUZ.

Supplementary material

Supplementary material 2 (PDF, 503 KB)


Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Department of Computer Science, Stanford University, Stanford, USA
