2D Recurrent Neural Networks for Robust Visual Tracking of Non-Rigid Bodies

  • G. L. Masala
  • B. Golosio
  • M. Tistarelli
  • E. Grosso
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 629)


Efficient tracking of articulated bodies over time is an essential element of pattern recognition and dynamic scene analysis. This paper proposes a novel method for robust visual tracking that combines image-based prediction with weighted correlation. Starting from an initial guess, the method uses neural computation to predict the position of the target in each video frame. Normalized cross-correlation is then applied to refine the predicted target position.
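The refinement step can be illustrated with a minimal sketch. It assumes grayscale frames stored as 2D NumPy arrays, a fixed template, and a plain (unweighted) zero-mean normalized cross-correlation searched exhaustively in a small window around the predicted position. The function names `ncc` and `refine_position` and the `radius` parameter are hypothetical, and the weighting scheme used in the paper is not reproduced here.

```python
import numpy as np


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equally sized arrays."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    # A flat patch or template has zero variance; treat it as uncorrelated.
    return float((p * t).sum() / denom) if denom > 0 else 0.0


def refine_position(frame, template, predicted, radius=5):
    """Refine a predicted top-left corner (row, col) by local NCC search.

    Assumption: the search is exhaustive over a (2*radius+1)^2 window,
    clipped so the template always lies fully inside the frame.
    """
    h, w = template.shape
    r0, c0 = predicted
    best_score, best_pos = -np.inf, predicted
    for r in range(max(0, r0 - radius), min(frame.shape[0] - h, r0 + radius) + 1):
        for c in range(max(0, c0 - radius), min(frame.shape[1] - w, c0 + radius) + 1):
            score = ncc(frame[r:r + h, c:c + w], template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score
```

In a tracking loop, `predicted` would come from the prediction stage, and the refined position would become the starting point for the next frame.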

Image-based prediction relies on a novel architecture derived from Elman’s recurrent neural network, which adopts nearest-neighborhood connections between the input and context layers in order to store the temporal information content of the video. The proposed architecture, named 2D Recurrent Neural Network, offers limited complexity and a very fast learning stage, together with fast execution times and excellent accuracy on the considered tracking task. The effectiveness of the proposed approach is demonstrated on a very challenging set of dynamic image sequences extracted from the triple jump final at the London 2012 Summer Olympics. The system shows remarkable performance in all considered cases, which are characterized by changing backgrounds and a large variety of articulated motions.
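One possible reading of the locally connected, Elman-style update is sketched below. The abstract does not give the exact layer layout, so the sketch assumes that each unit on a 2D grid receives input from the 3×3 neighborhood of the current frame and of the context layer, that the same 3×3 weights are shared across all positions, and that the activation is tanh. The names `neighborhood_sum`, `step`, `run`, `w_in`, and `w_ctx` are hypothetical, and no training procedure is shown.

```python
import numpy as np


def neighborhood_sum(grid, kernel):
    """Weighted sum over each cell's 3x3 neighborhood (zero padding at borders)."""
    padded = np.pad(grid, 1)
    out = np.zeros_like(grid, dtype=float)
    rows, cols = grid.shape
    for dr in range(3):
        for dc in range(3):
            out += kernel[dr, dc] * padded[dr:dr + rows, dc:dc + cols]
    return out


def step(frame, context, w_in, w_ctx, bias=0.0):
    """One Elman-style update: combine the current frame with the stored context."""
    return np.tanh(neighborhood_sum(frame, w_in)
                   + neighborhood_sum(context, w_ctx)
                   + bias)


def run(frames, w_in, w_ctx):
    """Process frames in order; each new state becomes the next context."""
    context = np.zeros_like(frames[0], dtype=float)
    states = []
    for frame in frames:
        context = step(frame, context, w_in, w_ctx)
        states.append(context)
    return states
```

Because each unit connects only to a fixed local neighborhood, the number of weights in this sketch does not grow with the image size, which is consistent with the limited complexity claimed for the architecture.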


Keywords

Recurrent neural network; Tracking; Video analysis



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • G. L. Masala (1)
  • B. Golosio (1)
  • M. Tistarelli (1)
  • E. Grosso (1)

  1. Department of Political Science, Communication, Engineering and Information Technologies, Computer Vision Laboratory, University of Sassari, Sassari, Italy
