Neural Computing and Applications, Volume 29, Issue 7, pp 329–341

2D recurrent neural networks: a high-performance tool for robust visual tracking in dynamic scenes

  • Giovanni Masala
  • Filippo Casu
  • Bruno Golosio
  • Enrico Grosso
S.I. : EANN 2016


This paper proposes a novel method for robust visual tracking of arbitrary objects, based on the combination of image-based prediction and position refinement by weighted correlation. The effectiveness of the proposed approach is demonstrated on a challenging set of dynamic video sequences extracted from the triple jump final at the London 2012 Summer Olympics. A comparison is made against five baseline tracking systems. The proposed system clearly outperforms the other methods in all considered cases, which are characterized by changing backgrounds and a large variety of articulated motions. The novel architecture, hereafter called 2D Recurrent Neural Network (2D-RNN), is derived from the well-known recurrent neural network model and adopts nearest-neighborhood connections between the input and context layers in order to store the temporal information content of the video. Starting from the selection of the object of interest in the first frame, neural computation is applied to predict the position of the target in each video frame. Normalized cross-correlation is then applied to refine the predicted target position. 2D-RNN offers limited complexity, great adaptability, and very fast learning. On the considered dataset it also achieves fast execution times and very good accuracy, making this approach an excellent candidate for automated analysis of complex video streams.
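The refinement step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain zero-mean normalized cross-correlation rather than the weighted correlation of the paper, it takes the predicted position as a given input instead of computing it with the 2D-RNN, and the names `ncc`, `refine_position` and `radius` are hypothetical, chosen only for this example. Images are represented as nested lists of gray levels to keep the block dependency-free.

```python
import math


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equally sized 2D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        # A flat patch or template has no defined correlation; treat as no match.
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def refine_position(frame, template, predicted, radius=2):
    """Search a small window around `predicted` for the best NCC match.

    `predicted` is the (row, col) of the template's top-left corner, assumed
    to come from some earlier prediction step. Returns (best_position, score).
    If the window contains no valid placement, `predicted` is returned
    unchanged with a score of -2.0.
    """
    th, tw = len(template), len(template[0])
    fh, fw = len(frame), len(frame[0])
    pr, pc = predicted
    best_pos, best_score = predicted, -2.0
    # Clamp the search window so the template always fits inside the frame.
    for r in range(max(0, pr - radius), min(fh - th, pr + radius) + 1):
        for c in range(max(0, pc - radius), min(fw - tw, pc + radius) + 1):
            patch = [row[c:c + tw] for row in frame[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


# Usage: an 8x8 dark frame with the target placed at row 3, column 4,
# and a prediction that is off by one pixel in each direction.
template = [[1, 2], [3, 4]]
frame = [[0] * 8 for _ in range(8)]
for i in range(2):
    for j in range(2):
        frame[3 + i][4 + j] = template[i][j]

position, score = refine_position(frame, template, predicted=(2, 3))
print(position, score)
```

An exhaustive search like this is only practical for a small window; the point of pairing it with a predictor is that the prediction keeps the window small.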


Recurrent neural network, Convolutional network, Video tracking, Automated video analysis


Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.



Copyright information

© The Natural Computing Applications Forum 2017

Authors and Affiliations

  • Giovanni Masala (1)
  • Filippo Casu (2)
  • Bruno Golosio (2)
  • Enrico Grosso (2)
  1. School of Computing, Electronics and Mathematics, Plymouth University, Plymouth, UK
  2. Department of Political Science, Communication, Engineering and Information Technologies, Computer Vision Laboratory, University of Sassari, Sassari, Italy
