Online object tracking based on BLSTM-RNN with contextual-sequential labeling

  • Original Research
  • Published in: Journal of Ambient Intelligence and Humanized Computing

Abstract

The significance of object context for appearance modeling has been verified in a variety of tracking-by-detection approaches. Unfortunately, representing the target's contextual relationships only within the spatial domain is restrictive and has severely limited the usefulness of context for high-level classification strategies. Motivated by the ability of recurrent models to learn long-term dependencies from sequential data, in this paper we propose a novel appearance model that transforms the target's contextual dependency into a semantic sequential representation. This representation can be used effectively by a recurrent neural network with bidirectional long short-term memory cells (BLSTM-RNN) for online tracking-by-learning. Based on the trained BLSTM-RNN model, a search mechanism driven by the labeling score is proposed to improve tracking robustness. Using the appearance variation implied by the labeling, together with a heuristic strategy for model updating, the proposed tracking method is shown to outperform most state-of-the-art trackers on challenging benchmark videos.
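The idea of searching by labeling score can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the sequence is built or how the score is computed, so the mean-of-labels score, the one-dimensional features, and all function names below are assumptions made for illustration. In particular, `toy_bidirectional_labeler` is a hypothetical stand-in for a trained BLSTM-RNN; it only mimics the property that each step is labeled from both past and future context.

```python
import math
from typing import Callable, List, Sequence, Tuple


def toy_bidirectional_labeler(sequence: Sequence[float], decay: float = 0.5) -> List[float]:
    """Hypothetical stand-in for a trained BLSTM-RNN (assumption, not the paper's model).

    Accumulates an exponentially decayed sum left-to-right and right-to-left,
    then squashes the combined state into a pseudo-probability of "target".
    """
    n = len(sequence)
    forward = [0.0] * n
    backward = [0.0] * n

    state = 0.0
    for t in range(n):  # past context
        state = decay * state + sequence[t]
        forward[t] = state

    state = 0.0
    for t in range(n - 1, -1, -1):  # future context
        state = decay * state + sequence[t]
        backward[t] = state

    return [1.0 / (1.0 + math.exp(-(f + b))) for f, b in zip(forward, backward)]


def labeling_score(labels: Sequence[float]) -> float:
    """Assumed score: mean per-step target probability of a non-empty sequence."""
    return sum(labels) / len(labels)


def search_best_candidate(
    candidates: Sequence[Sequence[float]],
    labeler: Callable[[Sequence[float]], List[float]],
) -> Tuple[int, float]:
    """Label every candidate sequence and return (index, score) of the best one."""
    scores = [labeling_score(labeler(candidate)) for candidate in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Each candidate is a sequence of scalar features for one search window
# (made-up values; a real tracker would use image features).
candidates = [[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
best_index, best_score = search_best_candidate(candidates, toy_bidirectional_labeler)
```

In a real tracker the labeler would be the online-trained network and each candidate would be the sequential representation of one search window; the search loop itself would stay the same.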

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61571363) and The National High Technology Research and Development Program of China (Grant No. 2015AA016402).

Author information

Correspondence to Xiangzeng Zhou or Lei Xie.

Cite this article

Zhou, X., Xie, L., Zhang, P. et al. Online object tracking based on BLSTM-RNN with contextual-sequential labeling. J Ambient Intell Human Comput 8, 861–870 (2017). https://doi.org/10.1007/s12652-017-0514-4

  • DOI: https://doi.org/10.1007/s12652-017-0514-4
