Object Tracking Using Spatio-Temporal Networks for Future Prediction Location

Liu, Yuan; Li, Ruoteng; Cheng, Yu; Tan, Robby T.; Sui, Xiubao

doi:10.1007/978-3-030-58542-6_1

Conference paper
First Online: 17 November 2020

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12367)

Abstract

We introduce an object tracking algorithm that predicts the future locations of the target object and helps the tracker handle object occlusion. Given a few frames of an object extracted from a complete input sequence, we aim to predict the object’s location in future frames. To enable this prediction, we build on three key observations: 1) an object’s motion trajectory is significantly affected by camera motion; 2) the past trajectory of an object can act as a salient cue for estimating the object’s motion in the spatial domain; 3) previous frames contain the surroundings and appearance of the target object, which are useful for predicting its future locations. We incorporate these three observations into a method that employs a multi-stream convolutional-LSTM network. By combining the heatmap scores from our tracker (which utilises appearance inference) with the locations of the target object from our trajectory inference, we predict the target’s final location in each frame. Comprehensive evaluations show that our method sets new state-of-the-art performance on a few commonly used tracking benchmarks.
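The abstract does not say how the heatmap scores and the trajectory-inferred location are combined. As a rough illustration only, the sketch below assumes a weighted sum of the appearance heatmap and a Gaussian prior centred on the predicted location. The function names `gaussian_prior` and `fuse`, the parameters `sigma` and `weight`, and the fusion rule itself are all hypothetical and are not taken from the paper.

```python
import math

def gaussian_prior(height, width, center, sigma):
    """Gaussian map peaked at the trajectory-predicted location (row, col)."""
    cy, cx = center
    return [
        [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def fuse(heatmap, predicted_center, sigma=2.0, weight=0.5):
    """Blend appearance scores with a trajectory prior; return the argmax (row, col).

    The weighted sum, `sigma`, and `weight` are illustrative assumptions,
    not the fusion rule used by the authors.
    """
    h, w = len(heatmap), len(heatmap[0])
    prior = gaussian_prior(h, w, predicted_center, sigma)
    best, best_score = None, float("-inf")
    for y in range(h):
        for x in range(w):
            score = (1.0 - weight) * heatmap[y][x] + weight * prior[y][x]
            if score > best_score:
                best, best_score = (y, x), score
    return best

# Two equally strong appearance peaks: a distractor and the true target.
heatmap = [[0.0] * 7 for _ in range(7)]
heatmap[1][1] = 1.0  # distractor
heatmap[5][5] = 1.0  # true target

# The trajectory prediction lands near the true target and breaks the tie.
location = fuse(heatmap, predicted_center=(5, 4))
print(location)
```

In this toy case appearance alone cannot separate the two peaks, so the trajectory prior decides the outcome. That is the kind of ambiguity, for example after an occlusion, that the abstract says future location prediction is meant to help with.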

Y. Liu and R. Li contributed equally to this work.

R. T. Tan’s research in this work is supported by the National Research Foundation, Singapore under its Strategic Capability Research Centres Funding Initiative. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore.

Author information

Authors and Affiliations

Nanjing University of Science and Technology, Nanjing, China
Yuan Liu & Xiubao Sui
National University of Singapore, Singapore, Singapore
Ruoteng Li, Yu Cheng & Robby T. Tan
Yale-NUS College, Singapore, Singapore
Robby T. Tan

Corresponding author

Correspondence to Xiubao Sui.

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

Cite this paper

Liu, Y., Li, R., Cheng, Y., Tan, R.T., Sui, X. (2020). Object Tracking Using Spatio-Temporal Networks for Future Prediction Location. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science, vol 12367. Springer, Cham. https://doi.org/10.1007/978-3-030-58542-6_1

DOI: https://doi.org/10.1007/978-3-030-58542-6_1
Published: 17 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58541-9
Online ISBN: 978-3-030-58542-6
eBook Packages: Computer Science, Computer Science (R0)
