
Adaptive Correlation Filters with Long-Term and Short-Term Memory for Object Tracking

Published in: International Journal of Computer Vision

Abstract

Object tracking is challenging as target objects often undergo drastic appearance changes over time. Recently, adaptive correlation filters have been successfully applied to object tracking. However, tracking algorithms relying on highly adaptive correlation filters are prone to drift due to noisy updates. Moreover, as these algorithms do not maintain long-term memory of target appearance, they cannot recover from tracking failures caused by heavy occlusion or target disappearance from the camera view. In this paper, we propose to learn multiple adaptive correlation filters with both long-term and short-term memory of target appearance for robust object tracking. First, we learn a kernelized correlation filter with an aggressive learning rate for locating target objects precisely. We take into account the appropriate size of the surrounding context and the choice of feature representation. Second, we learn a correlation filter over a feature pyramid centered at the estimated target position for predicting scale changes. Third, we learn a complementary correlation filter with a conservative learning rate to maintain long-term memory of target appearance. We use the output responses of this long-term filter to determine if tracking failure occurs. In the case of tracking failures, we apply an incrementally learned detector to recover the target position in a sliding window fashion. Extensive experimental results on large-scale benchmark datasets demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods in terms of efficiency, accuracy, and robustness.


References

  • Arulampalam, M. S., Maskell, S., Gordon, N. J., & Clapp, T. (2002). A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50(2), 174–188.

  • Avidan, S. (2004). Support vector tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(8), 1064–1072.

  • Avidan, S. (2007). Ensemble tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(2), 261–271.

  • Babenko, B., Yang, M. H., & Belongie, S. (2011). Robust object tracking with online multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8), 1619–1632.

  • Bai, Q., Wu, Z., Sclaroff, S., Betke, M., & Monnier, C. (2013). Randomized ensemble tracking. In Proceedings of IEEE international conference on computer vision.

  • Bertinetto, L., Valmadre, J., Golodetz, S., Miksik, O., & Torr, P. H. S. (2016). Staple: Complementary learners for real-time tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Boddeti, V. N., Kanade, T., & Kumar, B. V. K. V. (2013). Correlation filters for object alignment. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Bolme, D. S., Beveridge, J. R., Draper, B. A., & Lui, Y. M. (2010). Visual object tracking using adaptive correlation filters. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., & Singer, Y. (2006). Online passive-aggressive algorithms. Journal of Machine Learning Research, 7, 551–585.

  • Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Danelljan, M., Hager, G., Khan, F. S., & Felsberg, M. (2014a). Accurate scale estimation for robust visual tracking. In Proceedings of British machine vision conference.

  • Danelljan, M., Häger, G., Khan, F. S., & Felsberg, M. (2015). Learning spatially regularized correlation filters for visual tracking. In Proceedings of IEEE international conference on computer vision.

  • Danelljan, M., Khan, F. S., Felsberg, M., & van de Weijer, J. (2014b). Adaptive color attributes for real-time visual tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Dinh, T. B., Vo, N., & Medioni, G. G. (2011). Context tracker: Exploring supporters and distracters in unconstrained environments. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Felsberg, M. (2013). Enhanced distribution field tracking using channel representations. In Proceedings of IEEE international conference on computer vision workshop.

  • Galoogahi, H. K., Sim, T., & Lucey, S. (2013). Multi-channel correlation filters. In Proceedings of IEEE international conference on computer vision.

  • Gao, J., Ling, H., Hu, W., & Xing, J. (2014). Transfer learning based visual tracking with Gaussian processes regression. In Proceedings of the European conference on computer vision.

  • Girshick, R.B., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Grabner, H., Leistner, C., & Bischof, H. (2008). Semi-supervised on-line boosting for robust tracking. In Proceedings of the European conference on computer vision.

  • Grossberg, S. (1987). Competitive learning: From interactive activation to adaptive resonance. Cognitive Science, 11(1), 23–63.

  • Hare, S., Saffari, A., & Torr, P. H. S. (2011). Struck: Structured output tracking with kernels. In Proceedings of IEEE international conference on computer vision.

  • He, S., Yang, Q., Lau, R. W. H., Wang, J., & Yang, M. H. (2013). Visual tracking via locality sensitive histograms. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Henriques, J. F., Caseiro, R., Martins, P., & Batista, J. (2012). Exploiting the circulant structure of tracking-by-detection with kernels. In Proceedings of the European conference on computer vision.

  • Henriques, J. F., Caseiro, R., Martins, P., & Batista, J. (2015). High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3), 583–596.

  • Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.

  • Hong, Z., Chen, Z., Wang, C., Mei, X., Prokhorov, D., & Tao, D. (2015). MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Hua, Y., Alahari, K., & Schmid, C. (2014). Occlusion and motion reasoning for long-term tracking. In Proceedings of the European conference on computer vision.

  • Kalal, Z., Mikolajczyk, K., & Matas, J. (2012). Tracking-learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(7), 1409–1422.

  • Kristan, M., Matas, J., Leonardis, A., Felsberg, M., Cehovin, L., Fernández, G., Vojír, T., Häger, G., Nebehay, G., & Pflugfelder, R. P. (2015). The visual object tracking VOT2015 challenge results. In Proceedings of IEEE international conference on computer vision workshop.

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems.

  • Kumar, B. V. K. V., Mahalanobis, A., & Juday, R. D. (2005). Correlation pattern recognition. Cambridge: Cambridge University Press.

  • Li, X., Hu, W., Shen, C., Zhang, Z., Dick, A. R., & van den Hengel, A. (2013). A survey of appearance models in visual object tracking. ACM TIST, 4(4), 58.

  • Li, Y., & Zhu, J. (2014). A scale adaptive kernel correlation filter tracker with feature integration. In Proceedings of the European conference on computer vision workshop.

  • Liu, T., Wang, G., & Yang, Q. (2015). Real-time part-based visual tracking via adaptive correlation filters. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.

  • Lucas, B. D., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In Proceedings of International Joint Conference on Artificial Intelligence.

  • Kristan, M., et al. (2014). The visual object tracking VOT2014 challenge results. In Proceedings of the European conference on computer vision workshop.

  • Ma, C., Huang, J., Yang, X., & Yang, M. (2015a). Hierarchical convolutional features for visual tracking. In Proceedings of IEEE international conference on computer vision.

  • Ma, C., Yang, X., Zhang, C., & Yang, M. H. (2015b). Learning a temporally invariant feature representation for visual tracking. In Proceedings of IEEE international conference on image processing.

  • Ma, C., Yang, X., Zhang, C., & Yang, M. H. (2015c). Long-term correlation tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Matthews, I., Ishikawa, T., & Baker, S. (2004). The template update problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6), 810–815.

  • Ning, G., Zhang, Z., Huang, C., Ren, X., Wang, H., Cai, C., & He, Z. (2017). Spatially supervised recurrent convolutional neural networks for visual object tracking. In IEEE international symposium on circuits and systems (ISCAS).

  • Pernici, F. (2012). Facehugger: The ALIEN tracker applied to faces. In Proceedings of the European conference on computer vision.

  • Santner, J., Leistner, C., Saffari, A., Pock, T., & Bischof, H. (2010). PROST: Parallel robust online simple tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Sevilla-Lara, L., & Learned-Miller, E. G. (2012). Distribution fields for tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In Proceedings of international conference on learning representation.

  • Smeulders, A. W. M., Chu, D. M., Cucchiara, R., Calderara, S., Dehghan, A., & Shah, M. (2014). Visual tracking: An experimental survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(7), 1442–1468.

  • Supancic, J. S., & Ramanan, D. (2013). Self-paced learning for long-term tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Wu, Y., Lim, J., & Yang, M. H. (2013a). Online object tracking: A benchmark. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Wu, Y., Lim, J., & Yang, M. H. (2013b). Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9), 1834–1848.

  • Yilmaz, A., Javed, O., & Shah, M. (2006). Object tracking: A survey. ACM Computing Surveys, 38(4), 13.

  • Zabih, R., & Woodfill, J. (1994). Non-parametric local transforms for computing visual correspondence. In Proceedings of the European conference on computer vision.

  • Zhang, J., Ma, S., & Sclaroff, S. (2014a). MEEM: Robust tracking via multiple experts using entropy minimization. In Proceedings of the European conference on computer vision.

  • Zhang, K., Zhang, L., Liu, Q., Zhang, D., & Yang, M. H. (2014b). Fast visual tracking via dense spatio-temporal context learning. In Proceedings of the European conference on computer vision.

  • Zhang, K., Zhang, L., & Yang, M. H. (2012a). Real-time compressive tracking. In Proceedings of the European conference on computer vision.

  • Zhang, T., Ghanem, B., Liu, S., & Ahuja, N. (2012b). Low-rank sparse learning for robust visual tracking. In Proceedings of the European conference on computer vision.

  • Zhang, T., Liu, S., Ahuja, N., Yang, M. H., & Ghanem, B. (2015). Robust visual tracking via consistent low-rank sparse learning. International Journal of Computer Vision, 111(2), 171–190.

  • Zhong, W., Lu, H., & Yang, M. H. (2014). Robust object tracking via sparse collaborative appearance model. IEEE Transactions on Image Processing, 23(5), 2356–2368.


Acknowledgements

This work is supported in part by the National Key Research and Development Program of China (2016YFB1001003), NSFC (61527804, 61521062) and the 111 Program (B07022).

Author information


Corresponding author

Correspondence to Ming-Hsuan Yang.

Additional information

Communicated by Bernt Schiele.

Appendix

In this appendix, we present two additional ablation studies on the OTB2013 dataset. First, we show the results of directly minimizing the errors over all the tracked results to update the correlation filters. Second, we analyze the robustness of the proposed method by spatially shifting the ground truth bounding boxes.

By directly minimizing the errors over all the tracked results, we consider all the extracted appearances \(\{x_j,j=1,2,\ldots ,p\}\) of the target object from the first frame up to the current frame p. The cost function is the weighted average quadratic error over these p frames. We assign each frame j a weight \(\beta _j \ge 0\) and learn the correlation filter w by minimizing the following objective function:

Fig. 21 Performance of different update schemes on the OTB2013 dataset (Wu et al. 2013a) under one-pass evaluation (OPE). Considering all the tracked results (ours-all-update) to update the translation filter does not improve tracking performance. The legend of the precision plots shows the distance precision scores at a threshold of 20 pixels; the legend of the success plots shows the overlap success scores in terms of the area under the curve (AUC)

Fig. 22 Spatial shifts. The amount of shift is 10% of the width or height of the ground-truth bounding box

$$\begin{aligned} \min _w\sum _{j=1}^p\beta _j\left( \sum _{m,n}\left| \left\langle \phi \left( x_{m,n}^j\right) , w^j \right\rangle -y^j(m,n) \right| ^2 +\lambda \left\langle w^j,w^j\right\rangle \right) , \end{aligned}$$
(23)

where \(w^j=\sum _{k,l}a(k,l)\phi (x_{k,l}^j)\). We have the solution to (23) in the Fourier domain as:

$$\begin{aligned} \mathcal {A}^p=\frac{\sum _{j=1}^p\beta _j\mathcal {K}_x^j\odot \overline{\mathcal {Y}}}{\sum _{j=1}^p\beta _j\mathcal {K}_x^j\odot \left( \mathcal {K}_x^j +\lambda \right) }, \end{aligned}$$
(24)

where \(\mathcal {K}_x^j=\mathscr {F}\left\{ k_x^j\right\} \) and \(k_x^j(m,n)=k(x_{m,n}^j,x^j)\). We perform a grid search and set the weight \(\beta _j=0.01\) and the regularization parameter \(\lambda =10^{-4}\) for the best accuracy. We store the parameters \(\{\mathcal {K}_x^j\}\), \(j=1,2,\ldots ,p-1\), from all previous frames to update the correlation filter in frame p.
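The closed-form solution in (24) can be accumulated frame by frame in the Fourier domain. Below is a minimal NumPy sketch of this scheme; the Gaussian kernel correlation follows the circulant-structure formulation of Henriques et al. (2015), while the class name `AllFramesFilter`, the single-channel input, and the constant weight \(\beta\) are our simplifying assumptions, not the authors' implementation.

```python
import numpy as np

def gaussian_kernel_correlation(x, z, sigma=0.5):
    """Gaussian kernel correlation k_x(m, n) over all cyclic shifts,
    computed efficiently via the FFT (cf. Henriques et al. 2015)."""
    xf, zf = np.fft.fft2(x), np.fft.fft2(z)
    cross = np.real(np.fft.ifft2(xf * np.conj(zf)))      # circular cross-correlation
    d2 = (np.sum(x ** 2) + np.sum(z ** 2) - 2 * cross) / x.size
    return np.exp(-np.clip(d2, 0, None) / sigma ** 2)

class AllFramesFilter:
    """Sketch of Eq. (24): weighted sums over all frames, Fourier domain."""
    def __init__(self, y, beta=0.01, lam=1e-4):
        self.yf = np.fft.fft2(y)   # desired Gaussian response, Fourier domain
        self.beta, self.lam = beta, lam
        self.num = 0.0             # numerator:   sum_j beta_j K_x^j . conj(Y)
        self.den = 0.0             # denominator: sum_j beta_j K_x^j . (K_x^j + lam)

    def update(self, x):
        """Add frame appearance x and recompute the filter coefficients A^p."""
        kf = np.fft.fft2(gaussian_kernel_correlation(x, x))
        self.num = self.num + self.beta * kf * np.conj(self.yf)
        self.den = self.den + self.beta * kf * (kf + self.lam)
        self.alpha_f = self.num / self.den   # A^p in Eq. (24)
```

With a constant \(\beta_j\) the two sums can be accumulated in constant memory as above; the general scheme in this appendix re-weights past frames and hence stores every \(\mathcal{K}_x^j\), which is why its computation and memory grow with p.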

Note that such an update scheme is not applicable in practice, as its computation and memory storage grow linearly with the number of frames p. The average tracking speed is 2.5 frames per second (fps) versus 20.8 fps (ours) on the OTB2013 dataset. Moreover, Fig. 21 shows that this update scheme does not improve performance: the average distance precision is 83.5 versus 84.8% (ours), and the average overlap success is 62.0 versus 62.8% (ours).

Fig. 23 Tracking performance with spatially shifted ground-truth bounding boxes on the OTB2013 dataset (Wu et al. 2013a) under one-pass evaluation (OPE)

We spatially shift the ground-truth bounding boxes in eight directions (Fig. 22) and rescale them with scaling factors of 0.8, 0.9, 1.1 and 1.2. Figure 23 shows that slightly enlarging the ground-truth bounding boxes (with a scaling factor of 1.1) does not significantly affect the tracking performance.
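This shift-and-rescale protocol is straightforward to reproduce. A minimal sketch follows, assuming boxes in [x, y, w, h] format; the function names are ours, not from the paper.

```python
def shifted_boxes(box, ratio=0.1):
    """Eight spatially shifted copies of a ground-truth box [x, y, w, h].
    The shift is `ratio` (10% by default) of the width/height (cf. Fig. 22)."""
    x, y, w, h = box
    dx, dy = ratio * w, ratio * h
    directions = [(-1, -1), (0, -1), (1, -1),   # top row
                  (-1,  0),          (1,  0),   # left / right
                  (-1,  1), (0,  1), (1,  1)]   # bottom row
    return [[x + sx * dx, y + sy * dy, w, h] for sx, sy in directions]

def scaled_boxes(box, factors=(0.8, 0.9, 1.1, 1.2)):
    """Rescaled copies of a box about its center, one per scaling factor."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    return [[cx - s * w / 2, cy - s * h / 2, s * w, s * h] for s in factors]
```

Running a tracker on each perturbed initialization and averaging the OPE scores yields the curves in Fig. 23.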

Cite this article

Ma, C., Huang, JB., Yang, X. et al. Adaptive Correlation Filters with Long-Term and Short-Term Memory for Object Tracking. Int J Comput Vis 126, 771–796 (2018). https://doi.org/10.1007/s11263-018-1076-4
