
Adaptive Correlation Filters with Long-Term and Short-Term Memory for Object Tracking

Published in: International Journal of Computer Vision

Abstract

Object tracking is challenging as target objects often undergo drastic appearance changes over time. Recently, adaptive correlation filters have been successfully applied to object tracking. However, tracking algorithms relying on highly adaptive correlation filters are prone to drift due to noisy updates. Moreover, as these algorithms do not maintain long-term memory of target appearance, they cannot recover from tracking failures caused by heavy occlusion or target disappearance from the camera view. In this paper, we propose to learn multiple adaptive correlation filters with both long-term and short-term memory of target appearance for robust object tracking. First, we learn a kernelized correlation filter with an aggressive learning rate for locating target objects precisely. We take into account the appropriate size of the surrounding context and the choice of feature representation. Second, we learn a correlation filter over a feature pyramid centered at the estimated target position for predicting scale changes. Third, we learn a complementary correlation filter with a conservative learning rate to maintain long-term memory of target appearance. We use the output responses of this long-term filter to determine if tracking failure occurs. In the case of tracking failures, we apply an incrementally learned detector to recover the target position in a sliding window fashion. Extensive experimental results on large-scale benchmark datasets demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods in terms of efficiency, accuracy, and robustness.


References

  • Arulampalam, M. S., Maskell, S., Gordon, N. J., & Clapp, T. (2002). A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50(2), 174–188.

  • Avidan, S. (2004). Support vector tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(8), 1064–1072.

  • Avidan, S. (2007). Ensemble tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(2), 261–271.

  • Babenko, B., Yang, M. H., & Belongie, S. (2011). Robust object tracking with online multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8), 1619–1632.

  • Bai, Q., Wu, Z., Sclaroff, S., Betke, M., & Monnier, C. (2013). Randomized ensemble tracking. In Proceedings of IEEE international conference on computer vision.

  • Bertinetto, L., Valmadre, J., Golodetz, S., Miksik, O., & Torr, P. H. S. (2016). Staple: Complementary learners for real-time tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Boddeti, V. N., Kanade, T., & Kumar, B. V. K. V. (2013). Correlation filters for object alignment. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Bolme, D. S., Beveridge, J. R., Draper, B. A., & Lui, Y. M. (2010). Visual object tracking using adaptive correlation filters. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., & Singer, Y. (2006). Online passive-aggressive algorithms. Journal of Machine Learning Research, 7, 551–585.

  • Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Danelljan, M., Hager, G., Khan, F. S., & Felsberg, M. (2014a). Accurate scale estimation for robust visual tracking. In Proceedings of British machine vision conference.

  • Danelljan, M., Häger, G., Khan, F. S., & Felsberg, M. (2015). Learning spatially regularized correlation filters for visual tracking. In Proceedings of IEEE international conference on computer vision.

  • Danelljan, M., Khan, F. S., Felsberg, M., & van de Weijer, J. (2014b). Adaptive color attributes for real-time visual tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Dinh, T. B., Vo, N., & Medioni, G. G. (2011). Context tracker: Exploring supporters and distracters in unconstrained environments. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Felsberg, M. (2013). Enhanced distribution field tracking using channel representations. In Proceedings of IEEE international conference on computer vision workshop.

  • Galoogahi, H. K., Sim, T., & Lucey, S. (2013). Multi-channel correlation filters. In Proceedings of IEEE international conference on computer vision.

  • Gao, J., Ling, H., Hu, W., & Xing, J. (2014). Transfer learning based visual tracking with Gaussian processes regression. In Proceedings of the European conference on computer vision.

  • Girshick, R.B., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Grabner, H., Leistner, C., & Bischof, H. (2008). Semi-supervised on-line boosting for robust tracking. In Proceedings of the European conference on computer vision.

  • Grossberg, S. (1987). Competitive learning: From interactive activation to adaptive resonance. Cognitive Science, 11(1), 23–63.

  • Hare, S., Saffari, A., & Torr, P. H. S. (2011). Struck: Structured output tracking with kernels. In Proceedings of IEEE international conference on computer vision.

  • He, S., Yang, Q., Lau, R. W. H., Wang, J., & Yang, M. H. (2013). Visual tracking via locality sensitive histograms. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Henriques, J. F., Caseiro, R., Martins, P., & Batista, J. (2012). Exploiting the circulant structure of tracking-by-detection with kernels. In Proceedings of the European conference on computer vision.

  • Henriques, J. F., Caseiro, R., Martins, P., & Batista, J. (2015). High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3), 583–596.

  • Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.

  • Hong, Z., Chen, Z., Wang, C., Mei, X., Prokhorov, D., & Tao, D. (2015). MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Hua, Y., Alahari, K., & Schmid, C. (2014). Occlusion and motion reasoning for long-term tracking. In Proceedings of the European conference on computer vision.

  • Kalal, Z., Mikolajczyk, K., & Matas, J. (2012). Tracking-learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(7), 1409–1422.

  • Kristan, M., Matas, J., Leonardis, A., Felsberg, M., Cehovin, L., Fernández, G., Vojír, T., Häger, G., Nebehay, G., & Pflugfelder, R. P. (2015). The visual object tracking VOT2015 challenge results. In Proceedings of IEEE international conference on computer vision workshop.

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems.

  • Kumar, B. V. K. V., Mahalanobis, A., & Juday, R. D. (2005). Correlation pattern recognition. Cambridge: Cambridge University Press.

  • Li, X., Hu, W., Shen, C., Zhang, Z., Dick, A. R., & van den Hengel, A. (2013). A survey of appearance models in visual object tracking. ACM TIST, 4(4), 58.

  • Li, Y., & Zhu, J. (2014). A scale adaptive kernel correlation filter tracker with feature integration. In Proceedings of the European conference on computer vision workshop.

  • Liu, T., Wang, G., & Yang, Q. (2015). Real-time part-based visual tracking via adaptive correlation filters. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.

  • Lucas, B. D., & Kanade, T. (1981). An iterative image registration technique with an application to stereo vision. In Proceedings of International Joint Conference on Artificial Intelligence.

  • Kristan, M., et al. (2014). The visual object tracking VOT2014 challenge results. In Proceedings of the European conference on computer vision workshop.

  • Ma, C., Huang, J., Yang, X., & Yang, M. (2015a). Hierarchical convolutional features for visual tracking. In Proceedings of IEEE international conference on computer vision.

  • Ma, C., Yang, X., Zhang, C., & Yang, M. H. (2015b). Learning a temporally invariant feature representation for visual tracking. In Proceedings of IEEE international conference on image processing.

  • Ma, C., Yang, X., Zhang, C., & Yang, M. H. (2015c). Long-term correlation tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Matthews, I., Ishikawa, T., & Baker, S. (2004). The template update problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6), 810–815.

  • Ning, G., Zhang, Z., Huang, C., Ren, X., Wang, H., Cai, C., & He, Z. (2017). Spatially supervised recurrent convolutional neural networks for visual object tracking. In IEEE international symposium on circuits and systems (ISCAS).

  • Pernici, F. (2012). Facehugger: The ALIEN tracker applied to faces. In Proceedings of the European conference on computer vision.

  • Santner, J., Leistner, C., Saffari, A., Pock, T., & Bischof, H. (2010). PROST: Parallel robust online simple tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Sevilla-Lara, L., & Learned-Miller, E. G. (2012). Distribution fields for tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In Proceedings of international conference on learning representation.

  • Smeulders, A. W. M., Chu, D. M., Cucchiara, R., Calderara, S., Dehghan, A., & Shah, M. (2014). Visual tracking: An experimental survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(7), 1442–1468.

  • Supancic, J. S., & Ramanan, D. (2013). Self-paced learning for long-term tracking. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Wu, Y., Lim, J., & Yang, M. H. (2013a). Online object tracking: A benchmark. In Proceedings of IEEE conference on computer vision and pattern recognition.

  • Wu, Y., Lim, J., & Yang, M. H. (2013b). Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9), 1834–1848.

  • Yilmaz, A., Javed, O., & Shah, M. (2006). Object tracking: A survey. ACM Computing Surveys, 38(4), 13.

  • Zabih, R., & Woodfill, J. (1994). Non-parametric local transforms for computing visual correspondence. In Proceedings of the European conference on computer vision.

  • Zhang, J., Ma, S., & Sclaroff, S. (2014a). MEEM: Robust tracking via multiple experts using entropy minimization. In Proceedings of the European conference on computer vision.

  • Zhang, K., Zhang, L., Liu, Q., Zhang, D., & Yang, M. H. (2014b). Fast visual tracking via dense spatio-temporal context learning. In Proceedings of the European conference on computer vision.

  • Zhang, K., Zhang, L., & Yang, M. H. (2012a). Real-time compressive tracking. In Proceedings of the European conference on computer vision.

  • Zhang, T., Ghanem, B., Liu, S., & Ahuja, N. (2012b). Low-rank sparse learning for robust visual tracking. In Proceedings of the European conference on computer vision.

  • Zhang, T., Liu, S., Ahuja, N., Yang, M. H., & Ghanem, B. (2015). Robust visual tracking via consistent low-rank sparse learning. International Journal of Computer Vision, 111(2), 171–190.

  • Zhong, W., Lu, H., & Yang, M. H. (2014). Robust object tracking via sparse collaborative appearance model. IEEE Transactions on Image Processing, 23(5), 2356–2368.


Acknowledgements

This work is supported in part by the National Key Research and Development Program of China (2016YFB1001003), NSFC (61527804, 61521062) and the 111 Program (B07022).

Author information


Corresponding author

Correspondence to Ming-Hsuan Yang.

Additional information

Communicated by Bernt Schiele.

Appendix

In this appendix, we present two additional ablation studies on the OTB2013 dataset. First, we show the results of directly minimizing the errors over all the tracked results to update the correlation filters. Second, we analyze the robustness of the proposed method by spatially shifting the ground truth bounding boxes.

By directly minimizing the errors over all the tracked results, we consider all the extracted appearances \(\{x_j,j=1,2,\ldots ,p\}\) of the target object from the first frame up to the current frame p. The cost function is the weighted average quadratic error over these p frames. We assign each frame j a weight \(\beta _j \ge 0\) and learn the correlation filter w by minimizing the following objective function:

Fig. 21 Performance of different update schemes on the OTB2013 dataset (Wu et al. 2013a) under one-pass evaluation (OPE). Considering all the tracked results (ours-all-update) to update the translation filter does not improve tracking performance. The legend of the precision plots shows the distance precision scores at a threshold of 20 pixels; the legend of the success plots shows the overlap success scores in terms of the area under the curve (AUC)

Fig. 22 Spatial shifts. The amount of shift is 10% of the width or height of the ground-truth bounding box

$$\begin{aligned} \min _w\sum _{j=1}^p\beta _j\left( \sum _{m,n}\left| \left\langle \phi \left( x_{m,n}^j\right) , w^j \right\rangle -y^j(m,n) \right| ^2 +\lambda \left\langle w^j,w^j\right\rangle \right) , \end{aligned}$$
(23)

where \(w^j=\sum _{k,l}a(k,l)\phi (x_{k,l}^j)\). We have the solution to (23) in the Fourier domain as:

$$\begin{aligned} \mathcal {A}^p=\frac{\sum _{j=1}^p\beta _j\mathcal {K}_x^j\odot \overline{\mathcal {Y}}}{\sum _{j=1}^p\beta _j\mathcal {K}_x^j\odot \left( \mathcal {K}_x^j +\lambda \right) }, \end{aligned}$$
(24)

where \(\mathcal {K}_x^j=\mathscr {F}\left\{ k_x^j\right\} \) and \(k_x^j(m,n)=k(x_{m,n}^j,x^j)\). We perform a grid search and set the weight \(\beta _j=0.01\) and the regularization parameter \(\lambda =10^{-4}\) for the best accuracy. We store the parameters \(\{\mathcal {K}_x^j\}\), \(j=1,2,\ldots ,p-1\), from all previous frames to update the correlation filter in frame p.
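The closed-form solution in (24) can be accumulated frame by frame in the Fourier domain. Below is a minimal NumPy sketch of this scheme; the Gaussian kernel correlation follows the circulant-structure formulation of Henriques et al. (2015), while the class name `AllFramesFilter`, the single-channel input, and the constant weight \(\beta\) are our simplifying assumptions, not the authors' implementation.

```python
import numpy as np

def gaussian_kernel_correlation(x, z, sigma=0.5):
    """Gaussian kernel correlation k_x(m, n) over all cyclic shifts,
    computed efficiently via the FFT (cf. Henriques et al. 2015)."""
    xf, zf = np.fft.fft2(x), np.fft.fft2(z)
    cross = np.real(np.fft.ifft2(xf * np.conj(zf)))      # circular cross-correlation
    d2 = (np.sum(x ** 2) + np.sum(z ** 2) - 2 * cross) / x.size
    return np.exp(-np.clip(d2, 0, None) / sigma ** 2)

class AllFramesFilter:
    """Sketch of Eq. (24): weighted sums over all frames, Fourier domain."""
    def __init__(self, y, beta=0.01, lam=1e-4):
        self.yf = np.fft.fft2(y)   # desired Gaussian response, Fourier domain
        self.beta, self.lam = beta, lam
        self.num = 0.0             # numerator:   sum_j beta_j K_x^j . conj(Y)
        self.den = 0.0             # denominator: sum_j beta_j K_x^j . (K_x^j + lam)

    def update(self, x):
        """Add frame appearance x and recompute the filter coefficients A^p."""
        kf = np.fft.fft2(gaussian_kernel_correlation(x, x))
        self.num = self.num + self.beta * kf * np.conj(self.yf)
        self.den = self.den + self.beta * kf * (kf + self.lam)
        self.alpha_f = self.num / self.den   # A^p in Eq. (24)
```

With a constant \(\beta_j\) the two sums can be accumulated in constant memory as above; the general scheme in this appendix re-weights past frames and hence stores every \(\mathcal{K}_x^j\), which is why its computation and memory grow with p.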

Note that such an update scheme is not applicable in practice, as its computation and memory storage grow linearly with the number of frames p. The average tracking speed is 2.5 frames per second (fps) versus 20.8 fps (ours) on the OTB2013 dataset. Moreover, Fig. 21 shows that this update scheme does not improve performance: the average distance precision is 83.5 versus 84.8% (ours), and the average overlap success is 62.0 versus 62.8% (ours).

Fig. 23 Tracking performance with spatially shifted ground-truth bounding boxes on the OTB2013 dataset (Wu et al. 2013a) under one-pass evaluation (OPE)

We spatially shift the ground-truth bounding boxes in eight directions (Fig. 22) and rescale them with scaling factors of 0.8, 0.9, 1.1 and 1.2. Figure 23 shows that slightly enlarging the ground-truth bounding boxes (with a scaling factor of 1.1) does not significantly affect the tracking performance.
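This shift-and-rescale protocol is straightforward to reproduce. A minimal sketch follows, assuming boxes in [x, y, w, h] format; the function names are ours, not from the paper.

```python
def shifted_boxes(box, ratio=0.1):
    """Eight spatially shifted copies of a ground-truth box [x, y, w, h].
    The shift is `ratio` (10% by default) of the width/height (cf. Fig. 22)."""
    x, y, w, h = box
    dx, dy = ratio * w, ratio * h
    directions = [(-1, -1), (0, -1), (1, -1),   # top row
                  (-1,  0),          (1,  0),   # left / right
                  (-1,  1), (0,  1), (1,  1)]   # bottom row
    return [[x + sx * dx, y + sy * dy, w, h] for sx, sy in directions]

def scaled_boxes(box, factors=(0.8, 0.9, 1.1, 1.2)):
    """Rescaled copies of a box about its center, one per scaling factor."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    return [[cx - s * w / 2, cy - s * h / 2, s * w, s * h] for s in factors]
```

Running a tracker on each perturbed initialization and averaging the OPE scores yields the curves in Fig. 23.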

Cite this article

Ma, C., Huang, JB., Yang, X. et al. Adaptive Correlation Filters with Long-Term and Short-Term Memory for Object Tracking. Int J Comput Vis 126, 771–796 (2018). https://doi.org/10.1007/s11263-018-1076-4
