Abstract
Object tracking is challenging because target objects often undergo drastic appearance changes over time. Recently, adaptive correlation filters have been successfully applied to object tracking. However, tracking algorithms that rely on highly adaptive correlation filters are prone to drift due to noisy updates. Moreover, because these algorithms do not maintain a long-term memory of target appearance, they cannot recover from tracking failures caused by heavy occlusion or target disappearance from the camera view. In this paper, we propose to learn multiple adaptive correlation filters with both long-term and short-term memory of target appearance for robust object tracking. First, we learn a kernelized correlation filter with an aggressive learning rate for locating target objects precisely, taking into account the appropriate size of the surrounding context and the choice of feature representations. Second, we learn a correlation filter over a feature pyramid centered at the estimated target position for predicting scale changes. Third, we learn a complementary correlation filter with a conservative learning rate to maintain a long-term memory of target appearance, and we use the output responses of this long-term filter to determine whether a tracking failure occurs. In the case of tracking failures, we apply an incrementally learned detector to recover the target position in a sliding-window fashion. Extensive experimental results on large-scale benchmark datasets demonstrate that the proposed algorithm performs favorably against state-of-the-art methods in terms of efficiency, accuracy, and robustness.
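The dual-memory idea summarized above can be sketched minimally as follows. The learning rates, confidence threshold, and filter shapes here are illustrative assumptions, not the paper's actual values; the short-term filter is always updated with a high learning rate, while the long-term filter is updated conservatively, gated by its own response confidence:

```python
import numpy as np

# Illustrative constants (assumptions; the paper's settings may differ).
ETA_SHORT = 0.1       # aggressive rate: short-term filter adapts quickly
ETA_LONG = 0.01       # conservative rate for the long-term filter
CONF_THRESHOLD = 0.5  # minimum long-term response to accept an update


def update_filters(alpha_short, alpha_long, alpha_new, long_term_response):
    """Blend newly learned filter coefficients into the two memories.

    alpha_short, alpha_long, alpha_new: arrays of dual coefficients.
    long_term_response: peak response of the long-term filter this frame.
    """
    # Short-term filter: updated every frame with an aggressive rate.
    alpha_short = (1 - ETA_SHORT) * alpha_short + ETA_SHORT * alpha_new
    # Long-term filter: updated only when its own response is confident,
    # i.e., no occlusion or tracking failure is suspected.
    if long_term_response >= CONF_THRESHOLD:
        alpha_long = (1 - ETA_LONG) * alpha_long + ETA_LONG * alpha_new
    return alpha_short, alpha_long
```

When the long-term response falls below the threshold, the long-term memory is frozen, which is what allows re-detection after heavy occlusion.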
Acknowledgements
This work is supported in part by the National Key Research and Development Program of China (2016YFB1001003), NSFC (61527804, 61521062) and the 111 Program (B07022).
Communicated by Bernt Schiele.
Appendix
In this appendix, we present two additional ablation studies on the OTB2013 dataset. First, we show the results of directly minimizing the errors over all the tracked results to update the correlation filters. Second, we analyze the robustness of the proposed method by spatially shifting the ground truth bounding boxes.
By directly minimizing the errors over all the tracked results, we consider all the extracted appearances \(\{x_j, j=1,2,\ldots ,p\}\) of the target object from the first frame up to the current frame p. The cost function is the weighted average quadratic error over these p frames. We assign each frame j a weight \(\beta _j \ge 0\) and learn the correlation filter w by minimizing the following objective function:

\[
\min _{w}\sum _{j=1}^{p}\beta _j\left( \sum _{m,n}\Big |\big \langle \phi (x_{m,n}^j),w^j\big \rangle -y(m,n)\Big |^2+\lambda \big \langle w^j,w^j\big \rangle \right) , \tag{23}
\]

where \(w^j=\sum _{k,l}a(k,l)\phi (x_{k,l}^j)\). The solution to (23) in the Fourier domain is:

\[
\mathcal {A}=\mathscr {F}\{a\}=\frac{\sum _{j=1}^{p}\beta _j\,\mathcal {Y}\odot \bar{\mathcal {K}}_x^j}{\sum _{j=1}^{p}\beta _j\,\mathcal {K}_x^j\odot \big (\bar{\mathcal {K}}_x^j+\lambda \big )}, \tag{24}
\]

where \(\mathcal {K}_x^j=\mathscr {F}\left\{ k_x^j\right\} \), \(k_x^j(m,n)=k(x_{m,n}^j,x^j)\), and \(\mathcal {Y}=\mathscr {F}\{y\}\). We perform a grid search and set the weight \(\beta _j=0.01\) and the regularization parameter \(\lambda =10^{-4}\) for the best accuracy. We store the parameters \(\{\mathcal {K}_x^j\}\), \(j=1,2,\ldots ,p-1\), to update the correlation filter in frame p.
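For intuition, this multi-frame solution can be sketched numerically for the special case of a linear kernel, where \(k_x^j\) reduces to the auto-correlation spectrum of frame j. The image size, number of frames, Gaussian label bandwidth, and random single-channel frames are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, p = 16, 16, 5              # patch size and number of frames (assumed)
beta = np.full(p, 0.01)          # per-frame weights beta_j
lam = 1e-4                       # regularization lambda

# Desired Gaussian response y centred on the target.
yy, xx = np.mgrid[:H, :W]
y = np.exp(-((yy - H // 2) ** 2 + (xx - W // 2) ** 2) / (2 * 2.0 ** 2))
Y = np.fft.fft2(y)

frames = [rng.standard_normal((H, W)) for _ in range(p)]

# Accumulate the weighted numerator and denominator of the solution.
num = np.zeros((H, W), dtype=complex)
den = np.zeros((H, W), dtype=complex)
for j in range(p):
    X = np.fft.fft2(frames[j])
    Kx = X * np.conj(X)          # linear-kernel auto-correlation spectrum
    num += beta[j] * Y * np.conj(Kx)
    den += beta[j] * Kx * (np.conj(Kx) + lam)

A = num / den                    # Fourier coefficients of the dual filter a
```

Note the linearly growing cost: every past spectrum \(\mathcal {K}_x^j\) must be kept and revisited, unlike the usual running-average update.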
Note that such an update scheme is not applicable in practice, as its computation and memory storage grow linearly with the frame number p. The average tracking speed is 2.5 frames per second (fps), versus 20.8 fps for our scheme, on the OTB2013 dataset. Moreover, Fig. 21 shows that this update scheme does not improve performance: the average distance precision is 83.5% versus 84.8% (ours), and the average overlap success is 62.0% versus 62.8% (ours).
We spatially shift the ground truth bounding boxes in eight directions (Fig. 22) and rescale them with scaling factors of 0.8, 0.9, 1.1, and 1.2. Figure 23 shows that slightly enlarging the ground truth bounding boxes (scaling factor 1.1) does not significantly affect the tracking performance.
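The perturbation protocol above can be sketched as follows. The shift amount of 10% of the box size is an illustrative assumption; boxes are in (x, y, w, h) format, and rescaling is performed about the box centre:

```python
def shifted_boxes(box, frac=0.1):
    """Return the box shifted in the eight compass directions by a
    fraction of its width/height (frac is an assumed amount)."""
    x, y, w, h = box
    dx, dy = frac * w, frac * h
    directions = [(-1, -1), (0, -1), (1, -1), (-1, 0),
                  (1, 0), (-1, 1), (0, 1), (1, 1)]
    return [(x + sx * dx, y + sy * dy, w, h) for sx, sy in directions]


def rescaled_boxes(box, factors=(0.8, 0.9, 1.1, 1.2)):
    """Rescale the box about its centre by each scaling factor."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    return [(cx - f * w / 2, cy - f * h / 2, f * w, f * h)
            for f in factors]
```

Running the tracker from each perturbed initialization then measures sensitivity to initialization noise.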
Cite this article
Ma, C., Huang, JB., Yang, X. et al. Adaptive Correlation Filters with Long-Term and Short-Term Memory for Object Tracking. Int J Comput Vis 126, 771–796 (2018). https://doi.org/10.1007/s11263-018-1076-4