
LaSOT: A High-quality Large-scale Single Object Tracking Benchmark

Published in the International Journal of Computer Vision.

Abstract

Despite great recent advances in visual tracking, its further development, including both algorithm design and evaluation, is limited by the lack of dedicated large-scale benchmarks. To address this problem, we present LaSOT, a high-quality Large-scale Single Object Tracking benchmark. LaSOT contains a diverse selection of 85 object classes and offers 1,550 videos totaling more than 3.87 million frames. Each frame is carefully and manually annotated with a bounding box, making LaSOT, to our knowledge, the largest densely annotated tracking benchmark. Our goal in releasing LaSOT is to provide a dedicated, high-quality platform for both training and evaluating trackers. The average video in LaSOT is around 2,500 frames long, and each video contains challenge factors found in real-world footage, such as the target disappearing and re-appearing; these longer videos allow for the assessment of long-term trackers. To take advantage of the close connection between visual appearance and natural language, we provide a language specification for each video in LaSOT, which we believe will enable future research on using linguistic features to improve tracking. Two protocols, full-overlap and one-shot, are designed for flexible assessment of trackers. We extensively evaluate 48 baseline trackers on LaSOT with in-depth analysis, and the results reveal that significant room for improvement remains. The complete benchmark, tracking results, and analysis are available at http://vision.cs.stonybrook.edu/~lasot/.
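To illustrate how trackers are commonly scored on a densely annotated benchmark like this, the sketch below computes the standard success (bounding-box IoU) and precision (center-distance) measures for a single sequence. It is a minimal sketch, not the official LaSOT evaluation toolkit; the comma-separated per-frame `x,y,w,h` annotation format and the file paths are assumptions modeled on common tracking benchmarks.

```python
# Minimal sketch of one-pass evaluation (OPE) scoring for a single sequence.
# Assumes LaSOT-style annotations: one comma-separated "x,y,w,h" line per frame.
import numpy as np

def load_boxes(path):
    """Load per-frame boxes from a comma-separated x,y,w,h text file."""
    return np.loadtxt(path, delimiter=",")

def iou(a, b):
    """Frame-wise IoU between two (N, 4) arrays of x,y,w,h boxes."""
    x1 = np.maximum(a[:, 0], b[:, 0])
    y1 = np.maximum(a[:, 1], b[:, 1])
    x2 = np.minimum(a[:, 0] + a[:, 2], b[:, 0] + b[:, 2])
    y2 = np.minimum(a[:, 1] + a[:, 3], b[:, 1] + b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = a[:, 2] * a[:, 3] + b[:, 2] * b[:, 3] - inter
    return inter / np.maximum(union, 1e-12)

def success_curve(gt, pred, thresholds=np.linspace(0, 1, 21)):
    """Success plot: fraction of frames whose IoU exceeds each threshold.
    The area under this curve (its mean) is the usual ranking score."""
    overlaps = iou(gt, pred)
    return np.array([(overlaps > t).mean() for t in thresholds])

def precision_at(gt, pred, radius=20.0):
    """Precision plot value: fraction of frames whose predicted box center
    lies within `radius` pixels of the ground-truth center."""
    gt_centers = gt[:, :2] + gt[:, 2:] / 2.0
    pred_centers = pred[:, :2] + pred[:, 2:] / 2.0
    dist = np.linalg.norm(gt_centers - pred_centers, axis=1)
    return (dist <= radius).mean()

# Hypothetical usage; the file layout mirrors common tracking benchmarks:
# gt = load_boxes("airplane-1/groundtruth.txt")
# pred = load_boxes("results/my_tracker/airplane-1.txt")
# auc = success_curve(gt, pred).mean()
# prec = precision_at(gt, pred)
```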


Notes

  1. Note that for a tracking benchmark using the full-overlap split protocol, category bias should be suppressed in both the training and the evaluation of trackers; for a benchmark using the one-shot split protocol, category bias needs to be suppressed only in training, since the test categories are unseen by design (see the sketch below).
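To make the distinction concrete, here is a minimal sketch of how the two split regimes might be constructed. The class names, per-class test sizes, and helper names are illustrative assumptions, not the benchmark's actual partition.

```python
# Illustrative full-overlap vs. one-shot splits over videos grouped by class.
# All names and counts below are hypothetical, not LaSOT's actual partition.
import random

def full_overlap_split(videos_by_class, test_per_class=4, seed=0):
    """Every class appears in both subsets, so trackers can be trained and
    evaluated on the same categories; category bias matters on both sides."""
    rng = random.Random(seed)
    train, test = [], []
    for vids in videos_by_class.values():
        vids = sorted(vids)
        rng.shuffle(vids)
        test.extend(vids[:test_per_class])
        train.extend(vids[test_per_class:])
    return train, test

def one_shot_split(videos_by_class, test_classes):
    """Training and test categories are disjoint, so the test set probes
    generalization to unseen classes; category bias matters only in training."""
    train = [v for c, vs in videos_by_class.items()
             if c not in test_classes for v in vs]
    test = [v for c, vs in videos_by_class.items()
            if c in test_classes for v in vs]
    return train, test

# Hypothetical usage:
# videos = {"airplane": ["airplane-1", "airplane-2"], "zebra": ["zebra-1"]}
# train, test = one_shot_split(videos, test_classes={"zebra"})
```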


Acknowledgements

We thank the anonymous reviewers for insightful suggestions, and Jeremy Chu for proofreading the final draft. Ling was supported partially by the Amazon AWS Machine Learning Research Award.

Author information

Corresponding author

Correspondence to Heng Fan.

Additional information

Communicated by Konrad Schindler.


About this article

Cite this article

Fan, H., Bai, H., Lin, L. et al. LaSOT: A High-quality Large-scale Single Object Tracking Benchmark. Int J Comput Vis 129, 439–461 (2021). https://doi.org/10.1007/s11263-020-01387-y

