
SOTVerse: A User-Defined Task Space of Single Object Tracking

Abstract

Single object tracking (SOT) research falls into a cycle: trackers perform well on most benchmarks but quickly fail in challenging scenarios, leading researchers to blame insufficient data and to spend ever more effort constructing larger datasets with more challenging situations. However, inefficient data utilization and limited evaluation methods hinder SOT research more seriously. The former prevents existing datasets from being exploited comprehensively, while the latter neglects challenging factors in the evaluation process. In this article, we systematize the representative benchmarks and form a single object tracking metaverse (SOTVerse), a user-defined SOT task space designed to break through this bottleneck. We first propose a 3E Paradigm that describes a task by three components (i.e., environment, evaluation, and executor). We then summarize task characteristics, clarify the organization standards, and construct SOTVerse with 12.56 million frames. Specifically, SOTVerse automatically labels challenging factors per frame, allowing users to generate user-defined spaces efficiently via construction rules. In addition, SOTVerse provides two evaluation mechanisms with new indicators and successfully evaluates trackers under various subtasks. Consequently, SOTVerse is the first to provide a strategy for improving resource utilization in the computer vision area, making research more standardized. The SOTVerse, toolkit, evaluation server, and results are available at http://metaverse.aitestunion.com.



Data Availability

All data will be made available on reasonable request.

Code Availability

The toolkit and experimental results will be made publicly available.

References

  • Abu Alhaija, H., Mustikovela, S. K., Mescheder, L., Geiger, A., & Rother, C. (2018). Augmented reality meets computer vision: Efficient data generation for urban driving scenes. International Journal of Computer Vision, 126(9), 961–972.


  • Beals, R., Mayyasi, A., Templeton, A., & Johnston, W. (1971). The relationship between basketball shooting performance and certain visual attributes. American Journal of Optometry and Archives of American Academy of Optometry, 48(7), 585–590.


  • Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., & Torr, P. H. (2016). Fully-convolutional Siamese networks for object tracking. In European conference on computer vision (pp. 850–865). Springer.

  • Bhat, G., Danelljan, M., Gool, L. V., & Timofte, R. (2019). Learning discriminative model prediction for tracking. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 6182–6191).

  • Bhat, G., Danelljan, M., Gool, L. V., & Timofte, R. (2020). Know your surroundings: Exploiting scene information for object tracking. In European conference on computer vision (pp. 205–221). Springer.

  • Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2), 115.


  • Burg, A. (1966). Visual acuity as measured by dynamic and static tests: A comparative evaluation. Journal of Applied Psychology, 50(6), 460.


  • Čehovin, L., Leonardis, A., & Kristan, M. (2016). Visual object tracking performance measures revisited. IEEE Transactions on Image Processing, 25(3), 1261–1274.


  • Chu, Q., Ouyang, W., Li, H., Wang, X., Liu, B., & Yu, N. (2017). Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism. In 2017 IEEE international conference on computer vision (ICCV) (pp. 4846–4855). https://doi.org/10.1109/ICCV.2017.518

  • Ciaparrone, G., Sanchez, F. L., Tabik, S., Troiano, L., Tagliaferri, R., & Herrera, F. (2019). Deep learning in video multi-object tracking: A survey. Neurocomputing, 381, 61–88.


  • Collins, R. T. (2003). Mean-shift blob tracking through scale space. In Proceedings of the 2003 IEEE computer society conference on computer vision and pattern recognition, 2003 (Vol. 2, p. 234). IEEE.

  • Collins, R., Zhou, X., & Teh, S. K. (2005). An open source tracking testbed and evaluation web site. In IEEE international workshop on performance evaluation of tracking and surveillance (Vol. 2, p. 35).

  • Cook, D. J. (2012). How smart is your home. Science, 335(6076), 1579–1581.


  • Cui, Y., Jiang, C., Wang, L., & Wu, G. (2022). Mixformer: End-to-end tracking with iterative mixed attention. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 13608–13618).

  • Danelljan, M., Bhat, G., Khan, F. S., & Felsberg, M. (2019). Atom: Accurate tracking by overlap maximization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4660–4669).

  • Danelljan, M., Bhat, G., Shahbaz Khan, F., & Felsberg, M. (2017). Eco: Efficient convolution operators for tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6638–6646).

  • Danelljan, M., Gool, L. V., & Timofte, R. (2020). Probabilistic regression for visual tracking. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 7183–7192).

  • Dendorfer, P., Osep, A., Milan, A., Schindler, K., Cremers, D., Reid, I., Roth, S., & Leal-Taixé, L. (2021). MOTChallenge: A benchmark for single-camera multiple target tracking. International Journal of Computer Vision, 129(4), 845–881.


  • Dunnhofer, M., Furnari, A., Farinella, G. M., & Micheloni, C. (2023). Visual object tracking in first person vision. International Journal of Computer Vision, 131(1), 259–283.


  • Dupeyroux, J., Serres, J. R., & Viollet, S. (2019). AntBot: A six-legged walking robot able to home like desert ants in outdoor environments. Science Robotics, 4(27), eaau0307.


  • Esteva, A., Chou, K., Yeung, S., Naik, N., Madani, A., Mottaghi, A., Liu, Y., Topol, E., Dean, J., & Socher, R. (2021). Deep learning-enabled medical computer vision. NPJ Digital Medicine, 4(1), 5.


  • Fan, H., Bai, H., Lin, L., Yang, F., Chu, P., Deng, G., Yu, S., Huang, M., Liu, J., & Xu, Y. (2021). LaSOT: A high-quality large-scale single object tracking benchmark. International Journal of Computer Vision, 129(2), 439–461.


  • Ferryman, J., & Shahrokni, A. (2009). PETS2009: Dataset and challenge. In 2009 twelfth IEEE international workshop on performance evaluation of tracking and surveillance (pp. 1–6). IEEE.

  • Finlayson, G. D., & Trezzi, E. (2004). Shades of gray and colour constancy. In The twelfth color imaging conference 2004 (pp. 37–41).

  • Fisher, R. B. (2004). The PETS04 surveillance ground-truth data sets. In Proceedings of the 6th IEEE international workshop on performance evaluation of tracking and surveillance (pp. 1–5).

  • Gao, S., Zhou, C., & Zhang, J. (2023). Generalized relation modeling for transformer tracking. arXiv preprint arXiv:2303.16580

  • Gauglitz, S., Höllerer, T., & Turk, M. (2011). Evaluation of interest point detectors and feature descriptors for visual tracking. International Journal of Computer Vision, 94(3), 335–360.


  • Geuther, B. Q., Deats, S. P., Fox, K. J., Murray, S. A., Braun, R. E., White, J. K., Chesler, E. J., Lutz, C. M., & Kumar, V. (2019). Robust mouse tracking in complex environments using neural networks. Communications Biology, 2(1), 124.


  • Godec, M., Roth, P. M., & Bischof, H. (2013). Hough-based tracking of non-rigid objects. Computer Vision and Image Understanding, 117(10), 1245–1256.


  • Guo, D., Wang, J., Cui, Y., Wang, Z., & Chen, S. (2020). SiamCAR: Siamese fully convolutional classification and regression for visual tracking. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 6269–6277).

  • Han, B., Comaniciu, D., Zhu, Y., & Davis, L. S. (2008). Sequential kernel density approximation and its application to real-time visual tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(7), 1186–1197.


  • Held, D., Guillory, D., Rebsamen, B., Thrun, S., & Savarese, S. (2016). A probabilistic framework for real-time 3D segmentation using spatial, temporal, and semantic cues. In Robotics: Science and Systems (RSS). https://doi.org/10.15607/RSS.2016.XII.024

  • Henriques, J. F., Caseiro, R., Martins, P., & Batista, J. (2014). High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3), 583–596.


  • Huang, L., Zhao, X., & Huang, K. (2020). GlobalTrack: A simple and strong baseline for long-term tracking. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, pp. 11037–11044).

  • Huang, L., Zhao, X., & Huang, K. (2021). GOT-10k: A large high-diversity benchmark for generic object tracking in the wild. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(5), 1562–1577. https://doi.org/10.1109/TPAMI.2019.2957464


  • Hu, S., Zhao, X., Huang, L., & Huang, K. (2023). Global instance tracking: Locating target more like humans. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(1), 576–592. https://doi.org/10.1109/TPAMI.2022.3153312

  • Kiani Galoogahi, H., Fagg, A., Huang, C., Ramanan, D., & Lucey, S. (2017). Need for speed: A benchmark for higher frame rate object tracking. In Proceedings of the IEEE international conference on computer vision (pp. 1125–1134).

  • Kim, J., Misu, T., Chen, Y.-T., Tawari, A., & Canny, J. (2019). Grounding human-to-vehicle advice for self-driving vehicles. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10591–10599).

  • Kohl, P., Coffey, B., Reichow, A., Thompson, W., & Willer, P. (1991). A comparative study of visual performance in jet fighter pilots and non-pilots. Journal of Behavioral Optometry, 5(2), 123–126.


  • Kong, Y., & Fu, Y. (2022). Human action recognition and prediction: A survey. International Journal of Computer Vision, 130(5), 1366–1401.


  • Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Čehovin Zajc, L., Vojir, T., Bhat, G., Lukezic, A., Eldesokey, A. (2018). The sixth visual object tracking vot2018 challenge results. In Proceedings of the European conference on computer vision (ECCV) workshops.

  • Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Čehovin, L., Vojir, T., et al. (2017). The visual object tracking VOT2017 challenge results. In 2017 IEEE international conference on computer vision workshops (ICCVW) (pp. 1949–1972). https://doi.org/10.1109/ICCVW.2017.230

  • Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Kämäräinen, J.-K., Danelljan, M., Zajc, L. Č., Lukežič, A., & Drbohlav, O. (2020). The eighth visual object tracking vot2020 challenge results. In European conference on computer vision (pp. 547–601). Springer.

  • Kristan, M., Matas, J., Leonardis, A., Felsberg, M., Pflugfelder, R., Kamarainen, J.-K., Čehovin Zajc, L., Drbohlav, O., Lukezic, A., & Berg, A. (2019). The seventh visual object tracking vot2019 challenge results. In Proceedings of the IEEE/CVF international conference on computer vision workshops.

  • Kristan, M., Pflugfelder, R., Leonardis, A., Matas, J., Porikli, F., Cehovin, L., Nebehay, G., Fernandez, G., Vojir, T., Gatt, A., Khajenezhad, A., Salahledin, A., Soltani-Farani, A., Zarezade, A., Petrosino, A., Milton, A., Bozorgtabar, B., Li, B., Chan, C. S., Heng, C., Ward, D., Kearney, D., Monekosso, D., Karaimer, H. C., Rabiee, H. R., Zhu, J., Gao, J., Xiao, J., Zhang, J., Xing, J., Huang, K., Lebeda, K., Cao, L., Maresca, M.E., Lim, M. K., El Helw, M., Felsberg, M., Remagnino, P., Bowden, R., Goecke, R., Stolkin, R., Lim, S.Y., Maher, S., Poullot, S., Wong, S., Satoh, S., Chen, W., Hu, W., Zhang, X., Li, Y., & Niu, Z. (2013). The visual object tracking vot2013 challenge results. In 2013 IEEE international conference on computer vision workshops (pp. 98–111). https://doi.org/10.1109/ICCVW.2013.20

  • Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Zajc, L. Č, et al. (2016). The visual object tracking VOT2016 challenge results. Springer.


  • Kristan, M., Matas, J., Leonardis, A., Vojíř, T., Pflugfelder, R., Fernandez, G., Nebehay, G., Porikli, F., & Čehovin, L. (2016). A novel performance evaluation methodology for single-target trackers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(11), 2137–2155.


  • Kwon, J., & Lee, K. M. (2009). Tracking of a non-rigid object via patch-based dynamic appearance modeling and adaptive basin hopping Monte Carlo sampling. In 2009 IEEE conference on computer vision and pattern recognition (pp. 1208–1215). IEEE.

  • Land, M. F., & McLeod, P. (2000). From eye movements to actions: How batsmen hit the ball. Nature Neuroscience, 3(12), 1340–1345.


  • Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788–791.


  • Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., & Yan, J. (2019). Siamrpn++: Evolution of Siamese visual tracking with very deep networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4282–4291).

  • Li, B., Yan, J., Wu, W., Zhu, Z., & Hu, X. (2018). High performance visual tracking with siamese region proposal network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8971–8980).

  • Liang, P., Blasch, E., & Ling, H. (2015). Encoding color information for visual tracking: Algorithms and benchmark. IEEE Transactions on Image Processing, 24(12), 5630–5644. https://doi.org/10.1109/TIP.2015.2482905


  • Li, A., Lin, M., Wu, Y., Yang, M.-H., & Yan, S. (2015). NUS-PRO: A new visual tracking challenge. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2), 335–349.


  • Liu, Q., He, Z., Li, X., & Zheng, Y. (2019). PTB-TIR: A thermal infrared pedestrian tracking benchmark. IEEE Transactions on Multimedia, 22(3), 666–675.


  • Lukežič, A., Zajc, L. Č., Vojíř, T., Matas, J., & Kristan, M. (2020). Performance evaluation methodology for long-term single-object tracking. IEEE Transactions on Cybernetics, 51, 6305–6318.


  • Miller, J. W. (1962). The effect of relative motion on visual acuity. Survey of Ophthalmology, 7, 83–116.


  • Mayer, C., Danelljan, M., Paudel, D. P., & Van Gool, L. (2021). Learning target candidate association to keep track of what not to track. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 13444–13454).

  • McLeod, P., Reed, N., & Dienes, Z. (2003). How fielders arrive in time to catch the ball. Nature, 426(6964), 244–245.


  • Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41.


  • Moudgil, A., & Gandhi, V. (2018). Long-term visual object tracking benchmark. In Asian conference on computer vision (pp. 629–645).

  • Mueller, M., Smith, N., & Ghanem, B. (2016). A benchmark and simulator for UAV tracking. In European conference on computer vision (pp. 445–461). Springer.

  • Muller, M., Bibi, A., Giancola, S., Alsubaihi, S., & Ghanem, B. (2018). TrackingNet: A large-scale dataset and benchmark for object tracking in the wild. In Proceedings of the European conference on computer vision (ECCV) (pp. 300–317).

  • Nejhum, S. S., Ho, J., & Yang, M.-H. (2008). Visual tracking with histograms and articulating blocks. In 2008 IEEE conference on computer vision and pattern recognition (pp. 1–8). IEEE.

  • Pech-Pacheco, J. L., Cristobal, G., Chamorro-Martinez, J., & Fernandez-Valdivia, J. (2000). Diatom autofocusing in brightfield microscopy: A comparative study. In Proceedings of the 15th international conference on pattern recognition (ICPR-2000) (Vol. 3, pp. 314–317).

  • Ramakrishnan, S. K., Jayaraman, D., & Grauman, K. (2021). An exploration of embodied visual exploration. International Journal of Computer Vision, 129(5), 1616–1649.


  • Real, E., Shlens, J., Mazzocchi, S., Pan, X., & Vanhoucke, V. (2017). Youtube-boundingboxes: A large high-precision human-annotated data set for object detection in video. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5296–5305).

  • Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., & Bernstein, M. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.


  • Team, O. E. L., Stooke, A., Mahajan, A., Barros, C., Deck, C., Bauer, J., Sygnowski, J., Trebacz, M., Jaderberg, M., Mathieu, M., et al. (2021). Open-ended learning leads to generally capable agents. arXiv preprint arXiv:2107.12808

  • Valmadre, J., Bertinetto, L., Henriques, J. F., Tao, R., Vedaldi, A., Smeulders, A. W., Torr, P. H., & Gavves, E. (2018). Long-term tracking in the wild: A benchmark. In Proceedings of the European conference on computer vision (ECCV) (pp. 670–685).

  • Voigtlaender, P., Luiten, J., Torr, P. H., & Leibe, B. (2020). Siam R-CNN: Visual tracking by re-detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 6578–6588).

  • Wang, S., Zhou, Y., Yan, J., & Deng, Z. (2018). Fully motion-aware network for video object detection. In V. Ferrari, M. Hebert, C. Sminchisescu, & Y. Weiss (Eds.), Computer Vision—ECCV 2018 (pp. 557–573). Springer.


  • Wu, Y., Lim, J., & Yang, M.-H. (2013). Online object tracking: A benchmark. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2411–2418).

  • Wu, Y., Lim, J., & Yang, M.-H. (2015). Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(09), 1834–1848.


  • Xu, Y., Wang, Z., Li, Z., Yuan, Y., & Yu, G. (2020). Siamfc++: Towards robust and accurate visual tracking with target estimation guidelines. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, pp. 12549–12556).

  • Yan, B., Jiang, Y., Sun, P., Wang, D., Yuan, Z., Luo, P., & Lu, H. (2022). Towards grand unification of object tracking. In Computer vision—ECCV 2022: 17th European conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXI (pp. 733–751). Springer.

  • Yan, B., Zhao, H., Wang, D., Lu, H., & Yang, X. (2019). ’Skimming-perusal’ tracking: A framework for real-time and robust long-term tracking. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 2385–2393).

  • Ye, B., Chang, H., Ma, B., Shan, S., & Chen, X. (2022). Joint feature learning and relation modeling for tracking: A one-stream framework. In Computer vision—ECCV 2022: 17th European conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXII (pp. 341–357). Springer.

  • Yoon, J. H., Lee, C.-R., Yang, M.-H., & Yoon, K.-J. (2019). Structural constraint data association for online multi-object tracking. International Journal of Computer Vision, 127(1), 1–21.


  • Zhang, Z., & Peng, H. (2019). Deeper and wider Siamese networks for real-time visual tracking. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4591–4600).

  • Zhang, G., & Vela, P. A. (2015). Good features to track for visual SLAM. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1373–1382).

  • Zhang, Z., Peng, H., Fu, J., Li, B., & Hu, W. (2020). Ocean: Object-aware anchor-free tracking. In European conference on computer vision (pp. 771–787). Springer.

  • Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., Hu, W. (2018). Distractor-aware siamese networks for visual object tracking. In Proceedings of the European conference on computer vision (ECCV) (pp. 101–117).


Author information


Corresponding authors

Correspondence to Shiyu Hu or Xin Zhao.

Ethics declarations

Conflict of interest

All authors declare no conflicts of interest.

Additional information

Communicated by Matej Kristan.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Subsets of Challenging Space

Fig. 17 Distribution of subsets in challenging space

Table 5 The non-repetitive sequences from the three VOT datasets (VOT2016 (Kristan et al., 2016), VOT2018 (Kristan et al., 2018), and VOT2019 (Kristan et al., 2019))

Because VOT2016 (Kristan et al., 2016), VOT2018 (Kristan et al., 2018), and VOT2019 (Kristan et al., 2019) share some repeated sequences, we removed duplicates across the three datasets when constructing the challenging space, ensuring that the constructed subspace does not contain any repeated sequences (Fig. 17). Specifically, we carefully examined the sequences of VOT2016, VOT2018, and VOT2019 and retained only the non-duplicated ones. After this selection process, 82 of the original 180 sequences remained, as listed in Table 5: 60 sequences from VOT2016, 10 from VOT2018, and 12 from VOT2019.
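The selection itself amounts to a name-level de-duplication across the three sequence lists. Below is a minimal sketch of this step, assuming each sequence is identified by its name and is kept in the earliest dataset in which it appears; the sequence names in the example are made up, and the actual SOTVerse toolkit interface may differ.

```python
# Minimal sketch of the de-duplication step (illustrative only; not the SOTVerse toolkit API).
def deduplicate(vot2016, vot2018, vot2019):
    """Keep each sequence name once, preferring the earliest dataset it appears in."""
    kept = {"VOT2016": [], "VOT2018": [], "VOT2019": []}
    seen = set()
    for dataset, sequences in (("VOT2016", vot2016),
                               ("VOT2018", vot2018),
                               ("VOT2019", vot2019)):
        for name in sequences:
            if name not in seen:  # skip sequences already kept from an earlier dataset
                seen.add(name)
                kept[dataset].append(name)
    return kept

# Example with made-up sequence names:
subsets = deduplicate(["bag", "ball1"], ["ball1", "ants1"], ["ants1", "dribble"])
# -> {'VOT2016': ['bag', 'ball1'], 'VOT2018': ['ants1'], 'VOT2019': ['dribble']}
```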

Appendix B: Relationship of Dynamic Attributes

Fig. 18 Relationship of dynamic attributes in SOTVerse

The last row in Fig. 18 indicates that variations in the other five dynamic attributes all change the correlation coefficient (corrcoef). In addition, compared with the other five dynamic attributes, corrcoef reflects the dynamic variations in a video sequence more comprehensively. Thus, it can serve as an indicator of the degree of variation during the tracking process.
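As an illustration only, the sketch below computes such a per-frame correlation value under the assumption that corrcoef denotes the Pearson correlation coefficient between target crops of consecutive frames; the precise definition used by SOTVerse is the one given in the main text.

```python
# Illustrative sketch, assuming corrcoef is the Pearson correlation between
# consecutive target crops; SOTVerse's exact definition is given in the main text.
import numpy as np

def corrcoef_sequence(crops):
    """crops: list of HxW grayscale target patches, resized to a common shape."""
    values = []
    for prev, curr in zip(crops[:-1], crops[1:]):
        r = np.corrcoef(prev.ravel(), curr.ravel())[0, 1]  # Pearson correlation in [-1, 1]
        values.append(r)
    return np.array(values)  # low values flag frames with strong appearance variation
```

Under this reading, a low corrcoef marks frames in which the target appearance changes rapidly, which is consistent with its use above as a summary indicator of the other dynamic attributes.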

Appendix C: An Example of Attribute Plot

Fig. 19 An example of the calculation process of the attribute plot

As shown in Fig. 19, sub-figure (a) is from the car-6 sequence in the LaSOT (Fan et al., 2021) dataset. Here, we select six algorithms to generate the attribute plot: GRM (Gao et al., 2023), Unicorn (Yan et al., 2022), and OSTrack (Ye et al., 2022) represent the latest state-of-the-art trackers, while ECO (Danelljan et al., 2017), SiamFC (Bertinetto et al., 2016), and KCF (Henriques et al., 2014) represent classical trackers. Sub-figure (b) employs the previous method, which calculates the proportion of each attribute among the failed frames (\(\mathcal {A}_{f}(\cdot )\)). In contrast, sub-figure (c) calculates the proportion of each attribute among the successful frames (\(\mathcal {A}_{s}(\cdot )\)). Sub-figure (d) shows the difference between sub-figures (b) and (c) and serves as the updated attribute plot \(\mathcal {A}(\cdot )\) (i.e., \(\mathcal {A}(\cdot ) = \mathcal {A}_{f}(\cdot ) - \mathcal {A}_{s}(\cdot )\)).
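For concreteness, the following is a minimal sketch of this computation; the per-frame attribute labels and per-frame success flags are assumed inputs, and the names are illustrative rather than the SOTVerse toolkit API.

```python
# Sketch of the attribute plot A = A_f - A_s (illustrative names, not the toolkit API).
from collections import Counter

def attribute_plot(per_frame_attrs, success, attributes):
    """per_frame_attrs[i]: challenging factors labelled for frame i; success[i]: tracker success flag."""
    fail_counts, succ_counts = Counter(), Counter()
    n_fail = sum(1 for ok in success if not ok)
    n_succ = sum(1 for ok in success if ok)
    for attrs, ok in zip(per_frame_attrs, success):
        (succ_counts if ok else fail_counts).update(attrs)
    plot = {}
    for a in attributes:
        a_f = fail_counts[a] / n_fail if n_fail else 0.0  # proportion among failed frames
        a_s = succ_counts[a] / n_succ if n_succ else 0.0  # proportion among successful frames
        plot[a] = a_f - a_s  # positive values point to factors associated with failure
    return plot
```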

The previous calculation method (b) suggests a substantial correlation between the failed frames and the blur bounding-box challenging factor. However, sub-figure (c) shows that blur bounding-box is also consistently prevalent across the successful frames. Comparing (b) and (c) demonstrates that blur bounding-box is a widely observed challenging factor in the majority of frames in this sequence, but it is not the primary cause of algorithm failure. Sub-figure (d) therefore offers a more precise depiction of the challenging factors that contribute to algorithm failure. For instance, in the case of GRM, its failures primarily result from the fast motion of the target.

Sub-figures (e–h) are from the person-1 sequence in the LaSOT dataset, analyzed using the same process as (a–d). For GRM, Unicorn, and OSTrack, the most challenging factors in this sequence are a series of dynamic attributes (top right corner of sub-figure (h)), including variations in illumination, scale, and ratio, as well as fast motion. Moreover, within sub-figure (h), regions with negative values indicate attributes on which the algorithms excel, making them more likely to track the target successfully in most such frames. For instance, GRM, Unicorn, and OSTrack exhibit strong tracking capabilities on the static attributes of abnormal scale and ratio in this sequence.

Appendix D: Comprehensive Experimental Results

Table 6 The model architectures and URLs of open-sourced algorithms used in this work
Fig. 20 Experiments in normal space. Three columns represent the results in the short-term tracking task (left), long-term tracking task (middle), and global instance tracking task (right). Each task is evaluated by precision plots in OPE (a1–a3), normalized precision plots in OPE (b1–b3), precision plots in R-OPE (c1–c3), and normalized precision plots in R-OPE (d1–d3)

All experiments are performed on a server with 4 NVIDIA TITAN RTX GPUs and 64 Intel(R) Xeon(R) Gold 5218 CPU cores @ 2.30 GHz. We use the parameters provided by the original authors.

Table 7 Performance of 23 representative trackers on all sub-spaces, based on precision score (Color table online)
Table 8 Performance of 23 representative trackers on all sub-spaces, based on normalized precision score (Color table online)
Table 9 Performance of 23 representative trackers on all sub-spaces, based on success score (Color table online)
Table 10 Performance of 23 representative trackers on all sub-spaces, based on precision score, weighted by sequences’ length (Color table online)
Table 11 Performance of 23 representative trackers on all sub-spaces, based on normalized precision score, weighted by sequences’ length (Color table online)
Table 12 Performance of 23 representative trackers on all sub-spaces, based on success score, weighted by sequences’ length (Color table online)
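Tables 10, 11, and 12 report length-weighted versions of the scores in Tables 7, 8, and 9. The sketch below shows the weighting presumably applied, with each sequence's score weighted by its frame count; the function name and inputs are illustrative.

```python
# Hedged sketch of a length-weighted score (assumed aggregation for Tables 10-12).
def length_weighted_score(scores, lengths):
    """scores: per-sequence metric values; lengths: per-sequence frame counts."""
    return sum(s * n for s, n in zip(scores, lengths)) / sum(lengths)

# Example: two sequences of 100 and 900 frames.
print(length_weighted_score([0.8, 0.6], [100, 900]))  # -> 0.62
```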
Fig. 21 Experiments in OTB (Wu et al., 2015) with OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, and (e) attribute plot

Fig. 22 Experiments in OTB (Wu et al., 2015) with R-OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, (e) attribute plot, and (f) robust plot

Appendix E: Experiments in Short-Term Tracking

E.1 Experiments in OTB (Wu et al., 2015)

Fig. 23 Experiments in VOT2016 (Kristan et al., 2016) with OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, and (e) attribute plot

Fig. 24 Experiments in VOT2016 (Kristan et al., 2016) with R-OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, (e) attribute plot, and (f) robust plot

E.2 Experiments in VOT2016 (Kristan et al., 2016)

Fig. 25 Experiments in VOT2018 (Kristan et al., 2018) with OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, and (e) attribute plot

Fig. 26 Experiments in VOT2018 (Kristan et al., 2018) with R-OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, (e) attribute plot, and (f) robust plot

E.3 Experiments in VOT2018 (Kristan et al., 2018)

Fig. 27 Experiments in VOT2019 (Kristan et al., 2019) with OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, and (e) attribute plot

Fig. 28 Experiments in VOT2019 (Kristan et al., 2019) with R-OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, (e) attribute plot, and (f) robust plot

E.4 Experiments in VOT2019 (Kristan et al., 2019)

Fig. 29 Experiments in GOT-10k (Huang et al., 2021) with OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, and (e) attribute plot

Fig. 30 Experiments in GOT-10k (Huang et al., 2021) with R-OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, (e) attribute plot, and (f) robust plot

E.5 Experiments in GOT-10k (Huang et al., 2021)

Fig. 31 Experiments in VOTLT2019 (Kristan et al., 2019) with OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, and (e) attribute plot

Fig. 32 Experiments in VOTLT2019 (Kristan et al., 2019) with R-OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, (e) attribute plot, and (f) robust plot

Appendix F: Experiments in Long-term Tracking

F.1 Experiments in VOTLT2019 (Kristan et al., 2019)

Fig. 33 Experiments in LaSOT (Fan et al., 2021) with OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, and (e) attribute plot

Fig. 34 Experiments in LaSOT (Fan et al., 2021) with R-OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, (e) attribute plot, and (f) robust plot

F.2 Experiments in LaSOT (Fan et al., 2021)

Fig. 35 Experiments in VideoCube (Hu et al., 2023) with OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, and (e) attribute plot

Fig. 36 Experiments in VideoCube (Hu et al., 2023) with R-OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, (e) attribute plot, and (f) robust plot

Appendix G: Experiments in Global Instance Tracking

G.1 Experiments in VideoCube (Hu et al., 2023)

Fig. 37 The composition of abnormal ratio space. (a) The distribution of attribute values and sequence lengths, each point representing a sub-sequence. (b) The distribution of sequence lengths. (c) The distribution of attribute values

Appendix H: The Composition of Challenging Space

H.1 Abnormal Ratio

Fig. 38 The composition of abnormal scale space. (a) The distribution of attribute values and sequence lengths, each point representing a sub-sequence. (b) The distribution of sequence lengths. (c) The distribution of attribute values

H.2 Abnormal Scale

Fig. 39 The composition of abnormal illumination space. (a) The distribution of attribute values and sequence lengths, each point representing a sub-sequence. (b) The distribution of sequence lengths. (c) The distribution of attribute values

H.3 Abnormal Illumination

Fig. 40 The composition of blur bounding-box space. (a) The distribution of attribute values and sequence lengths, each point representing a sub-sequence. (b) The distribution of sequence lengths. (c) The distribution of attribute values

H.4 Blur Bounding-box

Fig. 41 The composition of delta ratio space. (a) The distribution of attribute values and sequence lengths, each point representing a sub-sequence. (b) The distribution of sequence lengths. (c) The distribution of attribute values

H.5 Delta Ratio

Fig. 42 The composition of delta scale space. (a) The distribution of attribute values and sequence lengths, each point representing a sub-sequence. (b) The distribution of sequence lengths. (c) The distribution of attribute values

H.6 Delta Scale

Fig. 43 The composition of delta illumination. (a) The distribution of attribute values and sequence lengths, each point representing a sub-sequence. (b) The distribution of sequence lengths. (c) The distribution of attribute values

H.7 Delta Illumination

Fig. 44 The composition of delta blur bounding-box. (a) The distribution of attribute values and sequence lengths, each point representing a sub-sequence. (b) The distribution of sequence lengths. (c) The distribution of attribute values

H.8 Delta Blur Bounding-Box

Fig. 45 The composition of fast motion. (a) The distribution of attribute values and sequence lengths, each point representing a sub-sequence. (b) The distribution of sequence lengths. (c) The distribution of attribute values

H.9 Fast Motion

Fig. 46 The composition of low correlation coefficient. (a) The distribution of attribute values and sequence lengths, each point representing a sub-sequence. (b) The distribution of sequence lengths. (c) The distribution of attribute values

H.10 Low Correlation Coefficient

Fig. 47 Experiments in challenging space with static attributes. (a–d) The tracking results in different challenging factors. Each task is evaluated by the precision plot, normalized precision plot, and success plot with the OPE mechanism

Appendix I: Experiments in Challenging Space

I.1 Static Attributes

Fig. 48 Experiments in challenging space with dynamic attributes. (a–c) The tracking results in different challenging factors. Each task is evaluated by the precision plot, normalized precision plot, and success plot with the OPE mechanism

I.2 Dynamic Attributes

Fig. 49 Experiments in challenging space with dynamic attributes. (a–c) The tracking results in different challenging factors. Each task is evaluated by the precision plot, normalized precision plot, and success plot with the OPE mechanism

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Hu, S., Zhao, X. & Huang, K. SOTVerse: A User-Defined Task Space of Single Object Tracking. Int J Comput Vis (2023). https://doi.org/10.1007/s11263-023-01908-5
