Abstract
Single object tracking (SOT) research falls into a cycle: trackers perform well on most benchmarks but quickly fail in challenging scenarios, leading researchers to attribute the failures to insufficient data and to invest further effort in constructing larger datasets with more challenging situations. However, inefficient data utilization and limited evaluation methods hinder SOT research more severely. The former prevents existing datasets from being exploited comprehensively, while the latter neglects challenging factors in the evaluation process. In this article, we systematize the representative benchmarks and form a single object tracking metaverse (SOTVerse), a user-defined SOT task space designed to break through this bottleneck. We first propose a 3E Paradigm that describes a task by three components (i.e., environment, evaluation, and executor). Then, we summarize task characteristics, clarify the organization standards, and construct SOTVerse with 12.56 million frames. Specifically, SOTVerse automatically labels challenging factors per frame, allowing users to generate user-defined spaces efficiently via construction rules. Besides, SOTVerse provides two evaluation mechanisms with new indicators and successfully evaluates trackers under various subtasks. Consequently, SOTVerse is the first to provide a strategy for improving resource utilization in the computer vision area, making research more standardized. The SOTVerse, toolkit, evaluation server, and results are available at http://metaverse.aitestunion.com.

















Data Availability
All data will be made available on reasonable request.
Code Availability
The toolkit and experimental results will be made publicly available.
References
Abu Alhaija, H., Mustikovela, S. K., Mescheder, L., Geiger, A., & Rother, C. (2018). Augmented reality meets computer vision: Efficient data generation for urban driving scenes. International Journal of Computer Vision, 126(9), 961–972.
Beals, R., Mayyasi, A., Templeton, A., & Johnston, W. (1971). The relationship between basketball shooting performance and certain visual attributes. American Journal of Optometry and Archives of American Academy of Optometry, 48(7), 585–590.
Bertinetto, L., Valmadre, J., Henriques, J. F., Vedaldi, A., & Torr, P. H. (2016). Fully-convolutional Siamese networks for object tracking. In European conference on computer vision (pp. 850–865). Springer.
Bhat, G., Danelljan, M., Gool, L. V., & Timofte, R. (2019). Learning discriminative model prediction for tracking. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 6182–6191).
Bhat, G., Danelljan, M., Gool, L. V., & Timofte, R. (2020). Know your surroundings: Exploiting scene information for object tracking. In European conference on computer vision (pp. 205–221). Springer.
Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2), 115.
Burg, A. (1966). Visual acuity as measured by dynamic and static tests: A comparative evaluation. Journal of Applied Psychology, 50(6), 460.
Čehovin, L., Leonardis, A., & Kristan, M. (2016). Visual object tracking performance measures revisited. IEEE Transactions on Image Processing, 25(3), 1261–1274.
Chu, Q., Ouyang, W., Li, H., Wang, X., Liu, B., & Yu, N. (2017). Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism. In 2017 IEEE international conference on computer vision (ICCV) (pp. 4846–4855). https://doi.org/10.1109/ICCV.2017.518
Ciaparrone, G., Sanchez, F. L., Tabik, S., Troiano, L., Tagliaferri, R., & Herrera, F. (2019). Deep learning in video multi-object tracking: A survey. Neurocomputing, 381, 61–88.
Collins, R. T. (2003). Mean-shift blob tracking through scale space. In Proceedings of the 2003 IEEE computer society conference on computer vision and pattern recognition, 2003 (Vol. 2, p. 234). IEEE.
Collins, R., Zhou, X., & Teh, S. K. (2005). An open source tracking testbed and evaluation web site. In IEEE international workshop on performance evaluation of tracking and surveillance (Vol. 2, p. 35).
Cook, D. J. (2012). How smart is your home. Science, 335(6076), 1579–1581.
Cui, Y., Jiang, C., Wang, L., & Wu, G. (2022). Mixformer: End-to-end tracking with iterative mixed attention. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 13608–13618).
Danelljan, M., Bhat, G., Khan, F. S., & Felsberg, M. (2019). Atom: Accurate tracking by overlap maximization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4660–4669).
Danelljan, M., Bhat, G., Shahbaz Khan, F., & Felsberg, M. (2017). Eco: Efficient convolution operators for tracking. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6638–6646).
Danelljan, M., Gool, L. V., & Timofte, R. (2020). Probabilistic regression for visual tracking. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 7183–7192).
Dendorfer, P., Osep, A., Milan, A., Schindler, K., Cremers, D., Reid, I., Roth, S., & Leal-Taixé, L. (2021). MOTChallenge: A benchmark for single-camera multiple target tracking. International Journal of Computer Vision, 129(4), 845–881.
Dunnhofer, M., Furnari, A., Farinella, G. M., & Micheloni, C. (2023). Visual object tracking in first person vision. International Journal of Computer Vision, 131(1), 259–283.
Dupeyroux, J., Serres, J. R., & Viollet, S. (2019). AntBot: A six-legged walking robot able to home like desert ants in outdoor environments. Science Robotics, 4(27), eaau0307.
Esteva, A., Chou, K., Yeung, S., Naik, N., Madani, A., Mottaghi, A., Liu, Y., Topol, E., Dean, J., & Socher, R. (2021). Deep learning-enabled medical computer vision. NPJ Digital Medicine, 4(1), 5.
Fan, H., Bai, H., Lin, L., Yang, F., Chu, P., Deng, G., Yu, S., Huang, M., Liu, J., & Xu, Y. (2021). LaSOT: A high-quality large-scale single object tracking benchmark. International Journal of Computer Vision, 129(2), 439–461.
Ferryman, J., & Shahrokni, A. (2009). PETS2009: Dataset and challenge. In 2009 twelfth IEEE international workshop on performance evaluation of tracking and surveillance (pp. 1–6). IEEE.
Finlayson, G. D., & Trezzi, E. (2004). Shades of gray and colour constancy. In The twelfth color imaging conference 2004 (pp. 37–41).
Fisher, R. B. (2004). The PETS04 surveillance ground-truth data sets. In Proceedings of the 6th IEEE international workshop on performance evaluation of tracking and surveillance (pp. 1–5).
Gao, S., Zhou, C., & Zhang, J. (2023). Generalized relation modeling for transformer tracking. arXiv preprint arXiv:2303.16580
Gauglitz, S., Höllerer, T., & Turk, M. (2011). Evaluation of interest point detectors and feature descriptors for visual tracking. International Journal of Computer Vision, 94(3), 335–360.
Geuther, B. Q., Deats, S. P., Fox, K. J., Murray, S. A., Braun, R. E., White, J. K., Chesler, E. J., Lutz, C. M., & Kumar, V. (2019). Robust mouse tracking in complex environments using neural networks. Communications Biology, 2(1), 124.
Godec, M., Roth, P. M., & Bischof, H. (2013). Hough-based tracking of non-rigid objects. Computer Vision and Image Understanding, 117(10), 1245–1256.
Guo, D., Wang, J., Cui, Y., Wang, Z., & Chen, S. (2020). SiamCAR: Siamese fully convolutional classification and regression for visual tracking. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 6269–6277).
Han, B., Comaniciu, D., Zhu, Y., & Davis, L. S. (2008). Sequential kernel density approximation and its application to real-time visual tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(7), 1186–1197.
Held, D., Guillory, D., Rebsamen, B., Thrun, S., & Savarese, S. (2016). A probabilistic framework for real-time 3D segmentation using spatial, temporal, and semantic cues. In Robotics: Science and Systems. https://doi.org/10.15607/RSS.2016.XII.024
Henriques, J. F., Caseiro, R., Martins, P., & Batista, J. (2014). High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(3), 583–596.
Huang, L., Zhao, X., & Huang, K. (2020). GlobalTrack: A simple and strong baseline for long-term tracking. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, pp. 11037–11044).
Huang, L., Zhao, X., & Huang, K. (2021). GOT-10k: A large high-diversity benchmark for generic object tracking in the wild. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(5), 1562–1577. https://doi.org/10.1109/TPAMI.2019.2957464
Hu, S., Zhao, X., Huang, L., & Huang, K. (2023). Global instance tracking: Locating target more like humans. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(1), 576–592. https://doi.org/10.1109/TPAMI.2022.3153312
Kiani Galoogahi, H., Fagg, A., Huang, C., Ramanan, D., & Lucey, S. (2017). Need for speed: A benchmark for higher frame rate object tracking. In Proceedings of the IEEE international conference on computer vision (pp. 1125–1134).
Kim, J., Misu, T., Chen, Y.-T., Tawari, A., & Canny, J. (2019). Grounding human-to-vehicle advice for self-driving vehicles. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 10591–10599).
Kohl, P., Coffey, B., Reichow, A., Thompson, W., & Willer, P. (1991). A comparative study of visual performance in jet fighter pilots and non-pilots. Journal of Behavioral Optometry, 5(2), 123–126.
Kong, Y., & Fu, Y. (2022). Human action recognition and prediction: A survey. International Journal of Computer Vision, 130(5), 1366–1401.
Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Čehovin Zajc, L., Vojir, T., Bhat, G., Lukezic, A., Eldesokey, A. (2018). The sixth visual object tracking vot2018 challenge results. In Proceedings of the European conference on computer vision (ECCV) workshops.
Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Čehovin, L., Vojir, T., et al. (2017). The visual object tracking VOT2017 challenge results. In Proceedings of the IEEE international conference on computer vision workshops (pp. 1949–1972). https://doi.org/10.1109/ICCVW.2017.230
Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Kämäräinen, J.-K., Danelljan, M., Zajc, L. Č., Lukežič, A., & Drbohlav, O. (2020). The eighth visual object tracking vot2020 challenge results. In European conference on computer vision (pp. 547–601). Springer.
Kristan, M., Matas, J., Leonardis, A., Felsberg, M., Pflugfelder, R., Kamarainen, J.-K., Čehovin Zajc, L., Drbohlav, O., Lukezic, A., & Berg, A. (2019). The seventh visual object tracking vot2019 challenge results. In Proceedings of the IEEE/CVF international conference on computer vision workshops.
Kristan, M., Pflugfelder, R., Leonardis, A., Matas, J., Porikli, F., Cehovin, L., Nebehay, G., Fernandez, G., Vojir, T., Gatt, A., Khajenezhad, A., Salahledin, A., Soltani-Farani, A., Zarezade, A., Petrosino, A., Milton, A., Bozorgtabar, B., Li, B., Chan, C. S., Heng, C., Ward, D., Kearney, D., Monekosso, D., Karaimer, H. C., Rabiee, H. R., Zhu, J., Gao, J., Xiao, J., Zhang, J., Xing, J., Huang, K., Lebeda, K., Cao, L., Maresca, M.E., Lim, M. K., El Helw, M., Felsberg, M., Remagnino, P., Bowden, R., Goecke, R., Stolkin, R., Lim, S.Y., Maher, S., Poullot, S., Wong, S., Satoh, S., Chen, W., Hu, W., Zhang, X., Li, Y., & Niu, Z. (2013). The visual object tracking vot2013 challenge results. In 2013 IEEE international conference on computer vision workshops (pp. 98–111). https://doi.org/10.1109/ICCVW.2013.20
Kristan, M., Leonardis, A., Matas, J., Felsberg, M., Pflugfelder, R., Zajc, L. Č, et al. (2016). The visual object tracking VOT2016 challenge results. Springer.
Kristan, M., Matas, J., Leonardis, A., Vojíř, T., Pflugfelder, R., Fernandez, G., Nebehay, G., Porikli, F., & Čehovin, L. (2016). A novel performance evaluation methodology for single-target trackers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(11), 2137–2155.
Kwon, J., & Lee, K. M. (2009). Tracking of a non-rigid object via patch-based dynamic appearance modeling and adaptive basin hopping Monte Carlo sampling. In 2009 IEEE conference on computer vision and pattern recognition (pp. 1208–1215). IEEE.
Land, M. F., & McLeod, P. (2000). From eye movements to actions: How batsmen hit the ball. Nature Neuroscience, 3(12), 1340–1345.
Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788–791.
Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., & Yan, J. (2019). Siamrpn++: Evolution of Siamese visual tracking with very deep networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4282–4291).
Li, B., Yan, J., Wu, W., Zhu, Z., & Hu, X. (2018). High performance visual tracking with siamese region proposal network. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8971–8980).
Liang, P., Blasch, E., & Ling, H. (2015). Encoding color information for visual tracking: Algorithms and benchmark. IEEE Transactions on Image Processing, 24(12), 5630–5644. https://doi.org/10.1109/TIP.2015.2482905
Li, A., Lin, M., Wu, Y., Yang, M.-H., & Yan, S. (2015). NUS-PRO: A new visual tracking challenge. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2), 335–349.
Liu, Q., He, Z., Li, X., & Zheng, Y. (2019). PTB-TIR: A thermal infrared pedestrian tracking benchmark. IEEE Transactions on Multimedia, 22(3), 666–675.
Lukeźič, A., Zajc, L. Č, Vojíř, T., Matas, J., & Kristan, M. (2020). Performance evaluation methodology for long-term single-object tracking. IEEE Transactions on Cybernetics, 51, 6305–6318.
Miller, J. W. (1962). The effect of relative motion on visual acuity. Survey of Ophthalmology, 7, 83–116.
Mayer, C., Danelljan, M., Paudel, D. P., & Van Gool, L. (2021). Learning target candidate association to keep track of what not to track. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 13444–13454).
McLeod, P., Reed, N., & Dienes, Z. (2003). How fielders arrive in time to catch the ball. Nature, 426(6964), 244–245.
Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41.
Moudgil, A., & Gandhi, V. (2018). Long-term visual object tracking benchmark. In Asian conference on computer vision (pp. 629–645).
Mueller, M., Smith, N., & Ghanem, B. (2016). A benchmark and simulator for UAV tracking. In European conference on computer vision (pp. 445–461). Springer.
Muller, M., Bibi, A., Giancola, S., Alsubaihi, S., & Ghanem, B. (2018). TrackingNet: A large-scale dataset and benchmark for object tracking in the wild. In Proceedings of the European conference on computer vision (ECCV) (pp. 300–317).
Nejhum, S. S., Ho, J., & Yang, M.-H. (2008). Visual tracking with histograms and articulating blocks. In 2008 IEEE conference on computer vision and pattern recognition (pp. 1–8). IEEE.
Pech-Pacheco, J. L., Cristobal, G., Chamorro-Martinez, J., & Fernandez-Valdivia, J. (2000). Diatom autofocusing in brightfield microscopy: A comparative study. In Proceedings 15th international conference on pattern recognition. ICPR-2000 (Vol. 3, pp. 314–317).
Ramakrishnan, S. K., Jayaraman, D., & Grauman, K. (2021). An exploration of embodied visual exploration. International Journal of Computer Vision, 129(5), 1616–1649.
Real, E., Shlens, J., Mazzocchi, S., Pan, X., & Vanhoucke, V. (2017). Youtube-boundingboxes: A large high-precision human-annotated data set for object detection in video. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5296–5305).
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., & Bernstein, M. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.
Team, O. E. L., Stooke, A., Mahajan, A., Barros, C., Deck, C., Bauer, J., Sygnowski, J., Trebacz, M., Jaderberg, M., Mathieu, M., et al. (2021). Open-ended learning leads to generally capable agents. arXiv preprint arXiv:2107.12808
Valmadre, J., Bertinetto, L., Henriques, J. F., Tao, R., Vedaldi, A., Smeulders, A. W., Torr, P. H., & Gavves, E. (2018). Long-term tracking in the wild: A benchmark. In Proceedings of the European conference on computer vision (ECCV) (pp. 670–685).
Voigtlaender, P., Luiten, J., Torr, P. H., & Leibe, B. (2020). Siam R-CNN: Visual tracking by re-detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 6578–6588).
Wang, S., Zhou, Y., Yan, J., & Deng, Z. (2018). Fully motion-aware network for video object detection. In V. Ferrari, M. Hebert, C. Sminchisescu, & Y. Weiss (Eds.), Computer Vision—ECCV 2018 (pp. 557–573). Springer.
Wu, Y., Lim, J., & Yang, M.-H. (2013). Online object tracking: A benchmark. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2411–2418).
Wu, Y., Lim, J., & Yang, M.-H. (2015). Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(09), 1834–1848.
Xu, Y., Wang, Z., Li, Z., Yuan, Y., & Yu, G. (2020). Siamfc++: Towards robust and accurate visual tracking with target estimation guidelines. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, pp. 12549–12556).
Yan, B., Jiang, Y., Sun, P., Wang, D., Yuan, Z., Luo, P., & Lu, H. (2022). Towards grand unification of object tracking. In Computer vision—ECCV 2022: 17th European conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXI (pp. 733–751). Springer.
Yan, B., Zhao, H., Wang, D., Lu, H., & Yang, X. (2019). ’Skimming-perusal’ tracking: A framework for real-time and robust long-term tracking. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 2385–2393).
Ye, B., Chang, H., Ma, B., Shan, S., & Chen, X. (2022). Joint feature learning and relation modeling for tracking: A one-stream framework. In Computer vision—ECCV 2022: 17th European conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXII (pp. 341–357). Springer.
Yoon, J. H., Lee, C.-R., Yang, M.-H., & Yoon, K.-J. (2019). Structural constraint data association for online multi-object tracking. International Journal of Computer Vision, 127(1), 1–21.
Zhang, Z., & Peng, H. (2019). Deeper and wider Siamese networks for real-time visual tracking. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4591–4600).
Zhang, G., & Vela, P. A. (2015). Good features to track for visual slam. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1373–1382).
Zhang, Z., Peng, H., Fu, J., Li, B., & Hu, W. (2020). Ocean: Object-aware anchor-free tracking. In European conference on computer vision (pp. 771–787). Springer.
Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., Hu, W. (2018). Distractor-aware siamese networks for visual object tracking. In Proceedings of the European conference on computer vision (ECCV) (pp. 101–117).
Ethics declarations
Conflict of interest
All authors declare no conflicts of interest.
Additional information
Communicated by Matej Kristan.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Subsets of Challenging Space
Because VOT2016 (Kristan et al., 2016), VOT2018 (Kristan et al., 2018), and VOT2019 (Kristan et al., 2019) share some sequences, we removed duplicates from the three datasets when constructing the challenging space, ensuring that the constructed subspace contains no repeated sequences (Fig. 17). Specifically, we carefully examined the sequences of VOT2016, VOT2018, and VOT2019 and retained only the non-duplicated ones. After this selection, 82 of the original 180 sequences remained, as listed in Table 5: 60 sequences come from VOT2016, 10 from VOT2018, and 12 from VOT2019.
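As an illustration, this de-duplication step can be sketched as follows. The directory layout, the priority order (VOT2016 first), and matching duplicates by sequence name are assumptions made for this example; the released toolkit may implement the selection differently.

```python
# Minimal de-duplication sketch (assumed layout: one folder per sequence).
# Keep every VOT2016 sequence, then add only sequences from VOT2018/VOT2019
# whose names have not been seen before.
from pathlib import Path

def dedup_vot(roots):
    """roots: dataset paths ordered by priority, e.g. [VOT2016, VOT2018, VOT2019]."""
    kept, seen = {}, set()
    for root in roots:
        for seq_dir in sorted(Path(root).iterdir()):
            if seq_dir.is_dir() and seq_dir.name not in seen:
                seen.add(seq_dir.name)
                kept.setdefault(root, []).append(seq_dir.name)
    return kept

# Example usage (paths are placeholders):
# subsets = dedup_vot(["data/VOT2016", "data/VOT2018", "data/VOT2019"])
# print({k: len(v) for k, v in subsets.items()})  # expect 60 / 10 / 12 retained sequences
```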
Appendix B: Relationship of Dynamic Attributes
The last row in Fig. 18 indicates that variations in any of the other five dynamic attributes change the correlation coefficient (corrcoef). Moreover, compared with the other five dynamic attributes, corrcoef reflects the overall dynamic variation of a video sequence more comprehensively. It can therefore be used as an indicator of the degree of variation during the tracking process.
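For concreteness, the correlation coefficient between consecutive frames can be sketched as a Pearson correlation of pixel intensities. The sketch below assumes grayscale target patches resized to a common shape; the exact definition used in SOTVerse follows the main text and may differ in detail.

```python
# Hedged sketch: correlation coefficient between consecutive target patches.
# Assumption: corrcoef is the Pearson correlation of grayscale intensities of the
# target region in frame t-1 vs. frame t (resized to a common shape).
import cv2
import numpy as np

def frame_corrcoef(prev_patch, curr_patch, size=(64, 64)):
    a = cv2.resize(cv2.cvtColor(prev_patch, cv2.COLOR_BGR2GRAY), size).ravel().astype(np.float64)
    b = cv2.resize(cv2.cvtColor(curr_patch, cv2.COLOR_BGR2GRAY), size).ravel().astype(np.float64)
    return float(np.corrcoef(a, b)[0, 1])  # 1.0 = unchanged appearance, lower = larger variation
```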
Appendix C: An Example of Attribute Plot
As shown in Fig. 19, sub-figure (a) is taken from the car-6 sequence in the LaSOT dataset (Fan et al., 2021). Here, we select six algorithms to generate the attribute plot: GRM (Gao et al., 2023), Unicorn (Yan et al., 2022), and OSTrack (Ye et al., 2022) represent the latest state-of-the-art algorithms, while ECO (Danelljan et al., 2017), SiamFC (Bertinetto et al., 2016), and KCF (Henriques et al., 2014) represent classical algorithms. Sub-figure (b) employs the previous method, which calculates the proportion of each attribute over the failed frames (\(\mathcal{A}_{f}(\cdot)\)). In contrast, sub-figure (c) calculates the proportion of each attribute over the successful frames (\(\mathcal{A}_{s}(\cdot)\)). Sub-figure (d) shows the difference between sub-figures (b) and (c) and serves as the updated attribute plot \(\mathcal{A}(\cdot)\) (i.e., \(\mathcal{A}(\cdot) = \mathcal{A}_{f}(\cdot) - \mathcal{A}_{s}(\cdot)\)).
Clearly, the previous calculation method in (b) suggests a substantial correlation between the failed frames and the challenging factor blur bounding-box. However, sub-figure (c) shows that blur bounding-box is also consistently prevalent across the successful frames. Comparing (b) and (c) demonstrates that blur bounding-box is a widely observed challenging factor in the majority of frames of this sequence, but it is not the primary cause of algorithm failure. Sub-figure (d) therefore offers a more precise depiction of the challenging factors that actually contribute to algorithm failure; for instance, the failure of GRM primarily results from the fast motion of the target.
Sub-figures (e–h) are from the person-1 sequence in the LaSOT dataset and are analyzed with the same process as (a–d). For GRM, Unicorn, and OSTrack, the most challenging factors in this sequence are a series of dynamic attributes (top right corner of sub-figure (h)), including variations in illumination, scale, and ratio, as well as fast motion. Moreover, negative-value regions in sub-figure (h) indicate attributes on which the algorithms excel, making successful tracking more likely in most cases; for instance, GRM, Unicorn, and OSTrack exhibit strong tracking capabilities on the static attributes of abnormal scale and ratio within this sequence.
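To make the computation of the updated attribute plot concrete, a minimal sketch is given below. The per-frame boolean attribute labels and the per-frame success flags (e.g., frames whose overlap exceeds a threshold) are assumed inputs for illustration and are not the toolkit's actual interface.

```python
# Hedged sketch of the updated attribute plot A(.) = A_f(.) - A_s(.).
# Assumed inputs (illustrative only):
#   attrs:   (N, K) boolean array, attrs[t, k] = frame t is labeled with challenging factor k
#   success: (N,)   boolean array, success[t] = tracker succeeded on frame t
import numpy as np

def attribute_plot(attrs, success):
    fail = ~success
    a_f = attrs[fail].mean(axis=0) if fail.any() else np.zeros(attrs.shape[1])        # proportion per attribute over failed frames
    a_s = attrs[success].mean(axis=0) if success.any() else np.zeros(attrs.shape[1])  # proportion per attribute over successful frames
    return a_f - a_s  # positive values: factor is over-represented among failures
```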
Appendix D: Comprehensive Experimental Results
Experiments in normal space. The three columns show results on the short-term tracking task (left), the long-term tracking task (middle), and the global instance tracking task (right). Each task is evaluated by precision plots in OPE (a1–a3), normalized precision plots in OPE (b1–b3), precision plots in R-OPE (c1–c3), and normalized precision plots in R-OPE (d1–d3)
All experiments are performed on a server with 4 NVIDIA TITAN RTX GPUs and a 64-core Intel(R) Xeon(R) Gold 5218 CPU @ 2.30 GHz. We use the parameters provided by the original authors.
Experiments in OTB (Wu et al., 2015) with OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, and (e) attribute plot
Experiments in OTB (Wu et al., 2015) with R-OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, (e) attribute plot, and (f) robust plot
Appendix E: Experiments in Short-Term Tracking
E.1 Experiments in OTB (Wu et al., 2015)
Experiments in VOT2016 (Kristan et al., 2016) with OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, and (e) attribute plot
Experiments in VOT2016 (Kristan et al., 2016) with R-OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, (e) attribute plot, and (f) robust plot
E.2 Experiments in VOT2016 (Kristan et al., 2016)
Experiments in VOT2018 (Kristan et al., 2018) with OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, and (e) attribute plot
Experiments in VOT2018 (Kristan et al., 2018) with R-OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, (e) attribute plot, and (f) robust plot
E.3 Experiments in VOT2018 (Kristan et al., 2018)
Experiments in VOT2019 (Kristan et al., 2019) with OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, and (e) attribute plot
Experiments in VOT2019 (Kristan et al., 2019) with R-OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, (e) attribute plot, and (f) robust plot
E.4 Experiments in VOT2019 (Kristan et al., 2019)
Experiments in GOT-10k (Huang et al., 2021) with OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, and (e) attribute plot
Experiments in GOT-10k (Huang et al., 2021) with R-OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, (e) attribute plot, and (f) robust plot
E.5 Experiments in GOT-10k (Huang et al., 2021)
Experiments in VOTLT2019 (Kristan et al., 2019) with OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, and (e) attribute plot
Experiments in VOTLT2019 (Kristan et al., 2019) with R-OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, (e) attribute plot, and (f) robust plot
Appendix F: Experiments in Long-Term Tracking
F.1 Experiments in VOTLT2019 (Kristan et al., 2019)
Experiments in LaSOT (Fan et al., 2021) with OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, and (e) attribute plot
Experiments in LaSOT (Fan et al., 2021) with R-OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, (e) attribute plot, and (f) robust plot
F.2 Experiments in LaSOT (Fan et al., 2021)
Experiments in VideoCube (Hu et al., 2023) with OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, and (e) attribute plot
Experiments in VideoCube (Hu et al., 2023) with R-OPE mechanisms, evaluated by (a) precision plot, (b) normalized precision plot, (c) success plot, (d) challenging plot, (e) attribute plot, and (f) robust plot
Appendix G: Experiments in Global Instance Tracking
G.1 Experiments in VideoCube (Hu et al., 2023)
Appendix H: The Composition of Challenging Space
H.1 Abnormal Ratio
H.2 Abnormal Scale
H.3 Abnormal Illumination
H.4 Blur Bounding-Box
H.5 Delta Ratio
H.6 Delta Scale
H.7 Delta Illumination
H.8 Delta Blur Bounding-Box
H.9 Fast Motion
H.10 Low Correlation Coefficient
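For illustration only, the static attributes above can be sketched from the image and the ground-truth box under simplified, assumed definitions: the box aspect ratio for ratio, the relative box area for scale, the mean patch intensity for illumination, and the variance of the Laplacian for blur (cf. Pech-Pacheco et al., 2000); the delta attributes are frame-to-frame differences. The authoritative formulas and thresholds are those defined in H.1–H.10.

```python
# Hedged sketch of per-frame attribute computation for the challenging space.
# The normalizations below are assumptions; the paper's definitions are authoritative.
import cv2
import numpy as np

def frame_attributes(frame_bgr, box):
    """box = (x, y, w, h) ground-truth bounding box in pixels."""
    x, y, w, h = box
    H, W = frame_bgr.shape[:2]
    patch = frame_bgr[int(y):int(y + h), int(x):int(x + w)]
    gray = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)
    return {
        "ratio": h / max(w, 1e-6),                              # H.1: abnormal ratio
        "scale": float(np.sqrt((w * h) / (W * H))),             # H.2: abnormal scale
        "illumination": float(gray.mean()) / 255.0,             # H.3: abnormal illumination (mean intensity proxy)
        "blur": float(cv2.Laplacian(gray, cv2.CV_64F).var()),   # H.4: blur bounding-box (variance of Laplacian)
    }

def delta(curr, prev):
    """H.5-H.8: frame-to-frame variation of the static attributes."""
    return {k: abs(curr[k] - prev[k]) for k in curr}
```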
Appendix I: Experiments in Challenging Space
I.1 Static Attributes
I.2 Dynamic Attributes
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hu, S., Zhao, X. & Huang, K. SOTVerse: A User-Defined Task Space of Single Object Tracking. Int J Comput Vis (2023). https://doi.org/10.1007/s11263-023-01908-5
DOI: https://doi.org/10.1007/s11263-023-01908-5